diff --git a/AdjointFunctorTheorems.tex b/AdjointFunctorTheorems.tex new file mode 100644 index 0000000..8f36acf --- /dev/null +++ b/AdjointFunctorTheorems.tex @@ -0,0 +1,410 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{epstopdf} +\usepackage{mdframed} +\usepackage{hyperref} +\usepackage{tikz-cd} +\usepackage{lmodern} + +% Reproducible builds +\pdfinfoomitdate=1 +\pdftrailerid{} +\pdfsuppressptexinfo=-1 + +\newmdtheoremenv{defn}{Definition} +\newmdtheoremenv{thm}{Theorem} + +\title{Motivation for the General Adjoint Functor Theorem} +\author{Patrick Stevens} +\date{24th December 2015} + +\begin{document} + +\maketitle +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/AdjointFunctorTheorems/AdjointFunctorTheorems.pdf} \end{center} + +\normalsize +\emph{You should draw diagrams yourself throughout this document. It will be unreadable as mere symbols.} + +\section{Primeval Adjoint Functor Theorem} + +Recall the theorem (``RAPL'') that Right Adjoints Preserve Limits. + +The PAFT is an attempt at a converse to this theorem. +It states that if $G: \mathcal{D} \to \mathcal{C}$ preserves limits, and $\mathcal{D}$ is small and complete, then $G$ is a right adjoint. + +The problem with the PAFT is that it's actually a very weak theorem: small complete categories are preorders. + +So we want to weaken that requirement of smallness and completeness. +How do we usually weaken smallness? +The next best thing is local smallness, but we're going to lose something if we just replace ``small'' by ``locally small''. + +\section{General Adjoint Functor Theorem} +The question is therefore, ``what requirement do we need to add to augment local smallness?'' + +Recall the most ``concrete'' definition of a left adjoint to $G: \mathcal{D} \to \mathcal{C}$: it is a specification of $F: \mathcal{C} \to \mathcal{D}$ together with $\eta_A: A \to GFA$ for each $A$, such that any $g: A \to GB$ has a unique $h: FA \to B$ with $(Gh) \circ \eta_A = g$. + +\[ +\begin{tikzcd} +FA + \arrow[r, dashrightarrow, "h"'] +& B +\\ +GFA + \arrow[r, "Gh"] +& GB +\\ +A + \arrow[u, "\eta_A"] + \arrow[ur, "g"'] +& +\end{tikzcd} +\] + +This definition is how I remember what an adjoint is, and it very closely parallels the UMP of the free group on a given set. + +Now, by Then A Miracle Occurs\footnote{\tiny{\url{https://web.archive.org/web/20070703151645/http://star.psy.ohio-state.edu/coglab/Miracle.html}}}, I define the following Tiny Set Condition: + +\ + +\begin{defn}[Tiny Set Condition] A functor $G: \mathcal{D} \to \mathcal{C}$ has the TSC iff for every object $A \in \mathcal{C}$, there is $B \in \mathcal{D}$ and $\eta_A : A \to GB$ such that every $g: A \to GX$ factors as $A \xrightarrow[\eta_A]{} GB \xrightarrow[Gh]{} GX$ for some $h: B \to X$. + +\[ +\begin{tikzcd} +B + \arrow[r, "h"'] +& X +\\ +GB + \arrow[r, "Gh"] +& GX +\\ +A + \arrow[u, "\eta_A"] + \arrow[ur, "g"'] +& +\end{tikzcd} +\] +\end{defn} + +\ + +Notice how closely this mirrors the definition of an adjoint: it is a very slight weakening of my favourite definition, in that we don't require $h$ to be unique. +In particular, using that definition it is a completely content-free statement that if $G$ has a left adjoint, then it satisfies the TSC: simply take $\eta_A$ to be the unit of the adjunction.
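+
+For a concrete instance (my addition; it is the standard example the definition parallels, not something from the original), take $G$ to be the forgetful functor from groups to sets. Given a set $A$, let $B$ be the free group $FA$ and $\eta_A: A \to GFA$ the inclusion of generators; then every function $g: A \to GX$ into the underlying set of a group $X$ factors as
+\[
+g = (Gh) \circ \eta_A
+\]
+for the homomorphism $h: FA \to X$ induced by $g$ on generators. Here $h$ is even unique, which is why this example has a genuine left adjoint and not merely the TSC.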
+ +We are assuming access to RAPL, so this gives the following theorem: + +\ + +\begin{thm} A functor $G: \mathcal{D} \to \mathcal{C}$, which has a left adjoint, must satisfy the TSC and must preserve small limits. +\end{thm} + +\ + +\subsection{Relation to the GAFT} + +\emph{This is an overview of where we are heading.} + +The GAFT talks about what happens if we replace the TSC with a weaker condition: + +\ + +\begin{defn}[Solution Set Condition] A functor $G: \mathcal{D} \to \mathcal{C}$ has the SSC iff for every object $A \in \mathcal{C}$, there is a set $\{ B_i \in \mathcal{D} : i \in I \}$ and $\{ \eta_A^i : A \to G B_i : i \in I \}$ such that every $f: A \to GB$ factors as $A \xrightarrow[\eta_A^i]{} G B_i \xrightarrow[Gh]{} GB$ for some $i \in I$ and some $h: B_i \to B$. +\end{defn} + +\ + +That is, we relax the TSC's demand for a single object $B$ to a set of objects $B_i$. + +Then the GAFT states that: + +\ + +\begin{thm}[General adjoint functor theorem] A functor $G: \mathcal{D} \to \mathcal{C}$ (where $\mathcal{D}$ is complete and locally small) has a left adjoint iff it satisfies the SSC and preserves small limits. +\end{thm} + +\ + +\section{Converse to Theorem 1} + +We wish to show the following: + +\ + +\begin{thm}[Tiny Adjoint Functor Theorem] If $G: \mathcal{D} \to \mathcal{C}$ has the TSC and preserves small limits, and $\mathcal{D}$ is complete and locally small, then $G$ has a left adjoint. +\end{thm} + +\ + +This will give us a slightly more general version of the PAFT, because we've relaxed smallness of $\mathcal{D}$ and still given an equivalent condition for being an adjoint. + +To be clear, what we have done is taken the PAFT, replaced ``small'' with ``locally small'', and imposed the TSC. +The TSC is ostensibly only a slight weakening of the definition of an adjoint, so it's not too much to hope that the two should turn out to be equivalent. + +I will write $FA$ for the object $B$ guaranteed by the TSC; this anticipates the definition of $F$ as a functor. + +\subsection{Proof} +Recall the theorem that a specification of a left adjoint is equivalent to a specification of an initial object of $(A \downarrow G)$ for each object $A \in \mathcal{C}$. +There's one obvious choice for such an initial object: $(FA, \eta_A)$. +(Note for the future that this might not actually be initial, but it's the obvious choice.) +Is this actually initial? + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +A + \arrow[dr, "f"] + \arrow[d, "\eta_A"] +& +\\ +GFA + \arrow[r, dashrightarrow] +& GX +\end{tikzcd} +\] + +Well, it certainly has an arrow from it into any other $(X \in \mathcal{D}, f: A \to GX)$, because that's just the statement of the TSC: any arrow $f: A \to GX$ factors through the map $\eta_A : A \to GFA$. + +Is that arrow unique? +Well, OK, maybe it isn't. The TSC didn't tell us much, after all. +But we are in a complete and locally small category, so what we can do is equalise out all the ways in which the arrow fails to be unique. + +\subsubsection{Try and make the arrow unique} +Say $\{ G(h_i): i \in I \}$ is the set of distinct arrows $GFA \to GX$ with $G(h_i) \eta_A = f$ (the $h_i: FA \to X$ do form a set, because $\mathcal{D}$ is locally small). + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +A + \arrow[dr, "f"] + \arrow[d, "\eta_A"] +& +\\ +GFA + \arrow[r, shift left, "G(h_1)"] + \arrow[r, shift right, "G(h_2)"'] +& GX +\end{tikzcd} +\] + +Let $E$ be the ``industrial-strength equaliser'' of the $h_i$ in $\mathcal{D}$: an arrow $e: E \to FA$ such that $h_i e = h_j e$ for all $i, j$, universal among arrows with this property (that is, the limit of the diagram formed by the $h_i$). + +Since $G$ preserves limits, $Ge: GE \to GFA$ must be an industrial-strength equaliser of the $G(h_i)$.
+ +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dr, "f"] + \arrow[d, "\eta_A"] +& +\\ +GE + \arrow[r, "Ge"'] +& GFA + \arrow[r, shift left, "G(h_1)"] + \arrow[r, shift right, "G(h_2)"'] +& GX +\end{tikzcd} +\] + +Since $\eta_A$ equalises all the $G(h_i)$, it must lift uniquely over $Ge$: say $\eta_A = G(e) \circ \overline{\eta_A}$. + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dl, "\overline{\eta_A}"'] + \arrow[dr, "f"] + \arrow[d, "\eta_A"] +& +\\ +GE + \arrow[r, "Ge"'] +& GFA + \arrow[r, shift left, "G(h_1)"] + \arrow[r, shift right, "G(h_2)"'] +& GX +\end{tikzcd} +\] + +Now, in our dream world, the $G(h_i)$ would all be equal: that is, $Ge$ would be an iso. +Can we find an inverse? +The TSC tells us that there is an arrow $G(\gamma) : GFA \to GE$ such that $G(\gamma) \eta_A = \overline{\eta_A}$. +That is, $G(e) G(\gamma) \eta_A = \eta_A$. + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dl, "\overline{\eta_A}"'] + \arrow[dr, "f"] + \arrow[d, "\eta_A"] +& +\\ +GE + \arrow[r, shift left, "Ge"] +& GFA + \arrow[l, shift left, "G(\gamma)"] + \arrow[r, shift left, "G(h_1)"] + \arrow[r, shift right, "G(h_2)"'] +& GX +\end{tikzcd} +\] + +This is as far as we can go, because of the weakness of the TSC, which tells us nothing about the arrows whose existence it guarantees. + +\subsubsection{Try and produce an object which works} +We see that $G(e) G(\gamma)$ is a map $GFA \to GFA$, and we wanted it to be the identity. + +What we could do is equalise out all the maps $GFA \to GFA$, and that would tell us that $\eta_A$ lifted over the equaliser. +This would produce an object which we really hope would actually have the uniqueness property. + +Let $G(i): GI \to GFA$ be the industrial-strength equaliser of all maps $r_i: FA \to FA$ such that $G(r_i) \eta_A = \eta_A$. + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dr, "\eta_A"] + \arrow[d, "\eta_A"] +& +\\ +GI + \arrow[r, "G(i)"'] +& GFA + \arrow[r, shift left, "G(r_1)"] + \arrow[r, shift right, "G(r_2)"'] +& GFA +\end{tikzcd} +\] + +Now $\eta_A$ equalises all the $G(r_i)$ (each composite $G(r_i) \eta_A$ is $\eta_A$), so it lifts over $Gi$ to $\eta_A': A \to GI$; note also that $G(e \gamma) \eta_A = \eta_A$, so $e \gamma$ is one of the maps $r_i$ being equalised. + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dl, "\eta'_A"'] + \arrow[dr, "\eta_A"] + \arrow[d, "\eta_A"] +& +\\ +GI + \arrow[r, "G(i)"'] +& GFA + \arrow[r, shift left, "G(r_1)"] + \arrow[r, shift right, "G(r_2)"'] +& GFA +\end{tikzcd} +\] + +Finally, I claim that this is initial. Indeed, it certainly has maps into every $(X, f)$, because we can just compose $A$'s TSC-map with $G(i)$. + +\[ +\begin{tikzcd}[row sep=large, column sep=huge] +& A + \arrow[dl, "\eta'_A"'] + \arrow[d, "\eta_A"] + \arrow[dr, "\eta_A"'] + \arrow[drr, "f"] +& +& +\\ +GI + \arrow[r, "G(i)"'] +& GFA + \arrow[r, shift left, "G(r_1)"] + \arrow[r, shift right, "G(r_2)"'] +& GFA + \arrow[r, "G(h)"'] +& GX +\end{tikzcd} +\] + +The map is unique: if we had $G(x), G(y): GI \to GX$ such that $G(x)\eta_A' = G(y) \eta_A' = f$, say, then form their equaliser $G(e)$ as above (we reuse the letters $E$, $e$, $\gamma$ for this new equaliser). +$\eta_A'$ lifts over $G(e): GE \to GI$, say to $\eta_A'': A \to GE$. + +\[ +\begin{tikzcd}[row sep=large, column sep=large] +& A + \arrow[dl, "\eta''_A"'] + \arrow[dr, "f"] + \arrow[d, "\eta'_A"] +& +\\ +GE + \arrow[r, "G(e)"'] +& GI + \arrow[r, shift left, "G(x)"] + \arrow[r, shift right, "G(y)"'] +& GX +\end{tikzcd} +\] + +By the TSC, there is a map $G(\gamma) : GFA \to GE$ with $G(\gamma) \eta_A = \eta_A''$. + +So $G(i) G(e) G(\gamma) \eta_A = \eta_A$.
+ +\[ +\begin{tikzcd}[row sep=large, column sep=huge] +& & & A + \arrow[dlll, "\eta'_A"'] + \arrow[dll, "\eta_A"] + \arrow[dl, "\eta''_A"] + \arrow[d, "\eta'_A"] + \arrow[dr, "f"] +& +\\ +GI + \arrow[r, "G(i)"'] +& GFA + \arrow[r, "G(\gamma)"'] +& GE + \arrow[r, "G(e)"'] +& GI + \arrow[r, shift left, "G(x)"] + \arrow[r, shift right, "G(y)"'] +& GX +\end{tikzcd} +\] + +Now $i e \gamma$ satisfies $G(i e \gamma) \eta_A = \eta_A$, so it is one of the maps that $G(i)$ industrial-strength-equalises; so is the identity $1_{FA}$ (certainly $G(1_{FA}) \eta_A = \eta_A$), and hence $(i e \gamma) i = 1_{FA} i$, that is, $G(i) G(e) G(\gamma) G(i) = G(i)$; $G(i)$ is monic and so $G(e) G(\gamma) G(i) = 1_{GI}$. + +Therefore $G(e)$ is split epic; it's also monic because it's an equaliser, so it's iso. +That is, $G(x), G(y)$ are equal after all, because an equaliser is iso iff the arrows it's equalising are the same. + +\subsection{Summary} +We've seen that $G: \mathcal{D} \to \mathcal{C}$ (where $\mathcal{D}$ is complete and locally small) has a left adjoint iff it preserves small limits and has the TSC. + +Notice how often in the proof we had lines like ``this thing lifts over this thing'': effectively telling us that the limits we had constructed were genuinely members of $(A \downarrow G)$, so that we could use the TSC. +There is actually a theorem about this, and it follows basically the same pattern as those lines do. +It tells us that limits in the comma category $(A \downarrow G)$ are essentially computed in $\mathcal{D}$: the forgetful functor to $\mathcal{D}$ creates them when $G$ preserves limits. +However, I wanted to stay as concrete as possible here. + +\section{More generality} + +This condition has told us that ``if you look like you've got a unit of an adjunction, then you really do have a unit''. +That's still quite restrictive, and it turns out to be possible to relax the TSC to the SSC. + +Recall that the difference between the TSC and the SSC is just that the SSC asserts the existence of several objects which work, rather than just one. + +\ + +\begin{thm}[GAFT from TAFT] If $G: \mathcal{D} \to \mathcal{C}$ has the SSC and preserves products, and $\mathcal{D}$ has products and is locally small, then $G$ has the TSC. +\end{thm} + +\ + +The proof is by the only way we have of combining objects: that is, by taking products. + +If $G$ has the SSC, we construct for each $A$ an object $P = \prod_j B_j$. +Since $G$ preserves products, $GP$ is a product of the $G B_j$. +Define $\eta_A: A \to GP$ componentwise by $\pi_{G B_j} \eta_A = \eta_A^j$. + +Now any $f:A \to GB$ factors as $A \xrightarrow[\eta_A^i]{} G B_i \xrightarrow[Gh]{} GB$, because we had a solution set. +Therefore it factors as $A \xrightarrow[\pi_{G B_i} \eta_A]{} G B_i \xrightarrow[Gh]{} GB$; this is already a factorisation through $P$ in the first arrow! +(For this factorisation only the $i$th component of $\eta_A$ matters; the remaining components could have been anything, and we have simply used the arrows $\eta_A^j: A \to G B_j$, $j \not = i$, that the solution set happens to give us.) + +So we are done: given the SSC, we have produced an object $P$ and arrow $\eta_A: A \to GP$ which together witness that $G$ satisfies the TSC. + +Combining Theorem 4 with Theorem 3 gives exactly the General Adjoint Functor Theorem.
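+
+To spell out the factorisation used in that last proof (this display merely chains together the composites named above):
+\[
+f = (Gh) \circ \eta_A^i = (Gh) \circ \pi_{G B_i} \circ \eta_A = G(h \circ \pi_{B_i}) \circ \eta_A,
+\]
+which is exactly a factorisation through $\eta_A: A \to GP$ of the shape the TSC demands, with the single object $P$ in the role of $B$.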
+ +\end{document} \ No newline at end of file diff --git a/EmbedMMIntoTuringMachine.tex b/EmbedMMIntoTuringMachine.tex new file mode 100644 index 0000000..aabcf54 --- /dev/null +++ b/EmbedMMIntoTuringMachine.tex @@ -0,0 +1,488 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{mdframed} +\usepackage{hyperref} +\usepackage{xparse} +\usepackage{lmodern} + +% Reproducible builds +\pdfinfoomitdate=1 +\pdftrailerid{} +\pdfsuppressptexinfo=-1 + +% Following code for \machine is from http://tex.stackexchange.com/a/305621/103801 + +\ExplSyntaxOn + +\int_new:N\g_patrick_int% + +\NewDocumentCommand{\machine}{m}{% + \clist_set:Nn \l_tmpa_clist {#1} + \int_gzero:N \g_patrick_int + \begin{array}{|c|} + \hline + \prg_replicate:nn { \clist_count:N \l_tmpa_clist } {% + \int_gincr:N \g_patrick_int + \clist_item:Nn \l_tmpa_clist { \g_patrick_int } \\ + \hline + } + \end{array} +} + +\ExplSyntaxOff + +% end of \machine code + +\newmdtheoremenv{defn}{Definition} + +\theoremstyle{remark} +\newtheorem*{remark}{Remark} +\newtheorem*{example}{Example} + +\title{Embedding a modular machine into a group} +\author{Patrick Stevens} +\date{21st April 2016} + +\begin{document} + +\maketitle +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/ModularMachines/EmbedMMIntoTuringMachine.pdf} \end{center} +\normalsize + +\section{Introduction} +This document was born from a set of lectures I attended, given by Dr Maurice Chiodo +in his half of the Part III Maths course \emph{Infinite Groups and Decision Problems}, which he lectured jointly with Dr Jack Button in 2016. +The treatment of modular machines was a bit heavy in text, and they are actually fairly intuitive objects, so I decided to expand on them here. + +A warning for those who are following the official course notes: I accidentally interchanged $L$ and $R$ in my definition of a modular machine below, relative to the official notes. +It doesn't change any of the ideas, and is just a different labelling convention, but by the time it was pointed out to me, most of the document was written, and I can't be bothered to fix it now. +Use at your own risk. + +Mistakes in this article are due entirely to me; thanks to Joshua Hunt and Daniel Zheng for catching some before this went to press. + +\section{What is a modular machine?} + +A modular machine is a Turing-equivalent form of computation. +It operates on two tapes, each bounded at one end. +Each cell of a tape may be filled with an integer from $0$ to $m-1$ inclusive; this is where the name ``modular'' comes from. + +$$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, a_n} \ \machine{\ , b_1, b_2, \vdots, b_{k-1}, b_k}$$ + +The machine is considered to have a ``head'' which is looking at the bottom two cells (which are on the edge of the tape); in this instance, the head is looking at $a_n$ and $b_k$. +The machine also has a list of instructions, of the form $$(\alpha, \beta, (x, y), L)$$ or $$(\alpha, \beta, (x,y), R)$$ +These instructions are constantly active, and the machine just does whichever it can at each stage. +To ensure that the machine only has one thing it can do at any time, +we insist that each pair $(\alpha, \beta)$ may appear in the first two places of at most one instruction. +For example, if $(1,2,(3,3),L)$ is an instruction, then $(1,2,(5,2),R)$ cannot also be an instruction.
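+
+As a minimal illustration (my sketch, not from the lectures; all names are hypothetical), the instruction list can be represented as a table keyed by the pair of head symbols, which enforces the determinism requirement mechanically:
+
+\begin{verbatim}
+# Sketch: instructions keyed by (alpha, beta), so each pair of head
+# symbols admits at most one instruction.
+def make_table(m, instructions):
+    # instructions: list of (alpha, beta, (x, y), 'L' or 'R')
+    table = {}
+    for (alpha, beta, (x, y), direction) in instructions:
+        assert all(0 <= v < m for v in (alpha, beta, x, y))
+        assert (alpha, beta) not in table, "determinism violated"
+        table[(alpha, beta)] = ((x, y), direction)
+    return table
+
+table = make_table(10, [(1, 2, (3, 3), 'L')])
+# appending (1, 2, (5, 2), 'R') to the list would trip the assertion
+\end{verbatim}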
+ +Since $\alpha, \beta$ will appear on the tape during execution, we require that $0 \leq \alpha, \beta < m$; +since $x, y$ will be written onto the tape during execution, we require that $0 \leq x, y < m$. + +When the head sees entries $(\alpha = a_n, \beta = b_k)$ and it has the instruction $$(a_n, b_k, (x, y), L)$$ +it executes the following procedure: + +\begin{enumerate} +\item Shift the left-hand tape up one: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, a_n, \ } \ \machine{\ , \ , b_1, b_2, \vdots, b_{k-1}, b_k}$$ +\item Write $x, y$ into the left-hand tape's bottom cells: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, x, y} \ \machine{\ , \ , b_1, b_2, \vdots, b_{k-1}, b_k}$$ +\item Shift the right-hand tape down one: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, x, y} \ \machine{\ ,\ , \ , b_1, b_2, \vdots, b_{k-1}}$$ +\end{enumerate} + +If, instead, the instruction was $$(a_n, b_k, (x, y), R)$$ then the same procedure would be carried out, but with right and left interchanged. +This would result in the final state $$\machine{\ , a_1, a_2, a_3, \vdots, a_{n-1}} \ \machine{b_1, b_2, \vdots, b_{k-1}, x, y}$$ + +\begin{remark} +The very convenient thing about modular machines is that they can be easily coded into numbers. +All the state can be tracked with just two numbers: $$(A, B) := \left(\sum_{i=1}^n a_i m^{n-i}, \sum_{i=1}^k b_i m^{k-i}\right)$$ +so that the head symbols $a_n, b_k$ sit in the units digits; and, for instance, the operation $(\alpha, \beta, (x,y), L)$ produces $$([A-(A \mod{m})]m + xm+y, [B-(B \mod{m})]/m)$$ +We could even collapse $(x, y)$ into a single integer $xm+y$ which is less than $m^2$. +\end{remark} + +\section{Turing equivalence} + +We will take our Turing machines to be in ``quintuple form'': they consist of a list of instructions of the form $$(q, a, a', q', L/R)$$ +where: +\begin{itemize} +\item $q$ is the current state +\item $a$ is the symbol under the head +\item $a'$ is the symbol to write to the tape +\item $q'$ is the state to move to +\item $L/R$ is the direction the head moves after this instruction executes. +\end{itemize} + +Note that the machine always writes and always moves on every execution step. + +We may implement a Turing machine as a modular machine as follows. + +\ + +\begin{defn}[Instantaneous description] +An \emph{instantaneous description} of a Turing machine is a string of the form +$$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$ +where the $s_i$ are the symbols written on the tape, $q$ is the state the machine is currently in, and $a$ is the symbol under the head. +It captures completely the state of execution at a given instant in time. +\end{defn} + +\ + +We will implement the instantaneous description $s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$ as both of two possible modular machine states: +$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, q} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, a}$$ +or +$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, a} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, q}$$ +It will soon become clear why we want to be able to use both these states. +While the modular machine can only ever occupy one of the states, the Turing machine it's emulating will be in the same state whichever +of the two the modular machine happens to be in. + +Define the modulus $m$ to be the number of Turing-machine states, plus the number of Turing-machine symbols, plus $1$; this is just to +make sure we have plenty of symbols to work with, and can store all the information we need in any given cell of the tape. + +How can we express a Turing-machine instruction?
+Remember, each instruction takes one of the two forms +\begin{itemize} +\item $(q, a, a', q', L)$, which would convert $$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$ to $$s_1 s_2 \dots s_{k-1} q' s_k a' s_{k+2} \dots s_r$$ +\item $(q, a, a', q', R)$, which would convert $$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$ to $$s_1 s_2 \dots s_k a' q' s_{k+2} \dots s_r$$ +\end{itemize} +Therefore, taking our correspondence between Turing-machine instantaneous descriptions and internal states of a modular machine, +we need the instruction $(q, a, a', q', L)$ to take +$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, q} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, a} \mapsto +\machine{\ ,s_1, s_2, \dots, s_{k-1}, q'} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', s_k}$$ +or to $$\machine{\ ,s_1, s_2, \dots, s_{k-1}, s_k} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', q'}$$ + +Now it is clear why we needed the two possible representations of a single Turing-machine instantaneous description: +only the second of the above transitions is easy to implement in the modular machine, +but it has swapped $q$ from the left-hand register to the right-hand register. +This is perfectly kosher, but only because we were careful to state that the current-state and current-symbol +letters were interchangeable in the modular machine. +It can be performed using the modular machine instruction $(q, a, (a', q'), R)$. + +Similarly, since we might have started this whole affair with the other representation of the TM instantaneous description, +we need to do the same with $$\machine{s_1, s_2, \dots, s_{k-1}, s_k, a} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, q}$$ +which, by modular machine instruction $(a, q, (a', q'), R)$ is taken to +$$\machine{\ , s_1, s_2, \dots, s_{k-1}, s_k} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', q'}$$ + +Notice, as an aside, that the Turing-machine head was moving left, and the modular-machine ``head'' symbol $q'$ has ended up on the right-hand tape +whichever of the two representations of the instantaneous description we used. + +We can do the same for the Turing machine instructions which involve moving rightwards. + +\section{Summary} + +If we take our Turing-machine instructions $(q, a, a', q', L/R)$ and, for each one, create a pair of modular machine instructions +$(a, q, (a', q'), R/L)$ and $(q, a, (a', q'), R/L)$ (note that the direction flips: a leftward Turing-machine instruction yields $R$-instructions, and a rightward one yields $L$-instructions), we end up with a modular machine that precisely emulates the Turing machine. +We could represent the pair $(a', q')$ as a single integer which is between $0$ and $m^2$: namely, by taking $a'm+q'$. +This makes the strings a bit shorter, though less comprehensible. + +\ + +\begin{defn}[Halting set] +We define the ``halting set'' of a modular machine to be the collection of states of the tape from which, when the machine runs, +we eventually end up with both tapes zeroed out. +For this to be a sensible definition, we want the machine not to have an instruction corresponding to head-states $(0,0)$, so that once it has zeroed out its tapes the machine really does stop. +\end{defn} + +\section{Embedding a modular machine into a group} + +What we seek now is a way to embed a MM into a group. +A MM has two pieces of state: the left-hand tape and the right-hand tape. +These can be easily coded as integers. + +\subsection{How could we apply a machine instruction?} +Let's imagine we have a way of representing the state $(a,b)$ as a group word in some group: $t(a, b)$.
+What we now want is a way of applying a transformation to obtain the different word $t(a', b')$ which corresponds to +executing the MM instruction $(\alpha, \beta, (x, y), L/R)$. +A very good way of applying reversible transformations is to conjugate, so let's add lots of letters to the group: +one $r_i$ for each instruction $(\alpha, \beta, (x,y), L)$, and one $s_i$ for each $(\alpha, \beta, (x,y), R)$. +Conjugating by the letter $r_i$ will apply the $i$th $L$-instruction. + +The required effect is $$r_i t(\alpha + m u, \beta + m v) r_i^{-1} = t(xm+y+m^2u, v)$$ + +\subsection{How do we store the states?} +The next idea is that since our states are merely integers, we might store them as exponents of a group generator: +$x^a y^b$ where $a$ is the left-hand tape and $b$ the right-hand. +In this scheme, it'll be easier if we allow $x$ and $y$ to commute, since all we care about is their exponents. + +Now, it will be convenient to introduce a third letter, $t$, which will let us store separately the ``head'' and the ``body'' of the tapes. +Our storing scheme will be $$y^{m v} x^{mu} (x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-m v}$$ +which we will denote $t(\alpha+m u, \beta+m v)$, +corresponding to the tape which has $\alpha, \beta$ as the two heads, and then $u, v$ as the data on the rest of the tape. + +So we define $$K := \langle x, y, t \mid xy=yx \rangle$$ + +\begin{remark} +Notice that the only words in $K := \langle x, y, t \mid xy=yx \rangle$ which can be expressed in this form are the words $y^a x^b t x^{-b} y^{-a}$. +Therefore it makes sense to define a subgroup $$T := \langle t(a, b) : a, b \in \mathbb{Z} \rangle$$ +which consists of ``all possible machine states''. +\end{remark} + +\subsection{How do the machine instructions work?} + +How does $r_i$ act, then? +$(\alpha, \beta, (p,q), L)$ needs to take $$t(\alpha + m u, \beta + m v) = y^{mv}x^{mu} (x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-m v}$$ +to $$t(pm+q+m^2u, v'm + \nu) = y^{v'm} x^{(um+p)m}(x^{q} y^{\nu} t y^{-\nu} x^{-q}) x^{-(um+p) m} y^{-v' m}$$ +where we are writing $v = v'm + \nu$. +More concretely, +$$y^{v'm^2+\nu m} x^{m u}(x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-v'm^2-\nu m}$$ +maps to $$y^{v' m} x^{um^2+pm}(x^{q} y^{\nu} t y^{-\nu} x^{-q}) x^{-um^2-p m} y^{-v' m}$$ + +Notice that this is exactly performed by the map $x^m \mapsto x^{m^2}, y^m \mapsto y, t(\alpha, \beta) \mapsto t(q+p m, 0)$. +How can we make that well-defined (since we clearly can't send $y^m$ to $y$ without messing up the map of $t(\alpha, \beta)$)? +The trick is to forget that we're working with $x,y,t$, and start working with $t(\alpha, \beta)$, $x^m$, $y^m$ as atomic blocks. + +Define a new subgroup $$K_{\alpha, \beta}^{M, N} := \langle \overline{t(\alpha, \beta)}, \overline{x}^M, \overline{y}^N \rangle \leq K$$ +Then the map $$\phi_i: t(\alpha,\beta) \mapsto t(q+pm, 0), x^m \mapsto x^{m^2}, y^m \mapsto y$$ +would do what we want. +What is the domain and codomain of that map? +If we view it as being $K_{\alpha, \beta}^{m,m} \to K_{q+pm, 0}^{m^2, 1}$, then it's actually an isomorphism: +it's just matching up the generators of the respective groups. + +OK, we have a map which we want to apply whenever we see $r_i w r_i^{-1}$. +The way we can do that is to create an HNN extension: +take $K *_{\phi_i}$ with stable letter $r_i$. + +We can do the whole lot again with $\psi_i$ corresponding to $s_i w s_i^{-1}$, which performs an $R$ instruction (where $\phi$ performed an $L$ instruction).
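+
+Spelled out on generators (this display only restates the isomorphism just described), the HNN extension for the $i$th $L$-instruction adds the relations
+\[
+r_i \, t(\alpha, \beta) \, r_i^{-1} = t(q + pm, 0), \qquad r_i \, x^m \, r_i^{-1} = x^{m^2}, \qquad r_i \, y^m \, r_i^{-1} = y,
+\]
+and the stable letter $s_j$ conjugates the corresponding subgroup by $\psi_j$ in the same way for the $j$th $R$-instruction.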
+ +\subsection{How do we turn this all into a group?} + +We've got all these groups floating around; to specialise to a group in which we deal only with machine states, +consider $$T'_{\mathcal{M}} := \langle \overline{t(\alpha, \beta)}: (\alpha, \beta) \in H_0(\mathcal{M}); \overline{r_i}: i \in I; \overline{s_j}: j \in J \rangle \leq K *_{\phi_i; \psi_j}$$ +where $H_0$ refers to the halting set of the modular machine $\mathcal{M}$. + +A machine state $t(\alpha, \beta)$ (a word containing no $r_i$ or $s_j$) can be carried to the word $t$ by conjugating with $r_i$'s and $s_j$'s if and only if the modular machine, run from $(\alpha, \beta)$, eventually empties both tapes: +the HNN extension has quotiented out by the relation ``conjugating by $r_i$ applies effect $\phi_i$, which moves the modular machine according to the $i$th $L$-instruction''. +The word $t$ precisely symbolises the empty tapes. +Of course, we could have lots of unused instructions or parts of instructions floating around: the group word $r_i t$ still has empty tape, so is still a halting state. + +\subsection{Subgroup membership to word problem} +We have constructed a subgroup $\langle t \rangle' := \langle t, r_i, s_j \rangle$ such that $(\alpha, \beta)$ is a halting state of the modular machine if and only if $t(\alpha, \beta)$ lies in that subgroup. +Using an HNN extension, we can make a group where we just need to check equality of words: +create the group $$G_{\mathcal{M}} := \langle K*_{\phi_i, \psi_j}; k \mid khk^{-1} = h \ \forall h \in \langle t \rangle' \rangle$$ +Then conjugating an element by $k$ does nothing precisely when that element was in $\langle t \rangle'$: that is, precisely when the modular machine halts from that starting state. + +\subsection{What have we missed out?} +There are many checks to be done along the way. +Our HNN extensions need to be well-defined. +The $\phi_i, \psi_j$ need to be iso. +$G_{\mathcal{M}}$ is in fact finitely presented, but we need to show that. + +However, once those checks are done, we have produced an explicit recipe for constructing a finitely presented group which can simulate a given arbitrary modular machine. + +\section{Higman's construction} + +It turns out that, somewhat shockingly, there is a beautiful way to use the above to embed a recursively-presented group $C = \langle X \mid R \rangle$ into a finitely-presented group. +Think of the earlier construction as telling us how to execute a Turing machine inside a group. +Then, morally, we do the following. + +\begin{enumerate} +\item Take the machine that halts precisely on members of $R$; +\item Embed it into a group $G$; +\item Glue $G$ onto $C = \langle X \mid R \rangle$ in such a way that we can use the machine-part, $G$, to decide which reductions to make (rather than querying the infinite set $R$). +\end{enumerate} + +Of course, the construction is rather complicated, but the upshot is that the finite presentation which defines $G$ can be used to capture all the information that lies in the infinite relator-set $R$. + +\subsection{General approach} + +We will make a (large!)
group whose elements include what I will call ``collections'', which have the following structure: + +\begin{enumerate} +\item machine state (for those following the course notes, this is $K_{\mathcal{M}}$ in Step 13 of Construction 11.2) +\item word under consideration (this is $\langle b_1, \dots, b_n \mid \cdot \rangle$) +\item group element that word corresponds to (this is $\overline{C}$) +\item marker (this is $d$) +\end{enumerate} + +The group $\langle X \mid R \rangle$ embeds in the third component (``group element that word corresponds to''). +The ``machine state'' section will be implemented as $K *_{\phi_i; \psi_j}$ from earlier (which, recall, was finitely presented). +The ``word under consideration'' will be how we extract information from the ``machine state''. +The ``marker'' serves no purpose for interpretation, but it turns out to be an important fiddly detail in the construction. + +\subsection{Construction} + +We are going to use a whole lot of HNN extensions to manipulate collections. + +\subsubsection{Which modular machine to use?} + +$$\machine{c_{i_n}, \dots, c_{i_2}, c_{i_1}} \ \machine{0,\dots,0,0}$$ +Take a modular machine such that, if we start the tape with the above state, we halt with an empty tape if and only if $c_{i_1} \dots c_{i_n}$ is in $R$. +(Recall that our recursive presentation is $C = \langle X \mid R \rangle$, and this is the group we want to embed.) +It is possible to do this (for those following the course notes, this is steps 1 through 5): symmetrise the generators if necessary +so that each generator $c \in X$ has an inverse $c^{-1}$ and a relator $c c^{-1} = e$. +Then make a Turing machine that halts precisely on the trivial words in this new presentation (which is, in spirit, the same as the old presentation; +it certainly defines the same group). +Convert that Turing machine into a modular machine, where each $c$ or $c^{-1}$ of the group corresponds to a cell-state $a_c$ or $a_{c^{-1}}$. + +Once we've got that modular machine, we can embed it into a group using what's happened already, although we're actually only going to need $$K_{\mathcal{M}} := K *_{\phi_i, \psi_j}$$ +which, recall, is the group which holds an MM state as the word $x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}$ which can also have the MM instructions directly applied to it, by conjugating with the stable letters $r_i, s_j$ respectively to apply the $i$th Left-instruction or $j$th Right-instruction. + +\subsubsection{Creating a collection} + +We're only interested in certain modular machine states: namely, those corresponding to $t(\alpha, 0)$ for certain $\alpha$. +(Recall that $t(\alpha, \beta)$ is our notation for how the group $K_{\mathcal{M}}$ stores the current state $(\alpha, \beta)$ of the MM.) +That is, we're only interested in how the modular machine behaves when we start it off with input $\alpha = \sum_{i=0}^r c_{k_i} m^i$, say: +equivalently, when we ask it the question ``Is $c_{k_0} \dots c_{k_r}$ in the relating set of $C$?''. + +So, while other MM-states appear encoded in our $K_{\mathcal{M}}$---for example, the state corresponding to ``starting with $5$ on the left-hand tape and $2$ on the right-hand, perform the fourth left-instruction in the list of possible instructions''---our remaining manipulations to the group will all refer to $t(\alpha, 0)$ directly. +That is, our remaining manipulations will ignore non-interesting MM states.
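+
+As a small sketch of the encoding (my addition; the cell-state assignment and the ordering convention are assumptions, since the text is deliberately loose about ordering):
+
+\begin{verbatim}
+# Sketch: encode a word as the left-tape integer alpha, with the
+# right-hand tape left at zero.  `code` assigns each generator and
+# each inverse its cell-state.
+def encode(word, code, m):
+    return sum(code[g] * m**i for i, g in enumerate(word))
+
+code = {"c": 1, "c-inv": 2}               # hypothetical cell-states
+alpha = encode(["c", "c-inv"], code, 10)  # 1 + 2*10 = 21
+\end{verbatim}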
+ +Given an MM state $t(\alpha, 0)$, we unpack it into a collection by conjugating with a new stable letter, $p$, and taking an HNN extension. +(In the course notes, this is steps 17 and 18.) +The extension will take $t(\alpha, 0)$ and unpack it into the word $$[t(\alpha, 0), w_{\alpha}(b)]$$ +where $w_{\alpha}(b)$ simply means ``take the word which is currently loaded into the MM's memory, as its left-hand tape where the right-hand tape is $0$, and write it down with $b$'s in the abstract''. + +\begin{example} +If $\alpha = c_3 + c_7 m + c_1 m^2$, we would have $w_{\alpha}(b) = b_3 b_7 b_1$, a word in the abstract group $\langle b_1, \dots, b_n \mid \cdot \rangle$. +I'm playing fast and loose with the ordering here; I may mean $b_1 b_7 b_3$, but I can't be bothered to check which is right. +\end{example} + +To recap, the unpacked word $[t(\alpha, 0), w_{\alpha}(b)]$ has two components so far, then: the modular machine state $t(\alpha, 0)$, and an abstract statement $w_{\alpha}(b)$ of the word we're asking about. +I'll add a third component: the ``marker'' letter $d$, so our unpacked word actually has three components and looks like $$[t(\alpha, 0), w_{\alpha}(b), d]$$ +We obtained the unpacked word by conjugating $t(\alpha, 0)$ by a new stable letter $p$, and taking an HNN extension which adds the relators $$p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$$ +for each $\alpha$. + +For reasons which will be important later, I added the $d$ at this point: an end-marker, sitting after the $w_{\alpha}(b)$. +I can't motivate its presence in this section, but eventually we will start appending things to $w_{\alpha}(b)$, and it will become vital to know where the word ends. +We use the marker $d$ for that. + +\begin{remark} +This is an awful lot of relators. +Infinitely many, in fact! +Don't panic; we will eventually show that we can actually replace most of them with a finite number of relators. +We haven't actually tied the behaviour of the machine to the manipulation of collections yet; only expanded a machine into a collection. +\end{remark} + +Later on, we will add one more component (our last one) to the collection. +It will be a bona fide word on $C$'s generators, and it will be the word which $\alpha$ represents. +We can do this by insisting that $b_i c_j = c_j b_i$ in general, and by doing another HNN extension, +which will have the effect of taking $b_j$ to $b_j c_j$. +That is, $b_3 b_2 \mapsto b_3 c_3 b_2 c_2 = b_3 b_2 c_3 c_2$; +then we view the $b$ chunk and the $c$ chunk as being distinct, forming the second and third components of our four-component collection respectively. + +Formally, in a little while we will add a stable letter $V$ and add the relators that $V b_j V^{-1} = b_j c_j$ +(and that $V a V^{-1} = a$ for most of the other possible $a$ -- for example, for $a = t$ or $a = r_i$). +In the course notes, $\psi_+$ adds this component to the collection. + +So our final collection will be $$[t(\alpha, 0), w_{\alpha}(b), w_{\alpha}(c), d]$$ + +However, we don't bother adding this yet. +In essence, the original three components of the collection are where the magic happens; +but because $w_{\alpha}(b)$ is a ``purely syntactic'' readout of the machine $t(\alpha, 0)$, +we won't actually have an embedded copy of $C$ in the group unless we add one. +So as the last step of our entire construction, we will put in this fourth component that ``evaluates the word $w_{\alpha}(b)$ as an element of $C$''. 
+ +\subsubsection{Tying the $w_{\alpha}(b)$-component to the machine component} + +The real magic happens at this point. +Everything we've done so far has added only finitely many relators, \emph{except} $$p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$$ +Recall that this was unpacking a machine (with a word loaded onto its tape) into a pair of (the machine, the word). + +How can we specify this with only finite amounts of information? +Well, the word is taken over a finite alphabet $b_1, \dots, b_n$, +so we'd be done if, starting from an empty-tape machine, we had a way of loading a word onto the tape of the ``machine'' component of the collection one letter at a time, and at the same time appending the same letter to the ``word'' component of the collection. +This will involve reaching down into the implementation of the machine, which (recall) is currently being held as one component of a collection; +the collection itself is just a word over a particular alphabet which we haven't yet specified (we can extract it at the end). + +Define a bunch of HNN extensions, one for each $b_i$, taking $t(\alpha, 0)$ to $t(\alpha m + i, 0)$ (writing $i$ for the cell-code of the corresponding generator) +and $w_{\alpha}(b)$ to $w_{\alpha}(b) \cdot b_i$. +Here is where we need $d$: we need to know where the end of $w_{\alpha}(b)$ is, so that we can append something to it. + +Formally, define stable letters $U_i$ such that: +\begin{itemize} +\item $U_i b_r U_i^{-1} = b_r$ (leaving the existing $w_{\alpha}(b)$-chunk alone) +\item $U_i d U_i^{-1} = b_i d$ (appending $b_i$ to the abstract word) +\item $U_i x U_i^{-1} = x^m$ (shifting the left-hand tape of the machine up by one, to make room for the new symbol) +\item $U_i t U_i^{-1} = x^i t x^{-i}$ (filling that new empty slot with the required state) +\end{itemize} + +The miracle of what we have just done is that we can implement all the infinitely-many $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$ in terms of these finitely-many new relators! +To obtain the effect of $p t(\alpha, 0) p^{-1}$ where $\alpha$ represents the word $c_3 c_6$, just conjugate the collection $ptp^{-1} = [t(0,0), \text{empty word}, d]$ in turn by $U_3$ and $U_6$. + +\subsubsection{Adding an embedded copy of $C$} + +So far, we have been dealing syntactically with symbols $b_i$ which represent the generators of $C$. +But we haven't actually got an embedded copy of $C$ yet, or at least we haven't obviously got one. +What we need is a way of evaluating $w_{\alpha}(b)$ to obtain $w_{\alpha}(c) \in C$. + +That's easy, though: do one HNN extension that will have the effect of taking $b_i$ to $b_i c_i$, and insist that the $b_i, c_j$ all commute. + +Formally, add a single stable letter $V$ such that: +\begin{itemize} +\item $V b_j V^{-1} = b_j c_j$ +\item $V d V^{-1} = d$ (so the marker is unchanged) +\item $V t V^{-1} = t$, $V r_i V^{-1} = r_i$, $V s_j V^{-1} = s_j$ (so the $K_{\mathcal{M}}$ component is unchanged) +\item $V p V^{-1} = p$ (so that $V$'s unpacking won't do anything until we have explicitly performed $p$'s unpacking) +\end{itemize} +and add (finitely many) relators $b_i c_j = c_j b_i$. +(This HNN extension is $\psi_+$ in the course notes.) + +\begin{remark} +One might wonder why we don't need $V$ to commute with the $U_j$ (which, recall, load letters onto the tape of the machine). +The answer is that we only want to do the unpacking right at the end of the procedure: +at the ``load letters onto tape'' stage, we don't do any unpacking.
+Therefore, in our final uber-group which encodes all of Higman's construction (whose elements are words which are collections), +there may be strange fragments of instructions floating around, like in $V t(\alpha, 0) w_{\alpha}(b) d$ where $V$ is ``half of an instruction''. +We just let these hang around, and ignore them: Britton's lemma on HNN extensions tells us that none of these are equal to words in which +the instructions have all been fully executed (that is, in which no $V$'s appear). +While all this junk is still sitting around in the group, it doesn't interfere with the interesting part of the group. + +The situation is analogous to the embedding of a modular machine into a group, where we ignore any chunks of half-instruction like $r_i x^a y^b t y^{-b} x^{-a}$ (where $r_i$ is an un-completed instruction) because they don't interfere with what we're really doing. +\end{remark} + +Consider that a word $w_{\alpha}$ evaluates in $C$ to something trivial if and only if the modular machine halts in the zero state from $t(\alpha, 0)$. +(That's how we defined the modular machine.) +The modular machine halts in the zero state from $t(\alpha, 0)$ if and only if conjugating by $r_i, s_j$ in some order causes $t(\alpha,0)$ to be taken to the single letter $t$. +That is, if and only if the machine, starting from state $t(\alpha, 0)$, eventually reaches the state that is the single letter $t$. + +But because $V$ commutes with the $r_i$ and $s_j$ (that is, the ``instructions'' in $K_{\mathcal{M}}$ telling it to execute the appropriate machine instructions), we have $$r_i s_j V t(\alpha, 0) V^{-1} s_j^{-1} r_i^{-1} = V r_i s_j t(\alpha, 0) s_j^{-1} r_i^{-1} V^{-1} = V t(0, 0) V^{-1} = t$$ if the machine halts on the zero state through applying instructions $r_i, s_j$. + +That is, $$V t(\alpha, 0) V^{-1} = s_j^{-1} r_i^{-1} t r_i s_j$$ which, since the machine is deterministic (and we're undoing the instructions that took us from $t(\alpha, 0)$ to $t(0,0)$), is $t(\alpha, 0)$. + +Otherwise, if the machine doesn't halt through $r_i, s_j$, there are some interfering $x$'s left at the end (we end up with $V t(a, b) V^{-1} = V x^{a} y^{b} t y^{-b} x^{-a} V^{-1}$ for some $(a,b) \neq (0,0)$) so $V$ can't cancel. + +The upshot is that conjugating the three-element collection $[K_{\mathcal{M}}, w_{\alpha}(b), d]$ by $V$ unpacks into a collection $$[K_{\mathcal{M}}, w_{\alpha}(b), w_{\alpha}(c), d]$$ without any $V$'s if and only if $w_{\alpha}(c)$ is trivial as a member of $C$. + +\begin{example} +Suppose $$C = \mathbb{Z}_2 = \langle c_1 \mid c_1^2 = e \rangle$$ +Expand the presentation to $$\langle c_1, c_2 \mid c_1 c_2 = e, c_1^2 = e, c_1^2 c_2^2 = e, \dots \rangle$$ +where the ellipsis indicates every trivial word. + +Let us examine $c_1^2$, which is trivial; it is coded into the machine-component of the group as $t(1+m, 0)$. + +There is a sequence of $r_i, s_j$ such that conjugating $t(1+m, 0)$ by $r_i s_j$ results in $t(0,0)=t$. +To make this more readable, let's say $r_i s_j$ are precisely the two instructions we need to use to do this. + +Now, $V t(1+m, 0) V^{-1}$ is equal to $$V s_j^{-1} r_i^{-1} t(0,0) r_i s_j V^{-1}$$ +But $V$ commutes with all those terms, so it is just $t(1+m, 0)$. + +Hence $V t(1+m, 0) b_1^2 d V^{-1}$ is $$t(1+m, 0) b_1^2 c_1^2 d$$ +because of the effect of $V$ on the $b_i$ terms.
+ +On the other hand, $V t(1+m, 0) b_1^2 d V^{-1}$ is $$V p t(1+m, 0) p^{-1} V^{-1}$$ +and $V$ commutes with $p$, so it comes to $$p t(1+m, 0) p^{-1} = t(1+m, 0) b_1^2 d$$ + +Therefore $$t(1+m, 0) b_1^2 c_1^2 d = t(1+m, 0) b_1^2 d$$ +so $c_1^2$ is trivial. +\end{example} + +\subsection{List of relators} +As a recap, here is a list of all the relators (HNN or otherwise) of the uber-group we have made. + +\begin{itemize} +\item $r_i t(\alpha, \beta) r_i^{-1} = t(\text{new machine state})$ for each $L$-machine-instruction +\item $s_j t(\alpha, \beta) s_j^{-1} = t(\text{new machine state})$ for each $R$-machine instruction +\item $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$, the instructions for unpacking a machine into a machine with its word (all of these follow from the single special case below together with the $U_i$ relators) +\item $p t p^{-1} = t d$ (a single special case of the above) +\item $U_i b_r U_i^{-1} = b_r$ (when loading $i$ onto the tape, leaves the $w_{\alpha}(b)$-chunk alone) +\item $U_i d U_i^{-1} = b_i d$ (when loading $i$ onto the tape, appends $b_i$ to the abstract word) +\item $U_i x U_i^{-1} = x^m$ (when loading $i$ onto the tape, shift the left-hand tape of the machine up by one, to make room for the new symbol) +\item $U_i t U_i^{-1} = x^i t x^{-i}$ (filling that new empty slot with the required state) +\item $V b_j V^{-1} = b_j c_j$ (unpacking a syntactic word into its represented element) +\item $V d V^{-1} = d$ +\item $V t V^{-1} = t$, $V r_i V^{-1} = r_i$, $V s_j V^{-1} = s_j$ (so the $K_{\mathcal{M}}$ component is unchanged) +\item $V p V^{-1} = p$ +\item $b_i c_j = c_j b_i$ +\end{itemize} + +There are finitely many of all of these, except the family $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$, which, as noted, follows from the finitely many others. +\end{document} diff --git a/FriedbergMuchnik.tex b/FriedbergMuchnik.tex new file mode 100644 index 0000000..4c393fd --- /dev/null +++ b/FriedbergMuchnik.tex @@ -0,0 +1,171 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{epstopdf} +\usepackage{mdframed} +\usepackage{hyperref} +\usepackage{lmodern} + +% Reproducible builds +\pdfinfoomitdate=1 +\pdftrailerid{} +\pdfsuppressptexinfo=-1 + +\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png} + +\newmdtheoremenv{defn}{Definition} +\newmdtheoremenv{thm}{Theorem} +\newmdtheoremenv{motiv}{Motivation} + +\title{Friedberg-Muchnik theorem} +\author{Patrick Stevens, with tip of the hat to Dr Thomas Forster} +\date{5th February 2016} + +\begin{document} + +\maketitle + +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/FriedbergMuchnik/FriedbergMuchnik.pdf} \end{center} + +\normalsize + +\section{Introduction} + +We consider Turing machines which query an oracle $O$: that is, they come equipped with an extra instruction ``call the oracle with this input'', where a call to the oracle is simply a test for membership in $O$. + +We may supply different oracles to the same Turing machine, and potentially get different results: for example, the Turing machine which has as its only instruction ``output the result of calling the oracle on my input'' precisely mimics the oracle. + +Recall that a set $A$ is \emph{semi-decidable} if there is a Turing machine $T$ such that for all $n \in \mathbb{N}$, $T(n)$ halts iff $n \in A$.
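+
+As a toy model (mine, not part of the proof), one can picture an oracle machine as a function that receives its input together with a membership test:
+
+\begin{verbatim}
+# Toy model: an oracle is a membership test which a machine may call.
+def mimic(n, oracle):
+    # "output the result of calling the oracle on my input"
+    return oracle(n)
+
+B = {2, 3, 5, 7}
+print(mimic(5, lambda n: n in B))   # True: this machine mimics B exactly
+\end{verbatim}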
+ +\ + +\begin{thm}[Friedberg-Muchnik theorem] There are semidecidable sets $A$ and $B$ such that for all Turing machines $T$ which may query an oracle, the following fail to be equivalent: + +\begin{enumerate} +\item $T$-querying-$B$ doesn't halt with output $0$ +\item $T$-querying-$B$ has input in $A$ +\end{enumerate} + +and the following fail to be equivalent: +\begin{enumerate} +\item $T$-querying-$A$ doesn't halt with output $0$ +\item $T$-querying-$A$ has input in $B$ +\end{enumerate} +\end{thm} + +\ + +That is, we can find two semidecidable sets $A$ and $B$ such that neither allows a Turing machine to decide the other, where by ``$T$ decides $A$'' we mean ``$T$ halts and outputs $0$ iff its input is not in $A$, and it halts and outputs $1$ iff its input is in $A$''. +(Equivalently, $T$ is a machine which computes the characteristic function of $A$.) + +\section{Proof} + +We can enumerate all the Turing machines which call an oracle; write $[ n ]^X$ for the $n$th Turing machine in the enumeration, calling oracle $X$. + +What would it mean for the Friedberg-Muchnik theorem to hold? +We would be able to find $e_n$ (resp. $f_n$) that witness in turn that the $n$th Turing machine doesn't manage to decide $A$ in the presence of $B$ (resp. $B$ in the presence of $A$). + +That is, it would be enough to show that: + +\ + +\begin{thm} There are semidecidable sets $A, B$ such that for all $n \in \mathbb{N}$: +\begin{itemize} +\item there is $e_n \in \mathbb{N}$ such that \begin{itemize} \item $e_n \in A$ but $[n]^B(e_n)$ halts with output $0$, or \item $e_n \not \in A$ but $[n]^B(e_n)$ fails to halt, or halts at something other than $0$ \end{itemize} +\item there is $f_n \in \mathbb{N}$ such that \begin{itemize} \item $f_n \in B$ but $[n]^A(f_n)$ halts with output $0$, or \item $f_n \not \in B$ but $[n]^A(f_n)$ fails to halt, or halts at something other than $0$ \end{itemize} +\end{itemize} +\end{thm} + +\ + +The way we are going to do this is as follows. +We'll construct our $A$ and $B$ iteratively, starting from the empty set and only ever adding things in to our current attempts. + +For each $n \in \mathbb{N}$, we can make an infinite list of ``possible'' witnesses: numbers which might eventually be our choice for $e_n$. +We don't care what these guesses are at the moment, but we just insist that they be disjoint and sorted into increasing order. +Write $$G_i = \{g^{(i)}_1, g^{(i)}_2, \dots \}$$ for the set of possibilities for $e_i$, and $$H_i = \{ h^{(i)}_1, h^{(i)}_2, \dots \}$$ for the set of possibilities for $f_i$. +(I emphasise again that we don't assume any properties of these numbers, other than that $g^{(i)}_m$ and $h^{(i)}_m$ are increasing with $m$, and that no $g^{(i)}_m, g^{(j)}_n, h^{(k)}_p, h^{(l)}_q$ are equal.) + +Now, at time-step $0$, we have no information about what's going to be in $A$ and $B$, so let $$A_0 = B_0 = \emptyset$$ + +Every $G_i$ and $H_i$ is looking for a witness among its members. +We assign a priority order to them: $$G_1 > H_1 > G_2 > H_2 > \dots$$ +The idea is that the high-priority sets quickly decide on their witness, and the lower-priority sets get to choose their witness subject to not being allowed to mess up any higher-priority set's decision. + +At time-step $t$, we've already built $A_{t-1}, B_{t-1}$ as our best guesses at $A$ and $B$. +We seek an $e_i$ for any $G_i$ which hasn't got one (for $1 \leq i \leq t$), and then we can work on finding $f_i$ for the $H_i$ next. 
+Run the machines $[i]^{B_{t-1}}$ for $i=1, 2, \dots, t$, for $t$ steps each, each on input $g^{(i)}_1$. +This will approximate $[i]^B$, because $B_{t-1} \subseteq B = \cup_{t=1}^{\infty} B_t$, but it is by no means exactly what we need yet. + +\begin{itemize} +\item If our machine $[i]^{B_{t-1}}$ ever attempts to query its oracle on a value greater than $\text{max}(B_{t-1})$, we declare that the machine crashes for this round, and sits out. +Indeed, $B_{t-1}$ is incomplete at this point as a reflection of $B$ (we only have information about it up to $\text{max}(B_{t-1})$), so it would be useless to try and infer information about $B$ from parts of $B_{t-1}$ which are even bigger than that maximum. + +\item If $[i]^{B_{t-1}}$ halts at something other than $0$, then it's no use to us: it can't possibly be a witness we can put into $A$, because such witnesses $e_i$ must satisfy ``$[i]^B(e_i)$ halts at $0$''. +So $G_i$ will sit out of this round. + +\item If $[i]^{B_{t-1}}$ fails to halt in the allotted $t$ steps, it is likewise no use as a witness yet, because we can't (yet) even prove that the machine \emph{halts} on this input, let alone that it halts on $0$. +So $G_i$ will again sit out of this round. + +\item But if $[i]^{B_{t-1}}$ halts and outputs $0$ in the allotted $t$ steps, we're in business: for some collection of $i$, we have found some things ($g^{(i)}_1$) that might serve as witnesses. +Throw each of these into $A_{t-1}$ to make $A_t$. +\end{itemize} + +OK. Now $G_i$ is happy, but remember that additions to our sets can have side-effects, because a computation performed with reference to an earlier stage's oracle-guess need not survive later additions. +Adding $g^{(i)}_1$ to our $A$-guess may alter a computation $[j]^{A}$ that some already-settled $H_j$ performed to decide on its witness; and symmetrically, the elements the $H$-round throws into our $B$-guess may alter the computation $[i]^{B_{t-1}}$ that $G_i$ has just performed. +(This is because $[i]^{B_{t-1}}(e_i)$ is not in general equal to $[i]^{B_t}(e_i)$.) + +How can we ensure that a settled witness isn't broken? +Well, $[i]$ is a finite machine which we have run for a finite time, so it can only have asked the oracle for values up to some finite number $\beta_i$ before it halted. +So if we make sure we only ever add elements to our set-guesses when they are above $\beta_i$, then we haven't actually changed the oracle from the point of view of $[i]$'s settled computation: it performs exactly the same computation afterwards, because the oracles are the same on all points $[i]$ might query. + +Therefore, after $G_i$ settles on a witness, we need to delete all numbers below $\beta_i$ from all lower-priority $G_j$ and $H_j$. +(This is easy to do because of our stipulation that the lists be given with elements in ascending order.) +That way, no lower-priority requirement will ever even consider an element that breaks a higher-priority one. + +Once we've found the $G$-witnesses at time-step $t$, we can find the $H$-witnesses in exactly the same way (their witnesses go into $B$); and finally, we move on to time-step $t+1$. + +\subsection{Problem} This procedure works pretty well, but there's a problem with it. You might like to meditate on this for a few minutes, because it's revealed in the next paragraph. + +The problem is simply that while no lower-priority entry can break a higher-priority one, the reverse might happen!
+It might be that $G_1$'s, $G_2$'s and $G_3$'s machines take ten steps of execution before halting, while $G_4$'s halts after just one step, so $G_4$ decides on its witness immediately (that is, at time $t=4$, as opposed to $G_1$'s $t=10$). +Subsequently, the requirements with higher priority than $G_4$ will decide on their witnesses, and the elements they throw into our sets (for $G_4$, the dangerous ones are the elements $H_1$, $H_2$ and $H_3$ put into $B$, since $G_4$'s computation queried $B$) might break $G_4$'s choice. +Remember that $G_4$ only eliminated breaking-values from \emph{lower}-priority lists, not from the higher-priority ones. +(Allowing it to eliminate breaking-values from higher-priority sets could cause the entire protocol to enter an infinite loop, with two requirements each invalidating the results of the other on successive time-steps.) + +However, this isn't actually too much of a problem. +Since $G_1$ can only ever decide on its witness once (being the highest-priority), that means $H_1$ will only ever need to decide twice; $G_2$ only ever needs to decide at most four times (it could be first to pick its witness, then $H_1$ overrules it, then it picks again, then $G_1$ overrules it and $H_1$, then it picks again, then $H_1$ overrules it, and finally it picks again). +In general, the $i$th element of the priority order can only be overruled $2^{i-1} - 1$ times. + +So if $G_i$ is overruled, we can just keep churning through the procedure, chucking more and more elements into $A$ and $B$; $G_i$ can only be overruled finitely many times, and it has infinitely many elements in its list to play with, so eventually it will work its way into a position where it can never be overruled. + +\subsection{Final problem} +OK, this works fine if every $G_i$ eventually finds a witness. +But there's another case: $G_1$ may never find a witness. +For example, $[1]^{B_t}(g^{(1)}_1)$ may never halt, or it may halt but output the value $1$ instead of $0$ (so the protocol sees it as ``uninteresting'' and just repeatedly tells $G_1$ to sit out of the round). + +But remember that we're trying to construct a witness that a certain equivalence fails, and so far we've been constructing witnesses that it fails in one of its directions. +(Remember: the equivalence we want to fail is that $T^B$ halts with $0$ iff it has input not in $A$.) +We could still win by finding $e_1$ such that $[1]^B(e_1)$ fails to halt at $0$ despite $e_1$ not being in $A$. +And look! We've precisely got one of those elements, and it's $g^{(1)}_1$. + +\section{Summary} + +The output of this procedure is a pair of sets $$\displaystyle A = \cup_{t=1}^{\infty} A_t, B = \cup_{t=1}^{\infty} B_t$$ +which are semi-decidable (because we built them as the union of an effectively-generated increasing sequence of finite sets $A_t, B_t$). +For each Turing machine $[i]$, we have a witness $e_i$ such that: +\begin{itemize} +\item either $[i]^B(e_i)$ halts with output $0$ and $e_i \in A$, or +\item $[i]^B(e_i)$ fails to halt, or halts with output not equal to $0$, and $e_i \not \in A$. +\end{itemize} +(This can be proved by induction: if $e_i$ is a witness at time $t$ and it is never overruled, then it remains a witness when we pass to $B$, because by construction its computation doesn't change on passing to $B$.) + +That is, no Turing machine $[i]^B$ decides $A$. + +Likewise, no Turing machine $[i]^A$ decides $B$.
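+
+For concreteness, here is a schematic rendering of one $G$-round of the bookkeeping (entirely my own sketch, not the real construction: genuine oracle machines are replaced by toy callables, and the symmetric $H$-round is omitted). The oracle wrapper records the largest value queried, which plays the role of $\beta_i$:
+
+\begin{verbatim}
+# Schematic sketch of one G-round at time-step t.  machines[i](n, oracle,
+# steps) returns 0 or 1 if the toy machine "halts" within `steps` on
+# input n, and None otherwise.
+class Oracle:
+    def __init__(self, members):
+        self.members, self.max_query = set(members), 0
+    def __call__(self, n):
+        self.max_query = max(self.max_query, n)
+        return n in self.members
+
+def g_round(t, machines, A, B, candidates, settled, restraint):
+    for i in range(min(t, len(machines))):
+        if i in settled or not candidates[i]:
+            continue                 # already has a witness, or starved
+        g, oracle = candidates[i][0], Oracle(B)
+        if machines[i](g, oracle, steps=t) == 0:
+            A.add(g)                 # g becomes G_i's witness e_i
+            settled.add(i)
+            restraint[i] = oracle.max_query
+            # protect G_i: lower priorities drop candidates below beta_i
+            for j in range(i + 1, len(candidates)):
+                candidates[j] = [n for n in candidates[j]
+                                 if n > restraint[i]]
+
+machines = [lambda n, oracle, steps: 0]   # toy machine: always halts with 0
+A, B = set(), set()
+g_round(1, machines, A, B, [[10, 11]], set(), {})
+print(A)   # {10}
+\end{verbatim}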
+
+\end{document} \ No newline at end of file diff --git a/MonadicityTheorems.tex b/MonadicityTheorems.tex new file mode 100644 index 0000000..51c8d01 --- /dev/null +++ b/MonadicityTheorems.tex @@ -0,0 +1,624 @@ +\documentclass[11pt]{amsart}
+\usepackage{geometry}
+\geometry{a4paper}
+\usepackage{graphicx}
+\usepackage{amssymb}
+\usepackage{epstopdf}
+\usepackage{mdframed}
+\usepackage{hyperref}
+\usepackage{tikz-cd}
+\usepackage{lmodern}
+
+% Reproducible builds
+\pdfinfoomitdate=1
+\pdftrailerid{}
+\pdfsuppressptexinfo=-1
+
+\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
+
+\newmdtheoremenv{defn}{Definition}
+\newmdtheoremenv{thm}{Theorem}
+\newmdtheoremenv{motiv}{Motivation}
+
+\title{Motivation for the Monadicity Theorems}
+\author{Patrick Stevens}
+\date{12th December 2015}
+
+\begin{document}
+
+\maketitle
+
+\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/MonadicityTheorems/MonadicityTheorems.pdf} \end{center}
+
+\normalsize
+
+\emph{You should draw diagrams yourself throughout this document. It will be unreadable as mere symbols.}
+
+\section{Definitions}
+
+\begin{defn}
+Given a monad $\mathbb{T}$ on $\mathcal{C}$, the Eilenberg-Moore category is $\mathcal{C}^{\mathbb{T}}$, the category of all algebras $(A, \alpha)$ with $A \in \mathcal{C}$ and $\alpha: TA \to A$, whose arrows are ``homomorphisms'': $f \in \text{mor}(\mathcal{C})$ is a homomorphism $(A, \alpha) \to (B, \beta)$ when $f \alpha: TA \to B$ equals $\beta (Tf) : TA \to B$.
+\end{defn}
+
+\
+
+\begin{defn} Given $\mathbb{T}$ a monad induced by $(F: \mathcal{C} \to \mathcal{D}) \dashv (G: \mathcal{D} \to \mathcal{C})$, where the unit and counit of the adjunction are $\eta, \epsilon$ respectively, the comparison functor $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ is given by $K(B) = (GB, G \epsilon_B)$ and $K(f: A \to B) = Gf$.
+\end{defn}
+
+\
+
+\begin{defn}An adjunction $(F: \mathcal{C} \to \mathcal{D}) \dashv (G: \mathcal{D} \to \mathcal{C})$ is monadic if the comparison functor $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ is part of an equivalence, where $\mathbb{T}$ is the monad induced by the adjunction.
+\end{defn}
+
+\
+
+That is, a monadic adjunction is one of the form ``$\mathcal{C}$ is being taken to a category of algebras'', like the free-group/forgetful functor adjunction.
+
+\
+
+\begin{defn}
+A functor $G: \mathcal{D} \to \mathcal{C}$ is monadic if it has a left adjoint and the adjunction is monadic.
+\end{defn}
+
+\
+
+\section{Algebras are coequalisers} \label{coeq}
+
+This is a lemma we're going to need a lot, so without motivating why, I'll just present it.
+I'll use the perhaps nonstandard terminology that $\alpha$ \emph{coequalises} two maps if the composites are equal, and it is a \emph{coequaliser} if it is universal among all coequalising maps.
+
+\
+
+\begin{thm}
+Let $(A, \alpha)$ be an algebra for monad $T: \mathcal{C} \to \mathcal{C}$.
+Then $\alpha: TA \to A$ is a coequaliser.
+\end{thm}
+
+\
+
+The way to think about this is from the free-group/forgetful-functor monad on $\mathbf{Sets}$.
+In this context, it means: ``every group is a quotient of a free group''.
+
+In the free-group context, $(A, \alpha)$ is a set $A$ together with a map $\alpha: TA \to A$ which ``tells us how to multiply''.
+That is, $\alpha$ precisely specifies the multiplication table for the group.
+
+Consider $TTA$, which is the set of words of words of elements of $A$.
+On the one hand, to get from a word $((a_1, a_2, \dots), (b_1, b_2, \dots), \dots)$ to an element of $A$, we could multiply out each word and then multiply again;
+on the other, we could flatten the word-of-words and then multiply.
+
+(Remember, $\alpha: TA \to A$ is the multiplication, and $\mu_A: TTA \to TA$ is the flattening operation.
+Note that $\alpha: (TA, \mu_A) \to (A, \alpha)$ is an algebra homomorphism: the homomorphism condition for it is $\alpha \mu_A = \alpha (T \alpha)$, which is just one of the axioms of the algebra $(A, \alpha)$.)
+
+\[
+\begin{tikzcd}
+(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"']
+& (TA, \mu_A) \arrow[r, "\alpha"]
+& (A, \alpha)
+\end{tikzcd}
+\]
+
+I claim that doing these two things gives the same result: that is, $\alpha$ coequalises the maps $T \alpha$ and $\mu_A : (TTA, \mu_{TA}) \to (TA, \mu_A)$.
+Indeed, the fact that $\alpha (T \alpha) = \alpha \mu_A : TTA \to A$ is just one of the axioms of the algebra $(A, \alpha)$.
+From the group-theoretic point of view, it's true that we can re-group terms and then multiply, and still get the same answer as just multiplying.
+
+I now claim that it's actually a coequaliser: any homomorphism $f: (TA, \mu_A) \to (B, \beta)$ which has $f (T \alpha) = f \mu_A$ must factor through $\alpha: TA \to A$.
+From the group-theoretic point of view, if we do some operation on the words which doesn't care about the grouping of the terms, it must be defined by its action on the generators (elements of the set $A$).
+
+\[
+\begin{tikzcd}
+(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"']
+& (TA, \mu_A)
+  \arrow[r, "\alpha"]
+  \arrow[dr, "f"]
+& (A, \alpha)
+  \arrow[d, dashrightarrow, "g"]
+\\
+&
+&
+(B, \beta)
+\end{tikzcd}
+\]
+
+Indeed, if we apply the forgetful functor, we obtain a split coequaliser (and hence a fortiori a coequaliser), because:
+
+\begin{itemize}
+\item it coequalises
+\item $\eta_A : A \to TA$ inverts $\alpha: TA \to A$ (``inclusion into the free group'' vs ``multiplying out one-element words'')
+\item $\eta_{TA}: TA \to TTA$ inverts $\mu_A: TTA \to TA$ (a monad unit law)
+\item $(T \alpha) \eta_{TA} = \eta_A \alpha$, by naturality of $\eta$
+\end{itemize}
+
+So there can only be one possible factorisation of $f$ through $\alpha$ (because forgetting the algebra structure would keep the factorisation the same, and we've just shown that forgetting the algebra structure we already have a coequaliser).
+
+There actually is a factorisation of $f$ through $\alpha$: say $g$ is the factorisation $A \to B$ once we've forgotten the algebra structure.
+Then we just need to show that $g$ is an algebra homomorphism $(A, \alpha) \to (B, \beta)$.
+(That is, we need to show that after we've forgotten the algebra structure, we can add in the algebra structure again without breaking $g$.)
+
+But it is: $g \alpha = \beta (Tg)$ because:
+
+\begin{itemize}
+\item We already know $g \alpha = f: TA \to B$ (by ``$g$ is a factorisation of $f$ through $\alpha$'')
+\item $f \mu_A = \beta (T f)$, because $f: (TA, \mu_A) \to (B, \beta)$ is an algebra hom
+\end{itemize}
+
+From the second and first together, we have $g \alpha \mu_A = \beta (T g) (T \alpha)$;
+since $\alpha$ is an algebra hom, we have $\alpha \mu_A = \alpha (T \alpha)$.
+
+Therefore $g \alpha (T\alpha) = \beta (Tg)(T \alpha)$.
+
+So it's enough for $T(\alpha)$ to be epic.
+Fortunately, it's even split epic (apply $T$ to the unit axiom $\alpha \eta_A = 1_A$ to see that $T \eta_A$ is a section of it), so it is indeed epic, so we do have $g \alpha = \beta (Tg)$, so $g$ is an algebra hom.
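+
+Assembling the bullet points into a single picture, the split coequaliser in $\mathcal{C}$ is the following (the backwards arrows are the sections $\eta_{TA}$ and $\eta_A$):
+
+\[
+\begin{tikzcd}
+TTA
+  \arrow[r, shift left=0.7ex, "T\alpha"]
+  \arrow[r, shift right=1.4ex, "\mu_A"]
+& TA
+  \arrow[l, shift left=2.1ex, "\eta_{TA}"]
+  \arrow[r, shift right, "\alpha"']
+& A
+  \arrow[l, shift right, "\eta_A"']
+\end{tikzcd}
+\]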
+
+\subsection{Summary}
+
+That is, every algebra $(A, \alpha)$ is a coequaliser of two free algebras, $(TTA, \mu_{TA}) \to (TA, \mu_A)$ by $T \alpha$ and $\mu_A$.
+
+\section{Overview}
+
+Obviously we can't get rid of the need for $G$ to have a left adjoint, because ``monadic'' is solely a property of an adjunction.
+So we want to find an equivalent way to characterise the fact that the adjunction is monadic.
+
+The ``equivalence of categories'' condition tells us that we somehow need to characterise when $K$ has a left adjoint $L$.
+(Recall that every equivalence can be refined to an adjoint equivalence.)
+
+Once we've done that, we'll need natural isomorphisms $1_{\mathcal{C}^{\mathbb{T}}} \cong KL$ and $LK \cong 1_{\mathcal{D}}$.
+The way we'll do that is to show that $B$ and $LKB$ are both coequalisers of the same thing (for $B \in \mathcal{D}$), and $(A, \alpha)$ and $KL(A, \alpha)$ are also both coequalisers of the same thing.
+Since two objects which satisfy the UMP of the coequaliser must be isomorphic by a \emph{unique} isomorphism, this will prove the existence (``isomorphic'') and naturality (``unique'') of the isomorphism.
+(Recall that a natural transformation is iso if and only if it is pointwise iso.)
+
+\section{Adjoint for \texorpdfstring{$K$}{K}}
+
+We're looking for $L: \mathcal{C}^{\mathbb{T}} \to \mathcal{D}$, such that $LKB = L(GB, G \epsilon_B)$ is isomorphic to $B$ for each $B \in \mathcal{D}$.
+
+\subsection{Definition of \texorpdfstring{$L$}{L}}
+
+Can we show, then, that $L(GB, G \epsilon_B) \cong B$?
+
+This is a bit of a rabbit out of a hat, but remember that we've shown every $B$ is a coequaliser (as is $(B, \beta)$ for any hom $\beta$).
+Therefore, we could try and show that the following construction generalises.
+(Note from the far future: this diagram is wrong. $B \in \mathcal{D}$, but $GFB \in \mathcal{C}$.)
+
+\[
+\begin{tikzcd}
+GFGFB \arrow[r, shift left, "GF(G \epsilon_B)"] \arrow[r, shift right, "\mu_B"']
+& GFB \arrow[r, "G \epsilon_B"]
+& B \cong^{?} L(GB, G \epsilon_B)
+\end{tikzcd}
+\]
+
+(For the moment, there's not really any reason to suspect that this is an adjoint to $K$. I'm sorry. It is, though.)
+
+Well, there's a way to generalise it so that the objects look right, anyway: just throw away the initial $G$ from anywhere we can.
+Now the arrows are going to be nonsense, of course, and now the $\cong$ doesn't make any sense.
+(Replace $\epsilon_B$ by $\beta$, because now it's a general arrow.)
+
+\[
+\begin{tikzcd}
+FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "?"']
+& FB \arrow[r, "?"]
+& L(B, \beta)
+\end{tikzcd}
+\]
+
+What has to happen to the arrows?
+We had $G \epsilon_B = \beta$ before, but remember that in the monad $(\mathbb{T}, \eta, \mu)$ induced by the adjunction $F \dashv G$, we defined $\mu_B = G \epsilon_{FB}$.
+So actually we can get rid of the $G$ here!
+
+\[
+\begin{tikzcd}
+FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "\epsilon_{F B}"']
+& FB \arrow[r, "?"]
+& L(B, \beta)
+\end{tikzcd}
+\]
+
+This is a general definition of $L(B, \beta)$: as the coequaliser of $F\beta$ and $\epsilon_{FB}$ from $FGFB \to FB$.
+
+Is it well-defined? That is, does the coequaliser exist?
+
+Notice that $F\beta$ and $\epsilon_{FB}$ have the common section $F \eta_B$ (that is, $F \eta_B$ is a right inverse of both), so we have a reflexive pair; so it's enough for $\mathcal{D}$ to have coequalisers of reflexive pairs.
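+
+Explicitly, the common-section property is just the unit axiom for the algebra $(B, \beta)$ together with a triangle identity of the adjunction $F \dashv G$:
+\[
+(F\beta)(F\eta_B) = F(\beta \eta_B) = F(1_B) = 1_{FB}, \qquad \epsilon_{FB} \circ F\eta_B = 1_{FB}
+\]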
+
+Alternatively, if we apply $G$ to the diagram, we get:
+
+\[
+\begin{tikzcd}
+TTB \arrow[r, shift left, "T \beta"] \arrow[r, shift right, "G \epsilon_{F B}"']
+& TB
+\end{tikzcd}
+\]
+
+But we've already seen that this is part of a split coequaliser diagram (the bullet-points in Section \ref{coeq}), so it does have a coequaliser in $\mathcal{C}$;
+therefore it's also, separately, enough for $G$ to create coequalisers of $G$-split pairs.
+
+\subsubsection{Summary} \label{summaryL}
+
+We have seen that if we have either of
+
+\begin{enumerate}
+\item $\mathcal{D}$ has coequalisers of reflexive pairs
+\item $G$ creates coequalisers of $G$-split pairs
+\end{enumerate}
+
+then our putative left-adjoint $L$ is well-defined as a function.
+
+It is defined on objects as a coequaliser:
+
+\[
+\begin{tikzcd}
+FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "\epsilon_{F B}"']
+& FB \arrow[r, "l"]
+& L(B, \beta)
+\end{tikzcd}
+\]
+
+How do we define it on arrows?
+An entirely mechanical check tells us how:
+
+\[
+\begin{tikzcd}
+FGFB
+  \arrow[r, shift left, "F \beta"]
+  \arrow[r, shift right, "\epsilon_{F B}"']
+  \arrow[d, "FGFf"]
+& FB
+  \arrow[r, "l_B"]
+  \arrow[d, "Ff"]
+& L(B, \beta)
+  \arrow[d, dashrightarrow]
+\\
+FGFC
+  \arrow[r, shift left, "F \gamma"]
+  \arrow[r, shift right, "\epsilon_{F C}"']
+& FC
+  \arrow[r, "l_C"]
+& L(C, \gamma)
+\end{tikzcd}
+\]
+
+where the upper square and the lower square both commute (functoriality of $F$ and naturality of $\epsilon$, respectively).
+
+We can then chase arrows to show that $l_C \circ Ff$ coequalises $F\beta$ and $\epsilon_{FB}$, and so it must factor uniquely through $l_B$.
+
+\subsection{Functoriality of \texorpdfstring{$L$}{L}}
+
+We've shown that $L$ is defined, but is it a functor?
+It needs to:
+
+\begin{enumerate}
+\item preserve identities
+\item preserve composition
+\end{enumerate}
+
+Looking at the diagram at the end of the summary above (section \ref{summaryL}), it's clear that if $f = 1_B$ then the induced $L(f)$ must be $1_{L(B, \beta)}$.
+
+For composition, if we just shove another copy of that diagram below itself, and join up all the arrows, it's another mechanical diagram-chase to show that the composition is preserved (using the uniqueness of the dashed arrows).
+
+\subsection{Adjointness of \texorpdfstring{$K$}{K} and \texorpdfstring{$L$}{L}}
+
+Finally, we get to the key bit!
+
+We'll use my favourite definition of an adjoint: as an initial object of $((A, \alpha) \downarrow K)$ for each $(A, \alpha) \in \mathcal{C}^{\mathbb{T}}$.
+
+The diagram we need to verify is:
+
+\[
+\begin{tikzcd}
+FGFA
+  \arrow[r, shift left, "F \alpha"]
+  \arrow[r, shift right, "\epsilon_{F A}"']
+& F A
+  \arrow[r]
+& L(A, \alpha)
+  \arrow[r, dashrightarrow, "g"]
+& B
+\\
+& & KL(A, \alpha)
+  \arrow[r, "K g"]
+& K B
+\\
+& & (A, \alpha)
+  \arrow[u, "i_{(A, \alpha)}"]
+  \arrow[ur, "f"']
+&
+\end{tikzcd}
+\]
+
+Since $KB = (GB, G \epsilon_B)$, the diagram becomes:
+
+\[
+\begin{tikzcd}
+FGFA
+  \arrow[r, shift left, "F \alpha"]
+  \arrow[r, shift right, "\epsilon_{F A}"']
+& F A
+  \arrow[r]
+& L(A, \alpha)
+  \arrow[r, dashrightarrow, "g"]
+& B
+\\
+& & KL(A, \alpha)
+  \arrow[r, "Kg"]
+& (GB, G \epsilon_B)
+\\
+& & (A, \alpha)
+  \arrow[u, "i_{(A, \alpha)}"]
+  \arrow[ur, "f"']
+&
+\end{tikzcd}
+\]
+
+\subsubsection{Uniqueness of $g$}
+
+Notice that the UMP of the coequaliser guarantees uniqueness of the arrow $g$, if it exists, because the composite $FA \to B$ coequalises $F\alpha$ and $\epsilon_{FA}$ so it factors \emph{uniquely} through the coequaliser $L(A, \alpha)$.
+
+So we just need to define $i_{(A, \alpha)}$ and to show that $g$ exists for every $f$.
+
+\subsubsection{Existence of $g$}
+
+The only thing we can do with $f$ is to view it as a map $A \to GB$ and then apply $F$ to it, to raise it up to $\mathcal{D}$.
+And once we do that, it immediately becomes clear that we want to show that $\epsilon_B \circ Ff: FA \to B$ coequalises $F\alpha$ and $\epsilon_{FA}$.
+When we've got that, $\epsilon_B \circ Ff$ will factor through $L(A, \alpha)$ to give our map $g$.
+
+Since $f$ is an algebra hom, we have $f \alpha = G(\epsilon_B) \circ (Tf) = G(\epsilon_B \circ Ff)$.
+Applying $F$, obtain $F(f \alpha) = FG(\epsilon_B \circ Ff)$; if we hit this on the left with $\epsilon_B$ then we can use naturality of $\epsilon$ on the right-hand side, so have $$\epsilon_B F(f \alpha) = \epsilon_B FG(\epsilon_B \circ Ff) = \epsilon_B \circ Ff \circ \epsilon_{FA}$$
+Since $F(f \alpha) = (Ff)(F \alpha)$, this says precisely that $\epsilon_B \circ Ff$ coequalises $F\alpha$ and $\epsilon_{FA}$, as we wanted.
+
+That is, for every arrow $f: (A, \alpha) \to K(B)$ there is exactly one arrow $g: L(A, \alpha) \to B$ such that $\epsilon_B \circ Ff = g \circ l: FA \to B$.
+
+\subsubsection{Definition of $i_{(A, \alpha)}$}
+
+Conversely, any arrow $g: L(A, \alpha) \to B$ determines an arrow $h = g \circ l: FA \to B$ satisfying $h(F \alpha) = h (\epsilon_{FA})$, and by the $F \dashv G$ adjunction, this corresponds to a unique arrow $\bar{h} = (Gh) \eta_A : A \to GB$.
+This tells us to define $i_{(A, \alpha)} : (A, \alpha) \to KL(A, \alpha)$ by $i_{(A, \alpha)} = (Gl) \eta_A$, where recall that $\eta$ is the unit of the $F \dashv G$ adjunction.
+
+$i_{(A, \alpha)}$ really is a homomorphism: have $KL(A, \alpha) = (GL(A, \alpha), G \epsilon_{L(A, \alpha)})$, so need $$G \epsilon_{L(A, \alpha)} \circ T(i_{(A, \alpha)}) = i_{(A, \alpha)} \alpha$$
+The right-hand side is $(Gl) \eta_A \alpha$, which is just $Gl$: by naturality of $\eta$ it equals $(Gl)(T\alpha)\eta_{TA}$; applying $G$ to the coequalising equation $l (F \alpha) = l \epsilon_{FA}$ gives $(Gl)(T\alpha) = (Gl)\mu_A$; and $\mu_A \eta_{TA} = 1_{TA}$ is a monad unit law.
+The left-hand side is $$G(\epsilon_{L(A, \alpha)} \circ FGl \circ F\eta_A)$$
+which by naturality of $\epsilon$ is $$G(l \epsilon_{FA} \circ F \eta_A)$$
+which by a triangle identity of the adjunction ($\epsilon_{FA} \circ F \eta_A = 1_{FA}$) is just $Gl$.
+
+So we have the homomorphism $i_{(A, \alpha)}$.
+
+\subsubsection{Verify the triangle}
+
+We have one thing left to do: to show that $Kg \circ i_{(A, \alpha)} = f$ for all $f: (A, \alpha) \to K(B)$.
+
+We have $$Kg \circ i_{(A, \alpha)} = Gg \circ Gl \circ \eta_A = G(\epsilon_B \circ Ff) \circ \eta_A$$
+so by naturality of $\eta$ and a triangle identity, this is just $f$ straight away.
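+
+Written out in full, using $g \circ l = \epsilon_B \circ Ff$:
+\[
+Kg \circ i_{(A, \alpha)} = (Gg)(Gl)\eta_A = G(\epsilon_B \circ Ff) \eta_A = (G\epsilon_B)(GFf)\eta_A = (G\epsilon_B)\eta_{GB}f = f
+\]
+where the penultimate equality is naturality of $\eta$ at $f: A \to GB$, and the last is the triangle identity $(G\epsilon_B)\eta_{GB} = 1_{GB}$.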
+ +\subsection{Unmotivated bits} + +Once you have the idea of ``generalise the fact that $GB$ is a coequaliser'', the rest is basically forced, although it's taken me several hours to work out the steps. +(``Forced'' is not the same as ``easy''!) +That is the only unmotivated step in the construction of this adjoint. + +\section{Monadicity Theorems} + +We've actually done most of the hard work now. +We've shown that every algebra occurs as a certain coequaliser, and that under one of two conditions + +\begin{enumerate} +\item $\mathcal{D}$ has coequalisers of reflexive pairs +\item $G$ creates coequalisers of $G$-split pairs +\end{enumerate} +the comparison functor has a left adjoint which is defined as another coequaliser. +I'll restate them here for convenience. + +\ + +\begin{thm}[Algebra as coequaliser] +\[ +\begin{tikzcd} +(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"'] +& (TA, \mu_A) \arrow[r, "\alpha"] +& (A, \alpha) +\end{tikzcd} +\] +\end{thm} + +\ + +\begin{defn}[Adjoint $L$ of $K$] +\[ +\begin{tikzcd} +FGFA + \arrow[r, shift left, "F \alpha"] + \arrow[r, shift right, "\epsilon_{F A}"'] +& F A + \arrow[r, "l"] +& L(A, \alpha) +\end{tikzcd} +\] + +The unit of the adjunction is $i_{(A, \alpha)} = (Gl) \eta_A : (A, \alpha) \to KL(A, \alpha)$. +\end{defn} + +\ + +We also showed that if we forget the algebra structure from the coequaliser, then we get another coequaliser, this time in $\mathcal{C}$ rather than $\mathcal{C}^{\mathbb{T}}$; this was because the algebra-coequaliser diagram is in fact a split coequaliser, so is preserved by all functors. + +Remembering that $\mu_A = G \epsilon_{FA}$, we see that actually these two diagrams are extremely similar. +$(A, \alpha)$ is a coequaliser of the diagram you get when you apply $G$ to the diagram for $L(A, \alpha)$. + +We want to find two natural isomorphisms $\gamma: 1_{\mathcal{C}^{\mathbb{T}}} \to KL$ and $\delta: 1_{\mathcal{D}} \to LK$; this will demonstrate the equivalence of the two categories. + +\subsection{Finding \texorpdfstring{$\delta: 1_{\mathcal{D}} \to LK$}{delta}} \label{findingdelta} + +Recall that for $B \in \mathcal{D}$, have $LK(B) = L(GB, G \epsilon_B)$. +That is, $LK(B)$ is the coequaliser in the following diagram: + +\[ +\begin{tikzcd} +FGFGB + \arrow[r, shift left, "FG \epsilon_B"] + \arrow[r, shift right, "\epsilon_{FGB}"'] +& FGB + \arrow[r, "l"] +& L(GB, G \epsilon_B) +\end{tikzcd} +\] + +Now observe that $\epsilon_B: FGB \to B$ coequalises that diagram too - that's just what it means for $\epsilon$ to be natural - so it must factor through the coequaliser. + +\[ +\begin{tikzcd} +FGFGB + \arrow[r, shift left, "FG \epsilon_B"] + \arrow[r, shift right, "\epsilon_{FGB}"'] +& FGB + \arrow[r, "l"] + \arrow[dr, "\epsilon_B"'] +& L(GB, G \epsilon_B) = LKB + \arrow[d, dashrightarrow, "\nu_B"] +\\ +& & B +\end{tikzcd} +\] + +We want $\nu_B$ to be an isomorphism. + +We had two conditions for $L$ being well-defined. + +\subsubsection{$G$ creates coequalisers of $G$-split pairs} +In this case, in fact $\epsilon_B$ is a coequaliser, not just coequalising, because when we apply $G$ to the diagram we get a coequaliser of a split pair. +So immediately $\nu_B$ is an iso. 
+
+\subsubsection{$\mathcal{D}$ has coequalisers of reflexive pairs}
+In this case, applying $G$ to the diagram, and assuming that $G$ preserves coequalisers of reflexive pairs, we will get a diagram as follows, where the top line is a coequaliser:
+
+\[
+\begin{tikzcd}
+GFGFGB
+  \arrow[r, shift left, "GFG \epsilon_B"]
+  \arrow[r, shift right, "G \epsilon_{FGB}"']
+& GFGB
+  \arrow[r, "Gl"]
+  \arrow[dr, "G\epsilon_B"']
+& GL(GB, G \epsilon_B) = GLKB
+  \arrow[d, dashrightarrow, "G \nu_B"]
+\\
+& & GB
+\end{tikzcd}
+\]
+
+But notice that the following is a split coequaliser diagram, and hence is a coequaliser:
+
+\[
+\begin{tikzcd}
+GFGFGB
+  \arrow[r, shift left=0.7ex, "GFG \epsilon_B"]
+  \arrow[r, shift right=1.4ex, "G \epsilon_{FGB}"]
+& GFGB
+  \arrow[l, shift left=2.1ex, "\eta_{GFGB}"]
+  \arrow[r, shift right, "G \epsilon_B"']
+& GB
+  \arrow[l, shift right, "\eta_{GB}"']
+\end{tikzcd}
+\]
+
+Indeed:
+
+\begin{itemize}
+\item The rightwards maps are coequalising (because if we remove a $G$ from the diagram, we get something which was coequalising)
+\item $(G \epsilon_B) \eta_{GB} = 1_{GB}$ (adjunction identity)
+\item $\eta_{GB} (G \epsilon_B) = (GFG \epsilon_B)(\eta_{GFGB})$ (naturality of $\eta$)
+\item $(G \epsilon_{FGB})(\eta_{GFGB}) = 1_{GFGB}$ (adjunction identity)
+\end{itemize}
+
+Therefore $G \nu_B$ is an isomorphism.
+If we make the further assumption that $G$ reflects isos, then $\nu_B$ must be an isomorphism.
+
+What else did we assume in this section?
+\begin{enumerate}
+\item $G$ preserves coequalisers of reflexive pairs
+\item $G$ reflects isomorphisms
+\end{enumerate}
+
+\subsubsection{Naturality of the isomorphism}
+
+I claim that actually $\nu$ is the counit of the adjunction.
+This would make it automatically natural.
+
+Recall from the start of this section (\ref{findingdelta}) that $\nu_B$ is just the factorisation of $\epsilon_B$ across the coequaliser $l: FGB \to L(GB, G \epsilon_B)$.
+
+Therefore it corresponds to $\epsilon_B$ in the natural bijection between ``maps $FGB \to B$ which coequalise $FG \epsilon_B$ and $\epsilon_{FGB}$'' and ``maps $L(GB, G \epsilon_B) \to B$''.
+
+But $\epsilon_B$ corresponds naturally to the identity $1_{GB}$ via the adjunction $F \dashv G$.
+
+So using $U$ for the forgetful functor, $U(1_{KB}) = 1_{GB} \leftrightarrow \epsilon_B \leftrightarrow \nu_B$, all naturally, and so $\nu_B$ must be the lift of $1_{KB}$ over the adjunction $L \dashv K$.
+
+Therefore $\nu$ is actually the counit.
+
+\subsection{Finding \texorpdfstring{$\gamma: 1_{\mathcal{C}^{\mathbb{T}}} \to KL$}{gamma}}
+
+Recall that the unit of the $L \dashv K$ adjunction was $i_{(A, \alpha)} = (Gl) \eta_A : (A, \alpha) \to KL(A, \alpha)$.
+
+We'll show that this unit is a natural isomorphism.
+
+Recall that $\alpha: (TA, \mu_A) \to (A, \alpha)$ is a coequaliser; that is precisely $\alpha: KFA \to (A, \alpha)$.
+
+Applying $K$ to the diagram of Definition 5, we must still get something which coequalises, and so it factors through the coequaliser:
+
+\[
+\begin{tikzcd}
+KFGFA
+  \arrow[r, shift left, "KF \alpha"]
+  \arrow[r, shift right, "K \epsilon_{F A}"']
+& KFA
+  \arrow[r, "\alpha"]
+  \arrow[dr, "K l"']
+& (A, \alpha)
+  \arrow[d, dashrightarrow]
+\\
+& & KL(A, \alpha)
+\end{tikzcd}
+\]
+
+I claim that $i_{(A, \alpha)}$ is this dashed arrow.
+Indeed, $i_{(A, \alpha)} \circ \alpha = (Gl) \eta_A \circ \alpha = Gl = Kl$ (the middle equality is the computation we did when checking that $i_{(A, \alpha)}$ is a homomorphism), so by uniqueness of the lift, it must be $i_{(A, \alpha)}$.
+
+It is therefore immediately a natural transformation, being the unit of an adjunction.
+
+It is an iso: if we forget the algebra structure, we end up with the following diagram:
+
+\[
+\begin{tikzcd}
+GFGFA
+  \arrow[r, shift left, "GF \alpha"]
+  \arrow[r, shift right, "G \epsilon_{F A}"']
+& GFA
+  \arrow[r, "\alpha"]
+  \arrow[dr, "G l"']
+& A
+  \arrow[d, dashrightarrow, "i_A"]
+\\
+& & GL(A, \alpha)
+\end{tikzcd}
+\]
+
+But $\alpha$ is still a coequaliser here (we proved earlier that it remained a coequaliser after forgetting the algebra structure), and $Gl$ is also a coequaliser because of our two assumptions: either $G$ preserves coequalisers of reflexive pairs (and hence the coequaliser structure of Definition 5's diagram), or else it creates coequalisers of $G$-split pairs and so we're immediately done.
+
+Therefore $i_A$ must be an isomorphism in $\mathcal{C}$, because $\alpha$ and $Gl$ are both coequalisers.
+
+It is also an isomorphism in $\mathcal{C}^{\mathbb{T}}$, because the forgetful functor reflects isomorphisms: suppose $f: (A, \alpha) \to (B, \beta)$ has an inverse $g: B \to A$ once the algebra structure is forgotten.
+Then $g$ is an algebra homomorphism, because we just need $g \beta = \alpha T(g)$; but that is equivalent to $\beta T(f) = f \alpha$ by pre-composing with $T(f)$ and post-composing with $f$.
+Because $f$ is already an algebra homomorphism, this condition is true.
+
+Therefore $i_A$ is an iso in $\mathcal{C}^{\mathbb{T}}$, and so we are finished.
+
+\section{Summary}
+
+We have shown that $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ has a left adjoint $L$ under one of two conditions;
+that the adjunction's unit is an isomorphism under a further assumption;
+and that the adjunction's counit is an isomorphism under a further assumption.
+
+Together, this tells us that $K$ is part of an equivalence: that is, $G$ is monadic.
+
+\
+
+\begin{thm}[Monadicity Theorems, one direction]
+Suppose $G: \mathcal{D} \to \mathcal{C}$ is a functor which has a left adjoint. If
+\begin{enumerate}
+\item $G$ creates coequalisers of $G$-split pairs, or
+\item $G$ reflects isomorphisms, $\mathcal{D}$ has coequalisers of reflexive pairs, and $G$ preserves them
+\end{enumerate}
+then $G$ is monadic.
+\end{thm}
+
+\end{document} \ No newline at end of file diff --git a/MultiplicativeDetProof.tex b/MultiplicativeDetProof.tex new file mode 100644 index 0000000..e6da07a --- /dev/null +++ b/MultiplicativeDetProof.tex @@ -0,0 +1,85 @@ +\documentclass[11pt]{amsart}
+\usepackage{geometry}
+\geometry{a4paper}
+\usepackage{graphicx}
+\usepackage{amssymb}
+\usepackage{amsmath,amsthm}
+\usepackage{epstopdf}
+\usepackage{hyperref}
+\usepackage{lmodern}
+
+% Reproducible builds
+\pdfinfoomitdate=1
+\pdftrailerid{}
+\pdfsuppressptexinfo=-1
+
+\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
+
+\title{Proof that the determinant is multiplicative}
+\author{Patrick Stevens}
+\date{17th April 2014}
+\newtheorem{theorem}{Theorem}[section]
+\newtheorem{lemma}[theorem]{Lemma}
+\newenvironment{definition}[1][Definition]{\begin{trivlist}
+\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
+
+\begin{document}
+
+\maketitle
+
+\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/MultiplicativeDetProof/MultiplicativeDetProof.pdf} \end{center}
+
+\normalsize
+
+\
+
+This is a very concrete proof of the multiplicativity of the determinant.
+It contains no cleverness at all, and is simply manipulation of expressions.
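+
+Only two standard facts about the signature are needed in what follows: it is multiplicative ($\epsilon(\sigma \rho) = \epsilon(\sigma) \epsilon(\rho)$), and a transposition has signature $-1$; together with the observation that a product indexed by $i = 1, \dots, n$ may be reindexed along any permutation $\rho \in S_n$:
+$$\prod_{i=1}^n c_i = \prod_{i=1}^n c_{\rho(i)}$$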
+
+\
+
+\begin{definition}
+The \emph{determinant} of a matrix $A$ is given by $$\sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{i, \sigma(i)}$$
+where $S_n$ is the symmetric group on $n$ elements, and $\epsilon(\sigma)$ is the signature of the permutation $\sigma$.
+\end{definition}
+
+\begin{lemma}
+Let $\rho \in S_n$, and let $A$ be a matrix. Then
+$$\sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{\rho(i), \sigma(i)} = \epsilon(\rho) \det(A)$$
+\end{lemma}
+
+\begin{proof}
+\begin{align*}
+\epsilon(\rho) \det(A) &= \epsilon(\rho) \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{i, \sigma(i)}
+\\&= \sum_{\sigma \in S_n} \epsilon(\sigma \rho) \prod_{i=1}^n A_{i, \sigma(i)}
+\\&= \sum_{\sigma \in S_n} \epsilon(\sigma \rho) \prod_{i=1}^n A_{\rho(i), \sigma(\rho(i))}
+\\&= \sum_{\tau \in S_n} \epsilon(\tau) \prod_{i=1}^n A_{\rho(i), \tau(i)}
+\\&= \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{\rho(i), \sigma(i)}
+\end{align*}
+
+\end{proof}
+
+\begin{theorem}$$\det(AB) = \det(A) \det(B)$$
+\end{theorem}
+\begin{proof}
+We use summation convention throughout.
+\begin{align*}
+\det(AB) &= \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n (AB)_{i, \sigma(i)}
+\\ &= \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, k_1} B_{k_1, \sigma(1)} A_{2, k_2} B_{k_2, \sigma(2)} \dots A_{n, k_n} B_{k_n, \sigma(n)}
+\\ &= \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, k_1} A_{2, k_2} \dots A_{n, k_n} B_{k_1, \sigma(1)} \dots B_{k_n, \sigma(n)}
+\end{align*}
+But the $k_1, \dots, k_n$ only ever contribute when they are a permutation of $1, \dots, n$, because (assuming wlog $k_1 = k_2$) for each $\sigma^+$ there exists $\sigma^-$ such that $\sigma^+(1) = \sigma^-(2), \sigma^-(1) = \sigma^+(2), \sigma^-(k) = \sigma^+(k)$ for other $k$. Then we have
+$$A_{1, k_1} B_{k_1, \sigma^+(1)} A_{2, k_1} B_{k_1, \sigma^+(2)} (\text{remaining terms}) = A_{1, k_1} B_{k_1, \sigma^-(1)} A_{2, k_1} B_{k_1, \sigma^-(2)} (\text{remaining terms})$$
+and because $\epsilon(\sigma^-) = -\epsilon(\sigma^+)$ (the two permutations differ by a transposition), these two terms cancel.
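+
+For concreteness, take $n = 2$ and $k_1 = k_2 = k$: the pairing swaps $\sigma^+ = \mathrm{id}$ with $\sigma^- = (1\,2)$, and the two contributions are
+$$\epsilon(\sigma^+) A_{1, k} A_{2, k} B_{k, 1} B_{k, 2} + \epsilon(\sigma^-) A_{1, k} A_{2, k} B_{k, 2} B_{k, 1} = A_{1, k} A_{2, k} \left( B_{k, 1} B_{k, 2} - B_{k, 2} B_{k, 1} \right) = 0$$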
+ +Hence the sum over $k_i$ is in fact a sum over all $\rho$ such that $\rho(i) = k_i$ for all $i$: +Then $$\det(AB) = \sum_{\rho \in S_n} \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, \rho(1)} A_{2, \rho(2)} \dots A_{n, \rho(n)} B_{\rho(1), \sigma(1)} \dots B_{\rho(n), \sigma(n)}$$ +Applying the lemma gives +\begin{align*} +\det(AB) &= \det(B) \sum_{\rho \in S_n} \epsilon(\rho) A_{1, \rho(1)} A_{2, \rho(2)} \dots A_{n, \rho(n)} +\\&= \det(B) \det(A) +\end{align*} + +\end{proof} + +\end{document} \ No newline at end of file diff --git a/NonstandardAnalysisPartIII.tex b/NonstandardAnalysisPartIII.tex new file mode 100644 index 0000000..dfd1751 --- /dev/null +++ b/NonstandardAnalysisPartIII.tex @@ -0,0 +1,1695 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{amsthm} +\usepackage{epstopdf} +\usepackage{mdframed} +\usepackage{hyperref} +\usepackage{mathtools} +\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png} + +\newmdtheoremenv{defn}{Definition}[section] +\newmdtheoremenv{thm}{Theorem}[section] +\newmdtheoremenv{motiv}{Motivation}[section] +\newmdtheoremenv{lemma}[thm]{Lemma} + +\theoremstyle{remark} +\newtheorem*{remark}{Remark} +\newtheorem*{caveat}{Caveat} +\newtheorem*{example}{Example} +\newtheorem*{motivation}{Motivation} + +\newcommand{\st}{\mathrm{st}} +\newcommand{\app}[1]{\mathrm{app}\left[#1\right]} +% the \hyp command defaults to *\mathbb{R}, but when supplied with an [argument], puts the star in front of it +\newcommand{\hyp}[1][\mathbb{R}]{\prescript{*}{}{#1}} +\newcommand{\near}{\simeq} +\newcommand{\symdiff}{\triangle} +\newcommand{\disjointunion}{\sqcup} +\newcommand{\powerset}{\mathcal{P}} +\newcommand{\gaussian}{\Phi} + +% found on Stack Exchange http://tex.stackexchange.com/a/6071 +\newcommand{\unaryminus}{\scalebox{0.75}[1.0]{\( - \)}} + +\title{Non-standard Analysis} +\author{Patrick Stevens} +\thanks{Supervised by Dr Thomas Forster for the essay component of Part III of the Cambridge Mathematical Tripos.} +\date{Composed from October 2015 to April 2016. Released on 15th June 2016.} + +\begin{document} +\maketitle +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/NonstandardAnalysis/NonstandardAnalysisPartIII.pdf} \end{center} +Licence: CC BY-SA \url{https://creativecommons.org/licenses/by-sa/4.0/}. +You are free to share and adapt this work for any purpose, even commercially, as long as you attribute it, indicating which changes were made, without suggesting that the licensor endorses you or your use. +You must distribute derivative works under the same license as the original. +\normalsize +\pagebreak + +\tableofcontents +\section{What is non-standard analysis?} + +Non-standard analysis is the study of a model of the reals in which there are \emph{infinitesimals}: there is some $\varepsilon > 0$ such that, for all ``standard'' reals $x$, we have $\varepsilon < x$. +We shall follow P\'{e}try \cite{petry} in assuming the existence of a ``field of hyperreals'': $\hyp$, an ordered field which extends $\mathbb{R}$'s order structure and inherits its multiplicative structure. +The key fact about $\hyp$ which we shall also impose is the \emph{Transfer Principle} of Definition \ref{defn:transfer}. +The truth of the transfer principle follows from \L os's Theorem (see Theorem 4.3 of Davis \cite{davis}). 
+However, this essay aims for a treatment of the analysis involved, rather than getting bogged down in foundational details, so we will not prove it here. + +\ + +\begin{defn}[Transfer Principle] \label{defn:transfer} +Let $\phi$ be a first-order sentence in the language of the totally ordered field $\mathbb{R}$. +(That is, $\phi$ quantifies only over real numbers, not over subsets of the reals.) +Then $\phi$ is true in $\mathbb{R}$ if and only if $\hyp[\phi]$ is true in $\hyp$, where $\hyp[\phi]$ is obtained by replacing all quantifiers $(\forall x \in \mathbb{R})$ with $(\forall x \in \hyp)$ and $(\exists x \in \mathbb{R})$ with $(\exists x \in \hyp)$. +\end{defn} + +\ + +This version of the transfer principle is not actually the strongest possible; we will return to this in Section \ref{sec:internal}, where we examine certain cases under which we may quantify over sets. + +There is a simpler but more imprecise form, which we state here for motivation in the form given by P\'{e}try: + +\ + +\begin{defn}[Transfer Principle, imprecise form] \label{defn:transferimprecise} +If two (first-order definable) systems of equations are equivalent in $\mathbb{R}$, then they are equivalent in $\hyp$. +Moreover, every (first-order definable) function $f: \mathbb{R} \to \mathbb{R}$ extends to a function $\hyp[f] : \hyp \to \hyp$ such that $\hyp[f] \vert_{\mathbb{R}} = f$. +\end{defn} + +\ + +In the first sections of this essay, whenever we invoke the transfer principle, all the functions and sets we will consider shall be first-order definable. +Therefore this formulation suffices for the moment. + +\begin{example} +To give an example of the transfer principle in action, consider the two equivalent sets $$\{x \in \mathbb{R}: x^2 > 7 \} ; \{x \in \mathbb{R}: x > \sqrt{7} \vee -x > \sqrt{7} \}$$ + +We may define $7$ in a first-order way, as $1+1+1+1+1+1+1$ (recalling that we are working in the language of the totally ordered field $\mathbb{R}$). +Similarly, $-1$ has the following first-order description: $$(\exists x \in \mathbb{R})(x+1 = 0)$$ +The square-root function may be expressed in a first-order way: $$(\forall x \geq 0) (\exists y \geq 0)(y^2 = x)$$ so (imprecisely) $\sqrt{}: x \mapsto \sqrt{x}$ extends to a function $\hyp[\sqrt{}] : \hyp \to \hyp$, and (formally) $\hyp[\sqrt{}]$ is described by $$(\forall x \in \hyp)(x \geq 0 \rightarrow (\exists y \in \hyp)([y \geq 0] \wedge [y \times y = x]))$$ +which is a true statement of $\hyp$ by the transfer principle. + +Then viewing $7$ and $-1$ as elements of $\hyp$, the transfer principle states (imprecisely) that $$\{x \in \hyp: x^2 > 7\}; \{ x \in \hyp: x > \hyp[\sqrt{7}] \vee -x > \hyp[\sqrt{7}] \}$$ are equivalent sets in $\hyp$. +Formally, $$(\forall x \in \mathbb{R})(x \times x > 7 \leftrightarrow [x > \sqrt{7}] \vee [x < -1 \times \sqrt{7}])$$ +and so by transfer, $$(\forall x \in \hyp)(x \times x > 7 \leftrightarrow [x > \hyp[\sqrt{7}]] \vee [x < -1 \times \hyp[\sqrt{7}]] )$$ +Therefore, the inequality $x^2 > 7$ has solutions $x > \sqrt{7}$ and $x < -\sqrt{7}$ whether we are working in $\mathbb{R}$ or $\hyp$. +\end{example} + +\begin{remark}[Existence of $\hyp$] +Why are we justified in assuming the existence of $\hyp$? +The spirit in which we do so is akin to the fact that we teach arithmetic while perfectly content not to define $\mathbb{N}$ rigorously, but simply to recognise its existence. 
+To obtain an explicit construction, we may use ultrapowers (see Robinson \cite{robinson}); we will sketch this approach in Section \ref{sec:internal} of this essay. +Alternatively, we may understand the essence of why a model exists via the Compactness Theorem of first-order logic. +Adjoining a symbol $\varepsilon$ to the language of the reals, and adding axioms that $$\{ \varepsilon > 0 ; \varepsilon < \frac{1}{n} : n \in \mathbb{N} \}$$ +produces a system which has a model by compactness. +Such a model contains an infinitesimal $\varepsilon$. +\end{remark} + +\section{Basic definitions} + +We consider a non-standard model $\hyp$ of the reals. +The key difference between $\hyp$ and $\mathbb{R}$ is that every finite $r \in \hyp$ has a \emph{standard part}, denoted $\st(r)$, and a \emph{non-standard part}. +The standard part is a standard real (that is, a member of $\mathbb{R}$) which is ``infinitely close'' to $r$: that is, $r - \st(r)$ is infinitesimal. + +We write $r \near s$ for the relationship ``$r-s$ is infinitesimal'': explicitly, $r \near s$ iff either $r = s$ or for every $\varepsilon \in \mathbb{R}$ with $\varepsilon > 0$, we have +$$\begin{dcases} +0 < r-s < \varepsilon & \text{if $r > s$} \\ +0 < s-r < \varepsilon & \text{if $r < s$} +\end{dcases} +$$ + +\begin{defn} \label{defn:finite} +A hyperreal $r$ is \emph{finite} iff there is a standard real $B$ such that $-B \leq r \leq B$. +\end{defn} + +\begin{caveat} Several authors draw a distinction between various different flavours of ``finite'', ``limited'', ``appreciable'' and similar terms, denoting combinations of ``finite and not infinitesimal'', ``finite infinitesimal'' and so forth. +We will not need such distinctions in this essay, so will use simply the word ``finite'' as in Definition \ref{defn:finite}, but when reading other authors, make sure to look up their terminology. +\end{caveat} + +\ + +\begin{thm} +Every finite hyperreal $r$ has a unique real standard part. +\end{thm} + +\ + +We give the proof here as a simple demonstration of some of the operations on infinitesimals. + +\begin{proof} +For uniqueness: suppose $x, y \in \mathbb{R}$ are both ``infinitely close'' to $r \in \hyp$. +That is, for all $\varepsilon \in \mathbb{R}^{>0}$, $$|r-x| < \varepsilon, |r-y| < \varepsilon$$ +By the triangle inequality, $$|(r-x) - (r-y)| < 2 \varepsilon$$ +so $|x-y|$ is less than $2 \varepsilon$ for all $\varepsilon$. + +(The triangle inequality is true by the transfer principle, but we will omit the easy proof of this fact. +We will soon see a fully-worked example of the use of the transfer principle, in Theorem \ref{thm:differentiatepolynomial}.) + +Since $x, y$ are both standard reals, this means $x-y = 0$. + +For existence: if $r \in \hyp$ is finite, then there is positive $B \in \mathbb{R}$ such that $-B \leq r \leq B$. +Then $$S := \{ x \in \mathbb{R}: r \leq x \}$$ is a set of reals which is nonempty (containing $B$), and it is bounded below (by $-B$), so it has a greatest lower bound, which we shall optimistically call $\st(r)$ (for ``standard''). + +By construction, $\st(r)$ is indeed a standard real. +We will prove by contradiction that $r-\st(r)$ is infinitesimal. + +If $r = \st(r)$ then we are instantly done, so suppose that $|r - \st(r)| > \varepsilon$ for some $\varepsilon \in \mathbb{R}^{>0}$. + +If $r > \st(r)$, then $r > \varepsilon + \st(r)$. +Then in fact $\st(r) + \frac{\varepsilon}{2}$ is a lower bound for $S$, contradicting the definition of $\st(r)$ as a \emph{greatest} lower bound. 
+ +If $r < \st(r)$, then $r < \st(r) - \varepsilon$, so $\st(r) - \frac{\varepsilon}{2}$ lies in $S$, contradicting the definition of $\st(r)$ as a lower bound for $S$. +\end{proof} + +This justifies the definition of $\st(r)$ as \emph{the} standard part of $r$. + +There are many easy results about standard parts, which all have similar patterns of proof; we omit the proofs for brevity, and we may use without comment certain obvious results such as the following. + +\begin{itemize} +\item $\st(-u) = - \st(u)$ +\item $\st(u+v) = \st(u) + \st(v)$ +\item if $u \leq v$, then $\st(u) \leq \st(v)$. +\end{itemize} + +The collection of hyperreals infinitesimally close to hyperreal $r$ is known as the \emph{monad} of $r$, and the ``local properties'' we study in Analysis can often be related to the study of behaviour in the monad of a point. + +\section{Derivatives} + +As an example of the idea of ``local properties may be defined in monads'', we give the following definition of the derivative: + +\ + +\begin{defn} \label{derivative} +The \emph{derivative} of $f: \mathbb{R} \to \mathbb{R}$ at $x \in \mathbb{R}$ is defined to be $$f'(x) := \st \left( \dfrac{\hyp[f](x+\delta)-\hyp[f](x)}{\delta} \right)$$ +for any infinitesimal $\delta$; $f$ is said to be \emph{differentiable} at $x$ if this is well-defined. +\end{defn} + +\ + +By way of demonstration, consider the function $f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^n$ (some $n \in \mathbb{N}$). +By the (imprecise) transfer principle, this defines a function $\hyp[f]: \hyp \to \hyp$ which coincides with the standard version on standard reals. +This being our first theorem proved by non-standard methods, we will walk through it in complete pedantic detail; subsequent proofs will be considerably lighter to aid comprehension. + +\ + +\begin{thm} \label{thm:differentiatepolynomial} +If $f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^n$, then $f'(x) = n x^{n-1}$. +\end{thm} +\begin{proof} +The function $f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^n$ admits the first-order description $$(\forall x \in \mathbb{R})(\exists y \in \mathbb{R})(y = x \times x \times \dots \times x)$$ +where there are $n$ terms in the product. +(Henceforth we will use the shorthand $r^n$ for $r \times r \times \dots \times r$, with $n$ terms in the product.) +This is a schema of true statements in $\mathbb{R}$, one for each $n \in \mathbb{N}$. + +Therefore, by the transfer principle, for each $n \in \mathbb{N}$ the following sentence is true in $\hyp$, so we may use it to define $\hyp[f]: \hyp \to \hyp$: $$(\forall x \in \hyp)(\exists y \in \hyp)(y = x^n)$$ + +Recall that we wish to show that $$\st \left( \frac{(x+\delta)^n-x^n}{\delta} \right)$$ is well-defined as $\delta$ varies, and equal to $n x^{n-1}$; for this, we need the binomial theorem. + +We will require the following shorthand: fixing $n$ and $m$, we understand $\binom{n}{m}$ as being defined by the natural number $$\frac{n!}{m! (n-m)!}$$ +which is expressed as $1+1+\dots+1$ for appropriately many terms in the summand. + +Now for the binomial theorem itself: +$$(\forall x \in \mathbb{R})(\forall y \in \mathbb{R})((x+y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n-1} y + \dots + \binom{n}{n-1} x y^{n-1} + \binom{n}{n} y^n)$$ +This is again a schema of true statements in $\mathbb{R}$, one for each $n \in \mathbb{N}$, and each of those statements is to be understood as being written out in full with no elided terms. 
+
+We emphasise that $\binom{n}{m}$ is simply a number: it does not even contain any bound variables, and if we expanded out all of the shorthand, it would appear to be in the form $1+1+\dots+1$.
+
+Since the binomial theorem holds in the reals, every statement in that schema must be true in the hyperreals when we transfer to $\hyp$:
+
+$$(\forall x \in \hyp)(\forall y \in \hyp)((x+y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n-1} y + \dots + \binom{n}{n-1} x y^{n-1} + \binom{n}{n} y^n)$$
+
+Notice that behind the scenes, $n$ is the same $1+1+\dots+1$ as it always was, so there is no transfer required in the definition of $\binom{n}{m}$.
+
+Therefore, in particular, we have $$\frac{(x+\delta)^n-x^n}{\delta} = n x^{n-1} + \delta \cdot \binom{n}{2} x^{n-2} + \dots + \delta^{n-2} \cdot \binom{n}{n-1} x^1 + \delta^{n-1}$$
+where $\delta$ is any infinitesimal.
+
+The standard part of this is $n x^{n-1}$, since every subsequent term has a factor of $\delta$.
+(We are implicitly using that adding infinitesimals does not change standard parts, and multiplying a standard real or an infinitesimal by an infinitesimal results in an infinitesimal.)
+\end{proof}
+
+How about an example where the derivative fails to be defined?
+The canonical example is, of course, $g(x) = |x|$.
+This extends to $\hyp[g]: \hyp \to \hyp$; for readability, we will suppress the asterisk on $\hyp[|]\cdot | $, and henceforth we may use $\hyp[f]$ without the preamble stating the first-order formula which describes it.
+We shall take it as read that for any first-order definable function $f$, $\hyp[f]$ is defined by the transfer of the formula defining $f$.
+
+\
+
+\begin{thm}
+If $g: \mathbb{R} \to \mathbb{R}$ by $x \mapsto |x|$, then $g'$ is not defined at $0$, but is defined elsewhere and is equal to the sign of $x$.
+\end{thm}
+\begin{proof}
+At $0$, the derivative would be $$\st \left( \frac{|\delta|}{\delta} \right) = \hyp[\text{sgn}](\delta)$$ which is not constant as $\delta$ varies over the monad of $0$, because there are both positive and negative infinitesimals.
+(If $\delta$ is a positive infinitesimal, then $-\delta$ is negative.)
+
+Elsewhere, however, the derivative would be $$\st \left( \frac{|x+\delta| - |x|}{\delta} \right)$$
+which splits into two cases:
+
+$$\begin{dcases}
++1 = \frac{x+\delta - x}{\delta} & x > 0 \\
+-1 = \frac{-(x+\delta) - (-x)}{\delta} & x < 0
+\end{dcases}$$
+
+This is because $x < 0$ implies $x + \delta \near x < 0$, so $x + \delta$ is negative and $|x + \delta| = -(x+\delta)$; similarly, $x > 0$ implies $|x + \delta| = x + \delta$.
+\end{proof}
+
+\subsection{The chain rule}
+
+We present the chain rule as a simple example of a theorem whose proof is made very neat and tidy by the use of infinitesimals.
+
+\
+
+\begin{thm}[Chain rule]
+Let $f, g: \mathbb{R} \to \mathbb{R}$ be differentiable [in the non-standard sense]. Then $f \circ g$ is differentiable, and $(f \circ g)'(x) = f'(g(x)) g'(x)$.
+\end{thm}
+\begin{proof}
+Consider $$X = \st \left( \frac{f(g(x+\delta)) - f(g(x))}{\delta} \right)$$
+
+We have $$g(x+\delta) = (g'(x) + \varepsilon) \delta + g(x)$$
+for some $\varepsilon$ infinitesimal (depending on $\delta$), by definition of $g'(x)$ as $\st \left( \frac{g(x+\delta) - g(x)}{\delta} \right)$.
+
+Substituting this into $X$, obtain $$X = \st \left( \frac{f[g(x) + \delta (g'(x) + \varepsilon)] - f(g(x))}{\delta} \right)$$
+
+By differentiability of $f$, have $$f'(g(x)) = \st \left( \frac{f(g(x) + \gamma) - f(g(x))}{\gamma} \right)$$
+for any infinitesimal $\gamma$; so letting $\gamma = \delta (g'(x) + \varepsilon)$, obtain
+$$f'(g(x)) = \st \left( \frac{f[g(x) + \delta (g'(x) + \varepsilon)]-f(g(x))}{\delta (g'(x) + \varepsilon)} \right)$$
+
+Factoring out $\st(g'(x) + \varepsilon) = g'(x)$ from the denominator, this is simply $\frac{X}{g'(x)}$, so $$X = g'(x) f'(g(x))$$
+
+(We have tacitly assumed that $\gamma \neq 0$ and that $g'(x) \neq 0$.
+If $g'(x) + \varepsilon = 0$ then $g(x + \delta) = g(x)$ and $X = 0$; otherwise, if $g'(x) = 0$, write $f(g(x) + \gamma) - f(g(x)) = (f'(g(x)) + \varepsilon') \gamma$ for some infinitesimal $\varepsilon'$, whence $X = \st \left( (f'(g(x)) + \varepsilon')(g'(x) + \varepsilon) \right) = 0$.
+In either case $g'(x) = \st(g'(x) + \varepsilon) = 0$, so $X = f'(g(x)) g'(x)$ still holds.)
+
+\end{proof}
+
+Notice how we did not require any bounding of error terms: the infinitesimals ``bounded themselves'' and simply vanished at the end without further hassle.
+
+The rules of \emph{linearity of the derivative} and the \emph{product rule} follow very similarly.
+
+\
+
+\begin{thm} \label{thm:increasingimpliesderivativepositive} Let $f: [a,b] \to \mathbb{R}$ be continuous on $[a,b]$ and differentiable on $(a,b)$.
+If $f$ is increasing, then $f'(x) \geq 0$ for all $x \in (a, b)$.
+\end{thm}
+\begin{proof}
+It is a first-order fact that for all $y > x$, have $f(y) \geq f(x)$.
+Therefore it remains true on moving to the hyperreals.
+
+Then $$\frac{f(x+\delta) - f(x)}{\delta} \geq 0$$ whenever $\delta$ is infinitesimal, so it remains so on taking standard parts.
+\end{proof}
+\begin{remark}
+This is a partial converse to the theorem that having a positive derivative means the function is increasing; that fact, however, is a purely standard consequence of the Mean Value Theorem (which is itself a consequence of Rolle's Theorem, Theorem \ref{thm:rolle}), so we will not prove it here.
+Moreover, the standard proof of the ``standard consequence'' is elegant; this is an advertisement for using both standard and non-standard methods together.
+\end{remark}
+
+\section{Continuity}
+
+Recall the definition of a closed interval in $\mathbb{R}$: $$[a, b] = \{ x \in \mathbb{R}: a \leq x \leq b \}$$
+The transfer principle lets us carry this over to $\hyp$: to $$\hyp[[]a, b] = \{ x \in \hyp: a \leq x \leq b \}$$
+
+\begin{remark}There are hyperreals \emph{infinitesimally} less than $a$, which do not appear in $\hyp[[]a, b]$.
+Such hyperreals have standard part equal to $a$ and yet are not in $\hyp[[]a, b]$.
+This demonstrates that $\st(r) \in X$ does not necessarily imply $r \in \hyp[X]$.
+\end{remark}
+
+\
+
+\begin{defn}[Continuity] We say a function $f: [a, b] \to \mathbb{R}$ is \emph{continuous} at $x \in [a, b]$ iff $\hyp[f](x+\delta) \near \hyp[f](x)$ for all $\delta$ infinitesimal with $x+\delta \in \hyp[[]a, b]$. That is, $\hyp[f]$ has a standard part which is constant on the monad of $x \in \mathbb{R}$.
+\end{defn}
+
+\
+
+For illustration, we shall prove here that this definition is equivalent to the $\varepsilon$-$\delta$ definition of continuity, but in general we will use non-standard definitions freely throughout this essay.
+(We will not prove that the non-standard definition of ``derivative'' is equivalent to the standard one, for instance.)
+
+\
+
+\begin{thm} \label{thm:equivalence_of_continuity} The function $f: [a, b] \to \mathbb{R}$ is $\varepsilon$-$\delta$ continuous at $x$ iff it is continuous at $x$ in the non-standard sense.
+\end{thm}
+
+\
+
+To make the proof easier to read, we will continue to suppress the asterisk in the extension $\hyp[|] \cdot |$ of the modulus function.
+This proof may be found as Corollary 7.1.2 in Goldblatt \cite{goldblatt}, although it is not very difficult to come up with by oneself.
+ +\begin{proof} +Forward direction: let $x \in [a,b]$, and suppose for all $\varepsilon$ there is $\delta$ such that for all $y$ with $|y-x| < \delta$, have $|f(y)-f(x)| < \varepsilon$. +Then letting $\varepsilon_n = \frac{1}{n}$, obtain $\delta_n$ such that for all $y$ with $|y-x| < \delta_n$, $|f(y) -f(x) | < \frac{1}{n}$. +Transfer this collection of facts (one for each $n \in \mathbb{N}$) to $\hyp$. + +But every $y \in \hyp$ with $y \near x$ satisfies $|y-x| < \delta_n$ for all $n$, so $|\hyp[f](y) - \hyp[f](x)| < \frac{1}{n}$ for all $n$. +That is, $\hyp[f](y) \near \hyp[f](x)$ for all $y \near x$. + +Conversely, suppose $f$ is nonstandard-continuous at $x \in \mathbb{R}$. +Let $\varepsilon \in \mathbb{R}^{>0}$. +For any infinitesimal $\delta > 0$, have that for all $y \in \hyp$ with $|y-x| < \delta$, $| \hyp[f](y) - \hyp[f](x)| < \varepsilon$. +Therefore (in particular) there is some $\delta \in \hyp^{>0}$ such that for all $y \in \hyp$ with $|y-x| < \delta$, $| \hyp[f](y) - \hyp[f](x)| < \varepsilon$. + +By the reverse direction of the transfer principle, the same statement must also hold in $\mathbb{R}$: +there is $\delta \in \mathbb{R}^{>0}$ such that for all $y \in \mathbb{R}$ with $|y-x| < \delta$, have $|f(y) - f(x)| < \varepsilon$. + +\end{proof} + +This is one of the only places we will use the reverse direction of the transfer principle; Theorem \ref{thm:equivalence_of_continuity} was included only to demonstrate the usual character of such proofs. +Again, we will focus on the development of analysis results in non-standard analysis rather than on foundational rigour. +The reverse of the transfer principle appears usually when showing that standard definitions are equivalent to non-standard ones. + +Recall from Definition \ref{derivative} that $f: [a,b] \to \mathbb{R}$ is \emph{differentiable} at $x \in \mathbb{R}$ iff the quantity $$\st \left( \frac{f(x+\delta) - f(x)}{\delta} \right)$$ +is invariant for infinitesimal $\delta$. + +Then it is easy to see that continuity is required for differentiability. +Indeed, if $f(y)$ and $f(x)$ are not infinitesimally close even though $y = x+\delta \near x$, then $f(x+\delta) - f(x)$ is not infinitesimal, so $\frac{f(x+\delta) - f(x)}{\delta}$ must be infinite. + +\subsection{Uniform continuity} \label{sec:uniform} By definition, $f: [a, b] \to \mathbb{R}$ is continuous at $x \in \mathbb{R}$ iff $\hyp[f] : \hyp[[]a, b] \to \hyp$ has invariant standard part when we perturb standard-real $x \in [a,b]$ infinitesimally. +If we form the stronger requirement that $\hyp[f]$ have invariant standard part when we perturb \emph{any hyperreal} $x \in \hyp[[]a, b]$ infinitesimally, then we obtain \emph{uniform continuity}. +This fact is highly surprising, but for reasons of space we will not prove it here. + +\ + +\begin{thm} \label{thm:continuous_implies_uniformly_continuous} Let $f: [a,b] \to \mathbb{R}$ be continuous. Then $f$ is uniformly continuous. +\end{thm} +\begin{proof} +Let $x \in \hyp[[]a, b]$, and $y \near x$. Then say $z = \st(x) \in \mathbb{R}$. +Certainly $\st(y) = \st(x)$, so $f(x) \near f(z)$ by continuity of $f$ at the real $z$; and $f(z) \near f(y)$ similarly. +Hence by transitivity of $\near$, have $f(x) \near f(y)$. +\end{proof} + +\begin{example} +Notice how this fails for $f: (0, 1] \to \mathbb{R}$ by $x \mapsto \frac{1}{x}$. +Indeed, our choice of $x$ could have been infinitesimal; then $z = \st(x) = 0$ lies outside the domain of $f$. 
+This formulation makes it obvious what fact we used about the non-standard formulation of a compact set: namely, that the set is closed under taking standard parts. +\end{example} + +In a standard setting, the two usual proofs of Theorem \ref{thm:continuous_implies_uniformly_continuous} go via convergence of sequences and Bolzano-Weierstrass, or via the Lebesgue number lemma. +This beautifully elegant proof has eliminated an enormous amount of the complexity of the same result in the standard setting, and its simplicity and lack of clutter reveals precisely what ``compact'' should mean in more general non-standard metric spaces. + +\subsection{Compactness} + +Robinson discovered an extremely neat formulation of the notion of compactness. + +\ + +\begin{defn}[Compactness] \label{defn:compact} A set $X \subseteq \mathbb{R}$ is \emph{compact} iff every $x \in \hyp[X]$ has some $y \in X$ with $x \near y$. +\end{defn} + +\ + +From this definition, it is easy that the continuous image of a compact set is compact. + +\ + +\begin{thm} \label{thm:continuouscompact} Let $f: X \to \mathbb{R}$ be continuous and $X$ compact. +Then the image of $f$ is compact. +\end{thm} +\begin{proof} +Let $y \in \hyp[(f(X)]) = \hyp[f](\hyp[X])$. +We wish to show that there is $x \in f(X)$ with $y \near x$. + +Say $y = \hyp[f](a)$, and let $x = f(\st(a))$. +Then $\hyp[f](a) \near \hyp[f](\st(a))$ by continuity of $f$, so we are immediately done. +\end{proof} + +To obtain a more specific form of Theorem \ref{thm:continuouscompact} without using the Heine-Borel theorem---that is, using closed boundedness directly rather than through compactness---it will be useful to develop the idea of the hypernatural numbers and hyperfinite partitions of a set. + +\section{Hypernaturals} \label{sec:hypernaturals} + +For the moment, we will work within the ultrapower construction of the hyperreals. +It is possible to proceed while remaining agnostic about the construction, but the details for this section are much simpler when working with an ultrapower. +We will paint the construction in broad strokes here. + +\subsection{Internal sets} \label{sec:internal} + +\begin{motivation} +The motivating example for this section is the canonical model of Peano arithmetic embedded in $\mathbb{R}$: namely, $\mathbb{N}$. +When we pass to $\hyp$, we might expect to obtain a non-standard model of Peano arithmetic, playing the same role in $\hyp$ as $\mathbb{N}$ does in $\mathbb{R}$. + +The transfer principle tells us, informally, that ``from within the model, everything looks like $\mathbb{R}$''. +We would like a way to capture, for instance, the \emph{Archimedean property}: that for every $r \in \mathbb{R}$, there is $n \in \mathbb{N}$ such that $r < n$. +That is, we seek a set of ``hypernaturals'', forming a non-standard model of Peano arithmetic embedded in $\hyp$, such that every hyperreal $r$ has a hypernatural $N$ bounding it above. +\end{motivation} + +We will require the notion of an ``internal'' set: intuitively, ``a set which exists in the model of $\hyp$''. +For most of our purposes, it is enough to know that sets whose members are precisely those hyperreals satisfying some first-order property are internal (which is proved in Section 11.7 of Goldblatt \cite{goldblatt}, and is indeed a form of the transfer principle), but we shall be more explicit using the ultrapower construction. 
+ +Given a nonprincipal ultrafilter $\mathcal{F}$ on $\mathbb{N}$, we construct $\hyp$ as the collection of sequences of reals, modulo the equivalence relation that $\langle r_n \rangle \sim \langle s_n \rangle$ if and only if $\{ n \in \mathbb{N} : r_n = s_n \} \in \mathcal{F}$. +The intuition is that the two sequences ``agree almost everywhere''. +(See Chapter 3 of Goldblatt \cite{goldblatt}). +We write $[r_n]$ for the equivalence class of the sequence $\langle r_n \rangle$. + +Fix a sequence of sets $\langle A_n \rangle$ with each $A_n \subseteq \mathbb{R}$. +The \emph{internal set} $[A_n]$ is defined as follows: +$$\text{$[r_n] \in [A_n]$ if and only if $\{ n \in \mathbb{N} : r_n \in A_n \} \in \mathcal{F}$.}$$ + +The internal sets are precisely those which may be obtained in this way; they are ``the sets which exist in the model'', and they will turn out to be the sets over which we may quantify. + +Likewise an \emph{internal function} is obtained from a sequence $\langle f_n \rangle$ of real-valued functions $A_n \to \mathbb{R}$: $$[f_n]: [A_n] \to \hyp, [r_n] \mapsto [f_n(r_n)]$$ + +\ + +\begin{defn} +We define the set of \emph{hypernaturals} to be the set $\hyp[\mathbb{N}]$ which is the image of the elementary embedding of $\mathbb{N} \subseteq \mathbb{R}$ into $\hyp$: that is, $$[r_n] \in \hyp[\mathbb{N}] \Leftrightarrow \{ n \in \mathbb{N}: r_n \in \mathbb{N} \} \in \mathcal{F}$$ +By construction, it is an internal subset of $\hyp$. +\end{defn} + +\ + +\begin{remark}There is an extended form of the transfer principle (where second-order statements may be transferred), which is even more difficult to justify than our original statement\footnote{For which, recall, we appealed to the power of \L os's Theorem.} of Definition \ref{defn:transfer}. +It allows us to translate between certain second-order statements: $$(\forall A \subseteq B) \leftrightarrow (\forall \text{$A$ internal} \subseteq \hyp[B])$$ +This procedure can be made to result in a formulation which is agnostic with respect to how we constructed the hyperreals, but in this essay we have only sketched a motivation, and have worked within the ultrapower construction. +See Chapter 13 of Goldblatt \cite{goldblatt} for the details. + +Later on, when we treat measure theory, we will make heavy use of this ``restricted second-order'' transfer principle. + +The definition of ``internal'' here was very much dependent on the implementation of $\hyp$ as an ultrapower. +However, it may be viewed more generally; this is the start of ``Internal Set Theory''. We will elide such details. +\end{remark} + +\ + +\begin{defn}[Transfer Principle, extended] \label{defn:secondordertransfer} +Let $\phi$ be a first-order sentence in the language of the totally ordered field $\mathbb{R}$, which is additionally allowed to mention (finitely many) specific sets $A_n \subseteq \mathbb{R}$, and is allowed to contain quantifiers $(\forall A \subseteq \mathbb{R})$ and $(\exists A \subseteq \mathbb{R})$. +Then $\phi$ is true in $\mathbb{R}$ if and only if $\hyp[\phi]$ is true in $\hyp$, where $\hyp[\phi]$ is obtained by \begin{enumerate} +\item replacing all quantifiers $(\forall x \in \mathbb{R})$ with $(\forall x \in \hyp)$ and $(\exists x \in \mathbb{R})$ with $(\exists x \in \hyp)$; +\item replacing each $A_n$ with $\hyp[A_n]$; +\item replacing all quantifiers $(\forall A \subseteq \mathbb{R})$ with $(\forall \text{$A$ internal} \subseteq \hyp)$ and likewise $(\exists A \subseteq \mathbb{R})$ with $(\exists \text{$A$ internal} \subseteq \hyp)$. 
+\end{enumerate}
+\end{defn}
+
+\
+
+By transfer, hypernaturals have several obvious properties:
+
+\begin{itemize}
+\item They are closed under addition.
+\item There are no hypernaturals between $m$ and $m+1$, for $m$ hypernatural.
+\item For every $x \in \hyp$, have $x \in \hyp[[] \hyp[\lfloor] x \rfloor, \hyp[\lfloor] x \rfloor+1]$.
+\end{itemize}
+
+Every hypernatural is either a standard natural, or bigger than all standard naturals; this can be seen from the (transferred) fact that subtracting $1$ from a nonzero hypernatural yields another hypernatural, and the (transferred) fact that all hypernaturals are nonnegative.
+
+\begin{example}It is a second-order expressible fact in the reals that every nonempty finite set has a least element:
+$$(\forall A \subseteq \mathbb{R})((\exists n \in \mathbb{N})(\text{$A$ has $n$ elements and $n>0$}) \Rightarrow (\exists x \in A)(\forall y \in A)(x \leq y))$$
+Here, we use a shorthand: $$(\text{$A$ has $n$ elements}) \leftrightarrow (\text{$\exists$ a bijective function $\{x \in \mathbb{N}: x < n \} \to A$})$$
+
+The transfer principle from Definition \ref{defn:transfer} does not apply.
+But our amended second-order version does apply, to give us
+$$(\forall \text{$A$ internal} \subseteq \hyp)((\exists n \in \hyp[\mathbb{N}])(\text{$A$ has $n$ elements and $n>0$}) \Rightarrow (\exists x \in A)(\forall y \in A)(x \leq y))$$
+where $$(\text{$A$ has $n$ elements}) \leftrightarrow (\text{$\exists$ a bijective internal function $\{x \in \hyp[\mathbb{N}]: x < n\} \to A$})$$
+
+That is, internal hyperfinite sets have least elements.
+
+In fact, if we work in the ultrapower construction, the minimal element of the internal set $[A_n]$ is $[a_n]$, where for each $n \in \mathbb{N}$, $a_n$ is the least element of $A_n$ (taking almost every $A_n$ to be a nonempty finite set).
+(We will not prove this.)
+
+Additionally, if $m$ is a hypernatural and $x_1, x_2, \dots, x_m$ is an internal set of hyperreals, then there is a maximum $x_i$.
+\end{example}
+
+\subsection{Hypersequences}
+
+We may define \emph{hypersequences}, by means of transferring a function $f: \mathbb{N} \to \mathbb{R}$ to a function $\hyp[f]: \hyp[\mathbb{N}] \to \hyp$.
+(We will assume that we can always extend $f$, whether or not the transfer principle applies to our particular choice of $f$.
+This requirement is justifiable; it is studied in a little more detail in Section \ref{sec:comprehensiveness}.)
+The notion of a hypersequence leads to a very neat characterisation of the convergence of a sequence.
+
+\
+
+\begin{defn} \label{defn:convergence} For $f: \mathbb{N} \to \mathbb{R}$, we say $(f(n))_{n \in \mathbb{N}} \to x \in \mathbb{R}$ if and only if $\hyp[f](k) \near x$ for all $k$ infinite. (Recall that $\hyp[f]$ maps from $\hyp[\mathbb{N}]$ to $\hyp$.)
+\end{defn}
+
+\
+
+Of course, from \emph{within} the model, the statement ``for all infinite $k$'' is not meaningful, because the set of infinite naturals is not internal.
+
+\
+
+\begin{thm} \label{thm:infinite_not_internal}
+The set of infinite hypernaturals is not internal.
+\end{thm}
+\begin{proof}
+Every non-empty internal set of hypernaturals has a smallest element.
+This is Exercise 1 from Section 12.4 of Goldblatt \cite{goldblatt}, and it follows by transferring the true statement of the well-ordering of the naturals:
+$$(\forall A \subseteq \mathbb{N})(\exists n \in A)(\forall m \in A)(n \leq m)$$
+to
+$$(\forall \text{$A$ internal} \subseteq \hyp[\mathbb{N}])(\exists n \in A)(\forall m \in A)(n \leq m)$$
+
+But there is no smallest infinite hypernatural, because subtracting $1$ from any infinite hypernatural yields another infinite hypernatural.
+\end{proof}
+
+\
+
+\begin{thm} \label{thm:convergenceequivalent} The non-standard and the standard definitions of convergence coincide: if (and only if) $(f(n)) \to x$ in the non-standard sense of Definition \ref{defn:convergence}, then for every $\varepsilon \in \mathbb{R}^{>0}$ there is $N \in \mathbb{N}$ such that for all $n > N$, $|f(n) - x| < \varepsilon$.
+\end{thm}
+\begin{proof}
+If the sequence $(f(n))$ converges to $x$ in the non-standard sense, then for every $\varepsilon \in \mathbb{R}^{>0}$ we have all infinite $N \in \hyp[\mathbb{N}]$ satisfying
+$$\text{for all $n > N$, $|\hyp[f](n) - x| < \varepsilon$}$$
+
+Therefore it must be the case that some $N \in \mathbb{N}$ has this same property transferred to the reals:
+$$\text{for all $n > N$, $|f(n) - x| < \varepsilon$}$$
+since if not, this would be an internal property that distinguished between finite $N$ and infinite $N$; this contradicts Theorem \ref{thm:infinite_not_internal}.
+
+Conversely, suppose the sequence $(f(n))$ converges to $x$ in the standard sense.
+Fix some $\varepsilon \in \mathbb{R}^{>0}$.
+Then there is some particular $N_{\varepsilon}$ (say, for the sake of argument, $N_{\varepsilon} = 10$) such that all larger $n \in \mathbb{N}$ have $|f(n) - x| < \varepsilon$.
+
+This property transfers to the hyperreals: for all $n \in \hyp[\mathbb{N}]$ with $n > N_{\varepsilon}$ (which, for the sake of argument, is $10$), we have $$|\hyp[f](n)-x| < \varepsilon$$
+In particular, all infinite hypernaturals $n$ satisfy $|\hyp[f](n) - x| < \varepsilon$.
+
+Finally, allowing $\varepsilon$ to vary over all positive reals, we discover that for all infinite $n \in \hyp[\mathbb{N}]$, it is the case that $\hyp[f](n) \near x$.
+\end{proof}
+
+\subsection{Lattices}
+
+Hypernaturals yield a succinct proof of the Intermediate Value Theorem, by means of considering a ``hyperfinite lattice'' of points on the real line.
+We examine the behaviour on that lattice of an $\mathbb{R}$-continuous function when it is extended to $\hyp$, and gain access to the nice properties of hyperfinite sets.
+The following proof is from P\'etry \cite{petry}, where it appears as Theorem 25.
+
+\
+
+\begin{thm}
+Let $f: [a, b] \to \mathbb{R}$ be continuous with $f(a) < 0 < f(b)$. Then there is $c \in [a,b]$ such that $f(c) = 0$.
+\end{thm}
+\begin{proof}
+Extend $f$ to $\hyp[f]$, and consider the ``hyperpartition'' of $[a,b]$ $$S = \left \{ \frac{b-a}{m} k + a : 0 \leq k \leq m \right\}$$ for $m$ a fixed infinite hypernatural.
+
+Construct $$T = \left \{ \hyp[f](x): x \in S, \hyp[f](x) > 0 \right \}$$
+
+Then $\bar{k} := \min \{ x \in S: \hyp[f](x) \in T \}$ has $$\st \hyp[f]\left(\bar{k} \right) = 0$$
+since certainly it is $\geq 0$ by definition as (the standard part of) a member of a set $T$ of positive numbers, while if it were strictly greater than $0$ then we could subtract $\frac{b-a}{m}$ from $\bar{k}$ to obtain another member $\kappa$ of $S$ such that $\hyp[f](\kappa) \in T$, contradicting minimality of $\bar{k}$.
+$\bar{k}$ does exist, because it is the minimum of a hyperfinite set.
+Setting $c = \st(\bar{k})$, continuity then gives $f(c) = \st(\hyp[f](\bar{k})) = 0$.
+\end{proof}
+
+(This proof can be converted into another non-standard one which does not use hypernaturals, instead using the least upper bound property of the standard reals.)
+
+We are now ready to revisit Theorem \ref{thm:continuouscompact} in a more concrete form.
+
+\
+
+\begin{thm} \label{thm:ACFOACBIIBAAIB} Let $f: [a, b] \to \mathbb{R}$ be continuous. Then $f$ is bounded and attains its bounds.
+\end{thm}
+\begin{proof}
+Extend $f$ to $\hyp[f]$.
+
+Then for $m$ an infinite hypernatural, $$S = \left \{ \frac{b-a}{m} k + a : 0 \leq k \leq m \right\}$$ is a hyperfinite set and so $\hyp[f]$ attains a maximum and a minimum on that set; say at $m_+, m_-$ respectively.
+Since $[a,b]$ is bounded, we must have $\hyp[[]a, b]$ bounded, and so $m_+$ and $m_-$ are finite.
+
+Now, $\st(m_+)$ lies in $[a,b]$, because $[a,b]$ is closed: since $m_+ \geq a$, we must have $\st(m_+) \geq \st(a) = a$, and likewise since $m_+ \leq b$, we must have $\st(m_+) \leq \st(b) = b$.
+By continuity at $\st(m_+)$, then, $\hyp[f](m_+) \near f(\st(m_+))$, which is a standard real; so $\hyp[f](m_+)$ is finite.
+
+Therefore $\hyp[f]$ is bounded above on $S$ and attains its bound, at $m_+$.
+Hence $f$ is bounded above and attains its bound at $\st(m_+)$, which we have already shown is in the domain $[a,b]$: indeed, every real $y \in [a,b]$ is infinitesimally close to some lattice point $s \in S$, so by continuity $$f(y) = \st(\hyp[f](s)) \leq \st(\hyp[f](m_+)) = f(\st(m_+))$$
+
+Exactly the same argument shows that $f$ is bounded below and attains its bound.
+\end{proof}
+
+\begin{remark}
+The proof of Theorem \ref{thm:ACFOACBIIBAAIB} is so short and concise that it becomes extremely clear what its real content is: namely, that closed bounded intervals are compact (in the sense of Definition \ref{defn:compact}).
+Most of the proof is simply showing that $m_+$ and $m_-$ must have standard parts in the domain of $f$, which is precisely what it means for $[a,b]$ to be compact.
+\end{remark}
+
+We move on to the related Rolle's Theorem.
+
+\
+
+\begin{thm}[Rolle's Theorem] \label{thm:rolle} Let $f: [a,b] \to \mathbb{R}$ be differentiable on $(a,b)$ and continuous on $[a,b]$.
+Suppose that $f(a) = f(b)$.
+Then there is $c \in (a,b)$ such that $f'(c) = 0$.
+\end{thm}
+\begin{proof}
+The following proof is from Section 8.5 of \cite{goldblatt}.
+Since $f$ is continuous on a closed bounded interval, it is bounded and attains its bounds (Theorem \ref{thm:ACFOACBIIBAAIB}).
+If $f(x) = f(a) = f(b)$ for all $x$, then we are done: $f$ is constant and so has zero derivative everywhere.
+Otherwise, without loss of generality, $f$ attains a global maximum at some $x \not \in \{a, b\}$.
+(If the global maximum is only attained at the endpoints, then the global minimum is attained away from them, and we may consider $-f$ instead.)
+
+Now, it is a first-order fact that $f(y) \leq f(x)$ for every $y \in [a,b]$; so by transfer, $x$ maximises $\hyp[f]$ on $\hyp[[]a,b]$.
+
+Therefore $$\frac{\hyp[f](x+\varepsilon) - f(x)}{\varepsilon} \leq 0 \leq \frac{\hyp[f](x+\delta) - f(x)}{\delta}$$
+for any $\varepsilon > 0, \delta < 0$ both infinitesimal.
+
+Since $f$ is differentiable at $x$, both sides are infinitesimally close to $f'(x)$; on taking standard parts we obtain that $f'(x)$ is both nonpositive and nonnegative, so it must be $0$.
+\end{proof}
+
+The Mean Value Theorem and then Taylor's theorem can be proved in an $\varepsilon$-$\delta$ free way from Rolle's theorem (for example, as in Theorem 4 of Chapter 11 in Spivak \cite{spivak}).
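+
+To indicate the first of those steps (this is the classical reduction, not anything specific to the non-standard setting): given $f$ continuous on $[a,b]$ and differentiable on $(a,b)$, apply Rolle's theorem to
+$$g(x) = f(x) - \frac{f(b)-f(a)}{b-a}(x-a)$$
+which is continuous on $[a,b]$, differentiable on $(a,b)$, and satisfies $g(a) = f(a) = g(b)$; at the resulting point $c$ we have $g'(c) = 0$, that is, $f'(c) = \frac{f(b)-f(a)}{b-a}$.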
+
+\subsection{Hyper-sums} \label{sec:hypersums}
+
+Continuing the theme of the hypernaturals, we investigate \emph{hyper-sums}.
+
+Let $S(n) = \sum_{i=0}^n f(i)$, for $f: \mathbb{N} \to \mathbb{R}$.
+This function $S$ extends to a function $\hyp[S]: \hyp[\mathbb{N}] \to \hyp$, since we may express $S$ in a transferable way as $$S(0) = f(0); \ (\forall n \in \mathbb{N}^{>0})(S(n) = S(n-1) + f(n))$$
+and then (using our second-order transfer principle) show that there is only one $\hyp[S]$ which satisfies the transferred condition.
+
+The transfer principle yields properties such as \begin{equation}\label{sums} \hyp[|] \sum_{k=0}^m \lambda_k u_k| \leq (\max \hyp[|] u_k|) \sum_{k=0}^m \hyp[|] \lambda_k|\end{equation}
+where for clarity we have implicitly suppressed the asterisk on the $\hyp[\sum]$ symbol.
+
+\section{Integration}
+
+The definition of the Riemann integral is made especially comprehensible by the notion of the hyper-sum.
+The standard method of approximating an integral by the areas of rectangles, with the number of rectangles taken larger and larger and eventually infinite (that is, indexed by an infinite hypernatural), is a highly intuitive idea.
+
+The ``obvious'' choice of the definition of $\int_a^b f$ would be $$\st \left( \sum_{k=0}^{m-1} \hyp[f](x_k) (x_{k+1}-x_k)\right)$$
+where $m$ is some infinite hypernatural, and $$x_k = \frac{b-a}{m} k + a$$
+
+However, a little thought suggests that some rather pathological functions would thereby be considered to have very wrong integrals, because we can only ever sample $\hyp[f]$ at some fixed points; letting our function misbehave away from those points would yield counterintuitive results.
+
+The correct refinement is as follows.
+
+\
+
+\begin{defn}
+The \emph{integral} of $f: [a,b] \to \mathbb{R}$ is the following expression, if it is well-defined:
+$$\int_a^b f = \st \left( \sum_{k=0}^{m-1} \hyp[f](\hyp[\phi](x_k, x_{k+1})) (x_{k+1}-x_k) \right)$$
+where $\phi$ is any function $[a, b]^2 \to \mathbb{R}$ such that $r \leq \phi(r, s) \leq s$ whenever $r \leq s$, and $x_k = \frac{b-a}{m} k + a$ (so that $x_{k+1} - x_k = \frac{b-a}{m}$), and $m$ is an infinite hypernatural.
+\end{defn}
+
+\
+
+This captures the idea that our $m$ sampling points are allowed to vary their position slightly in their respective intervals.
+
+We say a function $f: [a, b] \to \mathbb{R}$ is \emph{integrable} if $\int_a^b f$ is well-defined as the infinite hypernatural $m$ varies and for all choices of $\phi$.
+
+Showing that a function is integrable is approximately as difficult using the non-standard definition as it is using the standard definition (as a limit of a Riemann sum taken over smaller and smaller dissections).
+However, the non-standard definition is perhaps conceptually a little simpler, because it lacks a limiting process.
+
+To illustrate the process of showing a function is integrable, we prove that continuous functions are integrable.
+The proof here, rendered down from P\'{e}try's (\cite{petry}, Section 12.5), is very similar to a standard proof.
+
+\
+
+\begin{thm}
+Let $f: [a,b] \to \mathbb{R}$ be continuous. Then $f$ is integrable.
+\end{thm}
+\begin{proof}
+We need to show that given two different discretisations $(x_k)_{k=0}^m$ and $(y_k)_{k=0}^n$, and two ``nearness'' functions $\phi$ and $\psi$, the following is true:
+$$\sum_{k=0}^{m-1} \hyp[f](\hyp[\phi](x_k, x_{k+1}))(x_{k+1}-x_k) \near \sum_{k=0}^{n-1} \hyp[f](\hyp[\psi](y_k, y_{k+1})) (y_{k+1}-y_k)$$
+
+Consider the more general discretisation given by taking the union of the $x_i$ and $y_i$: label this list $(w_i)_{i=0}^l$.
+(Note that this is no longer ``uniform'': the $w_i$ do not necessarily have equal intervals between them, although the $x_i$ and $y_i$ did.)
+We will suppress the asterisk on $\hyp[[]\alpha, \beta]$ henceforth.
+
+Each $[w_i, w_{i+1}]$ lies fully within some $[x_j, x_{j+1}]$, by construction of the $w_i$, so $$[x_k, x_{k+1}] = [w_{i_k}, w_{i_k+1}] \cup \dots \cup [w_{r_k-1}, w_{r_k}]$$
+for some $i_k, r_k$.
+
+Therefore, suppressing the asterisk on $\hyp[\phi](x_k, x_{k+1})$, we may un-telescope the sum:
+$$\hyp[f](\phi(x_k, x_{k+1})) (x_{k+1} - x_k) = \sum_{j=i_k}^{r_k-1} \hyp[f](\phi(x_k, x_{k+1}))(w_{j+1}-w_j)$$
+so
+$$\sum_{k=0}^{m-1} \hyp[f](\phi(x_k, x_{k+1})) (x_{k+1} - x_k) = \sum_{k=0}^{m-1} \sum_{j=i_k}^{r_k-1} \hyp[f](\phi(x_k, x_{k+1})) (w_{j+1}-w_j)$$
+
+Now, since $x_k \near x_{k+1}$ and $w_j, w_{j+1}$ are both in $[x_k, x_{k+1}]$, we have $$\phi(x_k, x_{k+1}) \near w_j \near w_{j+1}$$
+Hence in fact the right-hand side is an expression for a Riemann sum with $(w_i)$ as a dissection, although recall that the dissection does not have equal intervals between successive points.
+
+Relabelling the sum on the right-hand side, for some $(c_k)_{k=0}^l$ (which, if we were so inclined, we could express in terms of $\phi$ and the $x_i$), we have $$\sum_{k=0}^{m-1} \hyp[f](\phi(x_k, x_{k+1}))(x_{k+1}-x_k) = \sum_{k=0}^{l-1} \hyp[f](c_k) (w_{k+1}-w_k)$$
+
+Symmetrically, $$\sum_{k=0}^{n-1} \hyp[f](\psi(y_k, y_{k+1})) (y_{k+1}-y_k) = \sum_{k=0}^{l-1} \hyp[f](d_k) (w_{k+1}-w_k)$$ for some $(d_k)_{k=0}^l$ with each $c_k \near d_k$.
+Notice that the upper limit of this sum is indeed the same $l-1$ as before, because the dissection is taken over the same sequence $(w_i)$.
+
+Taking the modulus of the difference of the two expressions, obtain $$\left \vert \sum_{k=0}^{l-1} (\hyp[f](c_k) - \hyp[f](d_k)) (w_{k+1} - w_k) \right \vert$$
+
+But $f$ is continuous on a compact set, so is uniformly continuous (see Section \ref{sec:uniform}); so $\hyp[f](c_k) - \hyp[f](d_k)$ is infinitesimal for all $k$, because $c_k \near d_k$.
+
+By equation \ref{sums} in Section \ref{sec:hypersums}, for every $\varepsilon \in \mathbb{R}^{> 0}$ we have
+$$\left \vert \sum_{k=0}^{l-1} (\hyp[f](c_k) - \hyp[f](d_k)) (w_{k+1} - w_k) \right \vert \leq \varepsilon \sum_{k=0}^{l-1} |w_{k+1} - w_k|$$
+since $\max_k |\hyp[f](c_k) - \hyp[f](d_k)|$ is attained (the maximum being taken over a hyperfinite internal collection) and so is itself infinitesimal, in particular less than $\varepsilon$.
+
+Therefore the two expressions for the integral are indeed infinitesimally close, since $$\sum_{k=0}^{l-1} |w_{k+1}-w_k| = \sum_{k=0}^{l-1} (w_{k+1}-w_k) = b-a$$
+\end{proof}
+
+Before we introduce the link between integration and differentiation (the Fundamental Theorem of Calculus), we first require a lemma, which is P\'etry's Theorem 32.
+
+\
+
+\begin{thm}[Integral mean value theorem] \label{thm:integralmvt}
+Let $f: [a, b] \to \mathbb{R}$ be continuous.
+Then there is a real $u \in [a,b]$ such that $$\int_a^b f(x) dx = (b-a) f(u)$$
+\end{thm}
+\begin{proof}
+Since $f$ is continuous on a closed bounded interval, it attains its bounds; say $f(c) \leq f(x) \leq f(d)$ for all $x$.
+
+Now, $(b-a) f(c) \leq S(\delta) \leq (b-a) f(d)$
+for any Riemann sum $S(\delta)$ with box-width $\delta$, so $$f(c) \leq \frac{1}{b-a} \int_a^b f(x) dx \leq f(d)$$
+
+Then we are done by the Intermediate Value Theorem: there is $u$ between $c$ and $d$ such that $$f(u) = \frac{1}{b-a} \int_a^b f(x) dx$$
+\end{proof}
+
+\
+
+\begin{thm}[Fundamental Theorem of Calculus, first part] \label{thm:FTC1}
+Let $f: [a,b] \to \mathbb{R}$ be continuous, and $x_0 \in [a,b]$.
+Then the function $$g: x \mapsto \int_{x_0}^x f(t) dt$$ is differentiable on $(a,b)$ with derivative $g'(x) = f(x)$.
+That is, an antiderivative of $f$ is given by the integral.
+\end{thm}
+\begin{proof}
+Fix $x \in (a,b)$, and let $\delta$ be a nonzero infinitesimal.
+Then $$\frac{\hyp[g](x+\delta) - \hyp[g](x)}{\delta} = \frac{1}{\delta} \int_x^{x+\delta} f(t) dt$$
+
+By transferring the integral mean value theorem (Theorem \ref{thm:integralmvt}), there is $w \in \hyp[[]x, x+\delta]$ such that $$\frac{1}{\delta} \int_x^{x+\delta} f(t) dt = \hyp[f](w)$$
+
+On taking standard parts and using continuity, we obtain $$\st \left( \frac{\hyp[g](x+\delta) - \hyp[g](x)}{\delta} \right) = \st(\hyp[f](w)) = f(x)$$
+\end{proof}
+
+The first part of the FTC told us how to find an antiderivative by integrating.
+There is a second part to the FTC, which will tell us how to integrate $f$ in terms of a known antiderivative.
+To prove it, we shall require a standard theorem on antiderivatives.
+
+\
+
+\begin{thm} \label{thm:antiderivativesunique} Antiderivatives are unique up to the addition of a constant.
+That is, if $H_1, H_2: [a, b] \to \mathbb{R}$ satisfy $H_i' = f$, then $H_1 = H_2 + k$ for some constant $k$.
+\end{thm}
+\begin{proof}
+Consider $H_1 - H_2: [a,b] \to \mathbb{R}$.
+This function has derivative $0$ at all points.
+By the Mean Value Theorem, this means $H_1 - H_2$ is constant.
+(This follows from the remark after Theorem \ref{thm:increasingimpliesderivativepositive}, by which we deduce that $H_1 - H_2$ is both nondecreasing and nonincreasing.)
+\end{proof}
+
+\
+
+\begin{thm}[FTC, second part] \label{thm:FTC2}
+Let $f: [a,b] \to \mathbb{R}$ be continuous, and suppose $F: [a,b] \to \mathbb{R}$ is an antiderivative for $f$ (so $F' = f$).
+Then
+$$\int_a^b f = F(b) - F(a)$$
+\end{thm}
+\begin{proof}
+Break the integral up at a point $x_0 \in [a,b]$, and define $$G(x) = \int_{x_0}^x f$$
+
+By the first part of the FTC (Theorem \ref{thm:FTC1}), $G$ is an antiderivative for $f$; so since $F$ and $G$ are both antiderivatives, they differ by a constant (Theorem \ref{thm:antiderivativesunique}): $$F(x)+c = G(x)$$
+Therefore $$\int_a^b f = \int_a^{x_0} f + \int_{x_0}^b f = -G(a) + G(b) = F(b)-F(a)+c-c$$
+where the transformation $$\int_a^b f = \int_a^{x_0} f + \int_{x_0}^b f$$ follows from splitting up the Riemann sum.
+(Because the integral is well-defined, we are free to choose a convenient Riemann sum where $x_0$ is a point of the discretisation, so that the sum splits up perfectly into two chunks.)
+\end{proof}
+
+\
+
+\begin{remark}
+One consequence of this theorem is that if $f'$ is continuous between $a$ and $b$, then its integral is $f(b) - f(a)$.
+\end{remark}
+
+\
+
+\begin{defn}[Improper integral]
+We write $\int_a^{\infty} f(x) dx$ for the expression $$\st \left( \int_a^M f(x) dx \right)$$
+where $M$ is an arbitrary infinite positive hyperreal, if that expression is finite and well-defined as $M$ varies.
+\end{defn}
+
+\
+
+\begin{example}
+When does $\int_a^{\infty} x^{\theta} dx$ exist, for $a > 0$?
+Assuming $\theta \not = -1$, the integral is $$\st \left( \int_a^M x^{\theta} dx \right) = \st \left( \left[\frac{x^{\theta + 1}}{\theta + 1} \right]_a^{M} \right) = \st \left( \frac{M^{\theta+1}}{\theta+1} - \frac{a^{\theta+1}}{\theta+1} \right)$$
+
+This is well-defined if and only if $\theta+1 < 0$, in which case its value is $\frac{-a^{\theta+1}}{\theta+1}$.
+
+If instead $\theta = -1$, we would need $\st(\log(M) - \log(a))$; but $\log(M)$ is infinite, so the integral is not defined in this case either.
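+
+As a sanity check on the first case (a routine computation, not drawn from the sources): taking $a = 1$ and $\theta = -2$ gives
+$$\st \left( \frac{M^{-1}}{-1} - \frac{1}{-1} \right) = \st \left( 1 - \frac{1}{M} \right) = 1$$
+since $\frac{1}{M}$ is infinitesimal; this agrees with the value $\frac{-a^{\theta+1}}{\theta+1} = 1$.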
+\end{example}
+
+\section{Series}
+
+Intuitively, since series are simply infinite sums, it should be the case that hyperfinite sums express series neatly.
+This turns out to be the case.
+We are still assuming throughout that we may extend standard sequences to non-standard sequences; see further discussion in Section \ref{sec:comprehensiveness}.
+
+\
+
+\begin{defn} Let $c: \mathbb{N} \to \mathbb{R}$ by $k \mapsto c_k$.
+We say $$\sum_{k=0}^{\infty} c_k$$ \emph{converges} iff $$\st \left( \sum_{k=0}^m \hyp[c]_k \right)$$ is finite and is well-defined as $m$ varies over all infinite hypernaturals in $\hyp[\mathbb{N}]$.
+\end{defn}
+
+\
+
+Two of the standard examples of infinite sums are $$\sum_{k=1}^{\infty} \frac{1}{2^k}$$ and $$\sum_{k=1}^{\infty} 1$$
+The former, of course, converges.
+In this instance, it is a fact that $$\sum_{k=1}^{m} \frac{1}{2^k} = 1-2^{-m}$$
+and this fact is true in the limited second-order sense of Section \ref{sec:internal},
+so it remains true when we transfer to infinite $m$; in particular, then $2^{-m}$ is infinitesimal, so the standard part of our resulting sum is simply $1$.
+
+For the latter sum (that is, the sum of infinitely many of the constant $1$), we have $$\sum_{k=1}^{m} 1 = m$$ which is infinite (and so has no standard part) when $m$ is infinite; so in this instance, as expected, the sum fails to converge.
+
+To understand convergence, the following result is very useful; it basically states that a sequence is Cauchy if and only if it converges, in the specific case that the sequence is a sequence of partial sums of a series.
+
+\
+
+\begin{thm}[Cauchy's criterion] \label{thm:cauchycriterion} $\sum_{k=1}^{\infty} a_k$ converges if and only if $$\sum_{k=m}^n \hyp[a]_k$$ is infinitesimal for all $m \leq n$ infinitely large.
+\end{thm}
+\begin{proof}
+This proof is from P\'etry \cite{petry}, section 19.2, where it appears as Theorem 53.
+
+Suppose $\sum_{k=1}^{\infty} a_k$ converges to $a$.
+Then $$\sum_{k=m}^n \hyp[a]_k = \sum_{k=1}^n \hyp[a]_k - \sum_{k=1}^{m-1} \hyp[a]_k \near a-a = 0$$
+
+Conversely, suppose $$\sum_{k=m}^n \hyp[a]_k$$ is infinitesimal for all $m \leq n$ infinite.
+We need $\sum_{k=1}^m \hyp[a]_k$ to have the same, finite, standard part as $\sum_{k=1}^n \hyp[a]_k$ for all $m < n$ infinite.
+
+Clearly they have the same standard part if both sums are finite, because when we subtract them, we obtain $$\sum_{k=m+1}^{n} \hyp[a]_k$$ which is infinitesimal by assumption.
+So it remains to show that they are indeed finite.
+
+Let $\hyp[n]$ be some fixed infinite hypernatural (which we decorate with the asterisk as a cue for the fact that it is infinite).
+For all infinite $m < \hyp[n]$, we have that $$\left| \sum_{k=m}^{\hyp[n]} \hyp[a]_k \right| < 1$$
+(In fact, we have much more: we have that the left-hand side is infinitesimal.)
+
+Since the property $$P(m) = \left[ (m < \hyp[n]) \rightarrow \left( \left| \sum_{k=m}^{\hyp[n]} \hyp[a]_k \right| < 1 \right) \right]$$ is internal, it cannot suffice by itself as a means of distinguishing between finite and infinite integers $m$.
+(Indeed, no such means exists, by Theorem \ref{thm:infinite_not_internal}.)
+So there must be a finite natural $p$ such that $P(p)$ holds: $$\left| \sum_{k=p}^{\hyp[n]} \hyp[a]_k \right| < 1$$
+
+Therefore $\sum_{k=1}^{\hyp[n]} \hyp[a]_k$ is finite, being the sum of two finite quantities $$\sum_{k=1}^{p-1} \hyp[a]_k + \sum_{k=p}^{\hyp[n]} \hyp[a]_k$$
+\end{proof}
+
+\begin{example}[The harmonic series]
+$$\sum_{k=m+1}^{2m} \frac{1}{k} \geq m \times \frac{1}{2m} = \frac{1}{2}$$
+which is not infinitesimal when $m$ is infinite, so Cauchy's criterion fails for the harmonic series.
+(This is basically the usual proof by Cauchy condensation.)
+\end{example}
+
+\
+
+\begin{thm}Absolute convergence implies convergence.
+\end{thm}
+\begin{proof}
+Because $$\left | \sum_{k=m}^n a_k \right | \leq \sum_{k=m}^n |a_k|$$ (for $n, m$ either infinite or standard integers), the result is immediately clear from Cauchy's criterion.
+\end{proof}
+
+The comparison test likewise follows by transferring the fact that $$\sum_{k=m}^n a_k \leq \sum_{k=m}^n b_k$$ whenever $a_k \leq b_k$ for all $k \in [m, n]$.
+
+\pagebreak
+
+\begin{thm}[Ratio test] Let $a_k$ be a sequence of nonzero reals.
+\begin{enumerate}
+\item If there is some real $L$ such that for all infinitely large $m$, $$\left | \frac{a_{m+1}}{a_m} \right | \leq L < 1$$ then $\sum_{k=1}^{\infty} a_k$ converges absolutely.
+\item If instead for all infinite $m$ we have $$\left | \frac{a_{m+1}}{a_m} \right | \geq 1$$ then the sum diverges.
+\end{enumerate}
+\end{thm}
+\begin{proof}
+We prove only the first of these, since the second is proved very similarly.
+Notice that we are omitting the asterisk of $\hyp[a]_{m+1}$, to make a more readable theorem statement.
+
+The internal property ``for all $m > m_0$, have $\left| \frac{a_{m+1}}{a_m} \right| \leq L$'' holds of every infinite $m_0$; by the idea from Theorem \ref{thm:cauchycriterion}, it cannot be a way of distinguishing those $m_0$ which are infinite from those $m_0$ which are finite, so there must be some finite $m_0$ with $$\left| \frac{a_{m+1}}{a_m} \right| \leq L$$ for all $m > m_0$.
+Then we simply proceed by comparison with the convergent geometric series with common ratio $L$.
+\end{proof}
+
+Notice that this easily implies the standard statement of the ratio test, because if $$\left | \frac{a_{m+1}}{a_m} \right | \to L < 1$$ as $m \to \infty$, then for all infinitely large $m$, have $$\left | \frac{a_{m+1}}{a_m} \right | \near L < \frac{1+L}{2} < 1$$
+
+\begin{remark}
+The non-standard formulation of the ratio test is a little more clumsy in its proof than the standard version, because it essentially states that ``it is enough to pass to the standard version'' and then proves the standard version by comparison.
+However, the non-standard version has the aesthetic benefit of being free of any explicit limits.
+\end{remark}
+
+\
+
+\begin{thm}[Alternating series test]
+Let $(a_n)$ be a sequence of positive reals which are decreasing to $0$.
+Then $$\sum_{k=1}^{\infty} (-1)^k a_k$$ converges.
+\end{thm}
+\begin{proof}
+This proof is derived from Theorem 59 of P\'etry \cite{petry}.
+
+Let $m, n$ be infinite hypernaturals with $m \leq n$.
+The sum $$S = a_m - a_{m+1} + a_{m+2} - \dots + (-1)^{n-m} a_n$$ is precisely one of the two following: $$a_m - (a_{m+1} - a_{m+2}) - \dots - (a_{n-2} - a_{n-1}) - a_n$$ or $$a_m - (a_{m+1} - a_{m+2}) - \dots - (a_{n-3} - a_{n-2}) - (a_{n-1} - a_n)$$
+depending on the parity of $n-m$.
+Because the $a_i$ are decreasing, the result must be at most $a_m$ in either case, since each bracketed term is nonnegative.
+If we omit the first term, we obtain a quantity $$-S' := S - a_m = -(a_{m+1} - a_{m+2} + \dots + (-1)^{n-m+1} a_n)$$ where $S'$ is at most $a_{m+1}$, so $-S'$ is at least $-a_{m+1}$.
+Hence in fact $S \geq a_m - a_{m+1} \geq 0$, so $S$ is nonnegative.
+
+Therefore $$S = \left | \sum_{k=m}^n (-1)^k a_k \right | \leq a_m$$
+
+Since $(a_n)$ converges to $0$, for infinite $M$ we have $a_M \near 0$, so $\sum_{k=M}^N (-1)^k a_k$ is infinitesimal (being bounded in modulus by $a_M$).
+We are therefore done by Cauchy's criterion (Theorem \ref{thm:cauchycriterion}).
+\end{proof}
+
+We omit the integral test for convergence, because it is a routine application of similar ideas to the above; it may be found in P\'etry, section 19.3 \cite{petry}.
+
+\section{More general topological ideas}
+
+In this section, we will discuss how the more abstract ideas of topology can be viewed in the non-standard setting of $\hyp^n$, the product of $n$ copies of $\hyp$.
+Although we will not prove that the transfer principle holds between $\mathbb{R}^n$ and $\hyp^n$, it is intuitive that $(\hyp)^n$ has some of the properties we would like from $\hyp[(\mathbb{R}^n)]$: given any infinitesimal $\varepsilon \in \hyp$, we can obtain infinitesimals $\hat{\varepsilon} \in (\hyp)^n$ in any direction by simply multiplying by any unit vector, while given any infinitesimal $\hat{\varepsilon} \in (\hyp)^n$ we can take its length to get an infinitesimal in $\hyp$.
+(Length, of course, is defined by transferring the usual Euclidean distance.)
+In fact, it is the case that the monad of a point $(x, y)$ in a general product space is equal to the product of the monads; see Section III.1 of Hurd and Loeb \cite{hurdloeb}.
+
+To be clear, then, two elements of $\hyp^n$ are infinitesimally close iff the norm of their difference is infinitesimal; equivalently, if and only if their coordinates are pointwise infinitesimally close.
+We define standard parts of vectors pointwise.
+
+\
+
+\begin{defn} \label{defn:open}
+A subset $X \subseteq \mathbb{R}^n$ is \emph{open} if and only if, for every $x \in X$, the monad of $x$ lies entirely in $\hyp[X]$.
+\end{defn}
+
+\
+
+\begin{remark}[Equivalence to the standard definitions]
+This non-standard definition is easily implied by the standard definition when stated as being a union of open balls.
+It implies the standard definition when stated as ``every point has a neighbourhood within the set'':
+if $x$ has monad entirely within $\hyp[X]$, then there is $\varepsilon > 0$ (for instance, any infinitesimal $\varepsilon$) such that all $y \in \hyp^n$ with $|y-x| < \varepsilon$ lie within $\hyp[X]$.
+This property transfers to $\mathbb{R}^n$ by the reverse direction of the transfer principle.
+\end{remark}
+
+\
+
+\begin{defn} \label{defn:closed}
+A subset $X \subseteq \mathbb{R}^n$ is \emph{closed} if (and only if) its complement is open.
+That is, $X$ is closed if and only if, whenever $y \in \mathbb{R}^n \setminus X$, the monad of $y$ is entirely outside $\hyp[X]$.
+\end{defn}
+
+\
+
+\begin{thm} \label{thm:closeddefn} Let $X \subseteq \mathbb{R}^n$. Then $X$ is closed if and only if every convergent sequence in $X$ has its limit point within $X$. \end{thm}
+\begin{proof}
+Let $X$ be closed, and let $(x_i)_{i=1}^{\infty}$ be a sequence in $X$, tending to $x$.
+
+If $x \not \in X$, then $x \in \mathbb{R}^n \setminus X$, so the monad of $x$ lies entirely outside $\hyp[X]$.
+But $(x_i)$ converges to $x$, so every infinite hypernatural $m$ has $x_m \near x$; that is, the monad of $x$ contains $x_m$.
+Moreover, $x_n \in X \subseteq \hyp[X]$ for all finite $n$; since ``$x_n \in \hyp[X]$'' is an internal property, it cannot distinguish finite hypernaturals from infinite ones, so some infinite hypernatural $M$ has $x_M \in \hyp[X]$.
+Then $x_M$ lies both in the monad of $x$ and in $\hyp[X]$, which is the required contradiction.
+
+Conversely, let $X$ be not closed.
+Then $\mathbb{R}^n \setminus X$ is not open, so there is $y \in \mathbb{R}^n \setminus X$ such that some $r \in \hyp[\mathbb{R}^n]$ has $\st(r) = y$ but $r \not \in \hyp[(\mathbb{R}^n \setminus X)]$; that is, $r \in \hyp[X]$.
+
+We claim that $y$ is a limit point of $X$.
+Indeed, fix some specific $\varepsilon \in \mathbb{R}^{> 0}$.
+Then the following statement is true: $$(\exists r \in \hyp[\mathbb{R}]^n )(r \in \hyp[X] \wedge |y-r| < \varepsilon)$$
+so by the second-order version of transfer (Definition \ref{defn:secondordertransfer}), $$(\exists r \in \mathbb{R}^n)(r \in X \wedge |y-r| < \varepsilon)$$
+Denote such an $r$ by $x_{\varepsilon}$.
+
+Finally releasing $\varepsilon$, the sequence $(x_{1/n})$ lies in $X$ and converges to $y$ in the $\varepsilon$-$\delta$ sense, and therefore in the non-standard sense (by Theorem \ref{thm:convergenceequivalent}); but its limit $y$ lies outside $X$.
+\end{proof}
+
+\begin{remark}
+The proof of Theorem \ref{thm:closeddefn} is somewhat clumsy: it requires explicit uses of real $\varepsilon \in \mathbb{R}$. From a certain point of view, the proof contains two uses of the transfer principle: once to generate the sequence $(x_{1/n})$ which converges in the $\varepsilon$-$\delta$ sense, and once in the statement that $\varepsilon$-$\delta$ convergence is equivalent to convergence in the sense of Definition \ref{defn:convergence}.
+(This latter usage is hidden away in the proof of Theorem \ref{thm:convergenceequivalent}.)
+\end{remark}
+
+\
+
+\begin{thm}[Robinson's theorem] Let $X \subseteq \mathbb{R}^n$. Then the following are equivalent:
+\begin{enumerate}
+\item \label{item:non-standard} $X$ is compact in the non-standard sense that every point $x \in \hyp[X]$ has a standard $r_x \in X$ to which it is infinitesimally close.
+\item \label{item:standard} $X$ is compact in the standard sense that every open cover has a finite subcover.
+\end{enumerate}
+\end{thm}
+\begin{proof}
+(\ref{item:standard}) $\Rightarrow$ (\ref{item:non-standard}):
+This direction of the proof is from Theorem 4.1.13 of Robinson \cite{robinson}.
+
+Suppose $x \in \hyp[X]$ has the property that no $r \in X$ has $r \near x$.
+The idea is to transfer the fact that ``there is a finite collection of points which together are near to every point'', for a contradiction.
+
+For each $r \in X$ we can find a ball $B_r$ around $r$, of positive standard-real radius $\varepsilon_r$, such that $\hyp[B_r]$ does not contain $x$
+(indeed, if not, then $r$ would be infinitesimally near to $x$).
+
+This collection of balls $B_r$ forms an open cover of $X$, so it has a finite subcover;
+but each ball $B_r$ can be specified in a first-order way as $\{ z \in X: d(z, r) < \varepsilon_r \}$, so we have the internal statement that $$X = B_{r_1} \cup B_{r_2} \cup \dots \cup B_{r_n}$$
+
+By transfer, this must be true of $\hyp[X]$ too: $$\hyp[X] = \hyp[B_{r_1}] \cup \dots \cup \hyp[B_{r_n}]$$ which is a contradiction because we built the $B_r$ such that no $\hyp[B_r]$ contained $x$.
+
+(\ref{item:non-standard}) $\Rightarrow$ (\ref{item:standard}):
+This direction of the proof is taken from Goldblatt \cite{goldblatt}, where it is given in Section 10.3.
+The proof is beautiful, but has the weakness that it does not extend to arbitrary topological spaces, because it relies on the presence of the rationals.
+(The result is true in general.)
+
+Suppose $X \subseteq \mathbb{R}^n$ is not compact in the standard sense.
+We present the argument for $n = 1$, writing $X'$ for $X$; the same argument works for general $n$, on replacing open intervals with rational endpoints by open balls with rational centres and rational radii, since it is the countability of this basis that matters rather than the dimension.
+
+To reiterate, then, we are working with $X' \subseteq \mathbb{R}$ which is not compact in the standard sense.
+Let $(U_i)_{i \in I}$ be an open cover of $X'$ without any finite subcover.
+We tweak the $U_i$ into open intervals whose endpoints are rational.
+
+Every $r \in X'$ lies within some $U_{i_r}$, say.
+Since $U_{i_r}$ is open, it contains an interval $C_r := (p_r, q_r)$ with rational endpoints, such that the interval contains $r$.
+This creates an open cover $$\mathcal{C} := \langle C_r : r \in X' \rangle$$
+such that every $C_r$ is entirely contained within some $U_{i_r}$.
+
+But there are only countably many such intervals, so we may in fact enumerate the open cover $\mathcal{C}$: say as $$\mathcal{C} = \langle (p_n, q_n) : n \in \mathbb{N} \rangle$$
+
+Certainly $\mathcal{C}$ covers $X'$, because every $r \in X'$ lies in its $C_r$, and $C_r$ is included in the enumeration.
+It has no finite subcover, because each $C_r$ is contained entirely within $U_{i_r}$, so a finite subcover from $\mathcal{C}$ would imply a finite subcover from the $U_i$.
+
+But now it is true that $$(\forall k \in \mathbb{N})(\exists x \in X')(\forall n \in \mathbb{N})[n \leq k \Rightarrow x \not \in (p_n, q_n)]$$
+which precisely states that for all $k$, $$X' \not \subseteq (p_1, q_1) \cup \dots \cup (p_k, q_k)$$
+
+This is a statement which transfers to the hyperreals.
+Fix some infinite hypernatural $K$ (it does not matter which), and let $x \in \hyp[(X')]$ be such that for all $n \in \hyp[\mathbb{N}]$ with $n \leq K$, we have $x \not \in \hyp[(] p_n, q_n)$.
+Then for all finite $n$, we have $x \not \in \hyp[(] p_n, q_n)$.
+
+Finally, we claim that $x$ is a point of $\hyp[(X')]$ which is infinitesimally close to no point of $X'$, thereby witnessing that $X'$ is not compact in the non-standard sense.
+Indeed, any $r \in X'$ is contained within some $(p_n, q_n)$, so if $x \near r$ then $p_n < x < q_n$, which would contradict the previous paragraph.
+\end{proof}
+
+\begin{thm}[Heine-Borel] The compact sets in $\mathbb{R}^n$ are precisely the closed bounded sets.
+\end{thm}
+\begin{proof}
+Let $X \subseteq \mathbb{R}^n$ be closed and bounded.
+Then take a point $x \in \hyp[X]$.
+$x$ is finite, because the statement that $X$ is bounded is a restricted second-order statement in the sense of Definition \ref{defn:secondordertransfer}, so it remains true of $\hyp[X]$ by transfer.
+Therefore $x$ has a standard part $\st(x) \near x$, which we claim lies in $X$.
+
+Indeed, if $\st(x)$ were not in $X$, then the monad of $\st(x)$ would lie outside $\hyp[X]$ (since $X$ is closed), and in particular $x$ would not be in $\hyp[X]$.
+
+Conversely, let $X \subseteq \mathbb{R}^n$ be compact.
+Then $X$ is bounded: if $X$ were unbounded, then $\hyp[X]$ would contain an infinite $x \in \hyp[\mathbb{R}^n]$ (the internal property ``some member of $\hyp[X]$ has modulus greater than $r$'' holds for every finite $r$ by transfer, so it must also hold for some infinite $r$), and no member of $X \subseteq \mathbb{R}^n$ could be infinitesimally close to such an $x$.
+
+$X$ is closed: let $y \in \mathbb{R}^n \setminus X$.
+We wish to show that the monad of $y$ is entirely outside $\hyp[X]$.
+If the monad of $y$ contained an element $x$ of $\hyp[X]$, then by compactness, there would be a member of $\mathbb{R}^n$ which lay in $X$, infinitesimally close to $x$.
+The only possible such standard member of $\mathbb{R}^n$ is $y$, so $y$ would have to lie in $X$ after all, a contradiction.
+\end{proof}
+
+\section{Measure theory}
+These final sections will address some meatier ideas, with the two goals of formulating Lebesgue measure and Brownian motion in the language of infinitesimals.
+For Lebesgue measure, we primarily use Goldblatt \cite{goldblatt}, Chapter 16.
+For the applications to Brownian motion, we use Hurd and Loeb \cite{hurdloeb}, section IV.6.
+
+Recall the following definitions of several objects fundamental to the study of (standard) measure theory:
+
+\
+
+\begin{defn}[$\sigma$-algebra] A collection $\mathcal{A}$ of subsets of a set $S$ is a \emph{$\sigma$-algebra} if it contains the empty set and is closed under countable unions and complements (in $S$).
+
+If $\mathcal{A}$ is instead only required to be closed under finite unions and set differences (rather than complements), it is called a \emph{ring of sets}.
+\end{defn}
+
+\pagebreak
+
+\begin{defn}[Measure]
+Let $\mathcal{A}$ be a $\sigma$-algebra.
+A function $\mu: \mathcal{A} \to \mathbb{R}^{\geq 0} \cup \{ \infty \}$ is a \emph{measure} if $\mu(\emptyset) = 0$ and it is countably additive: whenever $(A_n)_{n=1}^{\infty}$ is a sequence of pairwise-disjoint members of $\mathcal{A}$, we have $$\mu \left(\bigcup_n A_n \right) = \sum_n \mu(A_n)$$
+
+If $\mathcal{A}$ is merely a ring of sets, $\mu$ is a \emph{measure} if instead it is countably additive over all sequences of pairwise-disjoint members of $\mathcal{A}$ \emph{whose union is in $\mathcal{A}$}.
+\end{defn}
+
+\
+
+It is a classical fact that any measure on a ring of sets $\mathcal{A}$ may be extended to a measure on a $\sigma$-algebra $\sigma(\mathcal{A})$, the intersection of all $\sigma$-algebras containing $\mathcal{A}$; this result is known as the Carath\'eodory extension theorem.
+The theorem takes a measure $\mu$ on $\mathcal{A}$ and outputs a measure called the \emph{outer measure} $\mu^+$ on $\sigma(\mathcal{A})$.
+The exact construction of the outer measure does not concern us, but it coincides with $\mu$ on members of $\mathcal{A}$.
+The outer measure need not be even a finitely additive set function on the power set $\powerset(S)$, though it is countably additive on $\sigma(\mathcal{A}) \subseteq \powerset(S)$; those subsets of $S$ on which $\mu^+$ is guaranteed to behave additively are known as the \emph{measurable sets}, which we now define.
+
+\
+
+\begin{defn}[Measurable set; see Halmos \cite{halmos}, \textsection 11] \label{defn:measurable}
+Given a measure $\mu$ on a ring of sets $\mathcal{A} \subseteq \powerset(S)$, we say that a set $B \subseteq S$ is \emph{$\mu^+$-measurable} if, for every $E \in \mathcal{A}$, we have $$\mu^+(E) = \mu^+(E \cap B) + \mu^+(E \setminus B)$$
+where $\mu^+$ is the outer measure on $\sigma(\mathcal{A})$.
+That is, $B$ ``splits every $E \in \mathcal{A}$ in a way that is additive with respect to $\mu^+$''.
+
+It is a fact that every member of $\mathcal{A}$ is $\mu^+$-measurable.
+\end{defn}
+
+\
+
+Recall the definition of the Lebesgue measure on $\mathbb{R}$:
+
+\
+
+\begin{defn}[Lebesgue measure]
+The \emph{Lebesgue measure} on $\mathbb{R}$ is the measure $\lambda$, on the $\sigma$-algebra generated by the open sets of the Euclidean topology (that is, the $\sigma$-algebra whose members are the \emph{Borel sets}), such that $$\lambda([a,b]) = b-a$$
+for all reals $a \leq b$.\end{defn}
+\begin{remark}
+This does indeed specify a measure, by the Carath\'eodory extension theorem.
+It is in fact the \emph{unique} measure such that $\lambda([a,b]) = b-a$; this is Theorem A of \textsection 13 in \cite{halmos}.
+\end{remark}
+
+\
+
+We will consider measures constructed by applying Carath\'eodory's extension theorem to measures $\mu_L$ of the following form, as in Goldblatt \cite{goldblatt}, Section 16.5.
+Let $\mathcal{A}$ be an internal ring of subsets of $S \subseteq \hyp$.
+Let $$\mu: \mathcal{A} \to \hyp^{\geq 0} \cup \{ \infty \}$$ be any finitely-additive function.
+Define $\mu_L$ (for ``$\mu$-Loeb'') by $\mu_L(A) = \st(\mu(A))$ if $\mu(A)$ is finite, and $\infty$ if $\mu(A)$ is infinite (as a hyperreal) or is the literal value $\infty$.
+
+Then $\mu_L$ extends to a measure $\mu_L^+$ on $\sigma(\mathcal{A})$. We say a set $B \subseteq \hyp$ is \emph{Loeb measurable} if it is measurable with respect to $\mu_L^+$, in the sense of Definition \ref{defn:measurable}.
+
+The Loeb measure construction essentially lets us specify a hyperreal size for each member of an internal collection of subsets of $\hyp$, and pull that back into a true real-valued ordinary measure on that internal collection.
+
+\
+
+\begin{example}
+Take $S = \hyp^{\geq 0}$.
+Let $\mathcal{A}$ be the collection of finite sets of elements of $\hyp^{\geq 0}$ (a ring of sets).
+
+Then $\sigma(\mathcal{A})$ is the collection of countable sets and cocountable sets (that is, those whose complements are countable), since this collection is closed under taking countable unions, complements, and symmetric differences.
+
+Define $\mu(A) = |A|$, the counting measure.
+Then $\mu_L^+$ is the measure which takes a set $A \subseteq \hyp$, and returns the following:
+\begin{itemize}
+\item If $A$ has cardinality $n$, where $n$ is finite, then $\mu_L^+$ returns $n$;
+\item Otherwise $\mu_L^+$ returns $\infty$ (for instance, when $A$ is countably infinite, or hyperfinite of infinite internal cardinality).
+\end{itemize}
+So, for example, $\mu_L^+(\{ 0, 1, \dots, N \}) = \infty$, where $N$ is an infinite hypernatural.
+Notice that $\mu_L^+$ need not be internal: the above $\mu_L^+$ is capable of distinguishing between finite and infinite hypernaturals.
+\end{example}
+
+\subsection{Comprehensiveness} \label{sec:comprehensiveness}
+We shall work in a \emph{sequentially comprehensive} system, as defined in Section 15.4 of Goldblatt \cite{goldblatt}.
+That is, one in which any function $f: \mathbb{N} \to B$ from $\mathbb{N}$ to an internal set $B \subseteq \hyp$ extends to an internal function $\hyp[f]: \hyp[\mathbb{N}] \to B$.
+Alternatively stated, any sequence $(s_n)_{n=1}^{\infty}$ of elements of internal set $B \subseteq \hyp$ will extend to an internal hypersequence $(\hyp[s_n])_{n \in \hyp[\mathbb{N}]}$.
+(We implicitly used this property earlier when dealing with hypersequences; it comes free of charge through the transfer principle when $f$ admits a first-order description, for instance.)
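+
+For a concrete instance of that remark: the sequence $s_n = 2^{-n}$ of elements of the internal set $\hyp[(]0, 1]$ extends to an internal hypersequence $(2^{-n})_{n \in \hyp[\mathbb{N}]}$, whose value at an infinite $N$ is the infinitesimal $2^{-N}$.
+Here the extension already comes free of charge from the transfer principle, since $n \mapsto 2^{-n}$ is specified by a first-order formula; the force of sequential comprehensiveness is that an internal extension exists even when $f$ admits no such description.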
+
+The ultrapower construction always creates a sequentially comprehensive system, for what Goldblatt calls ``intricate'' reasons which we will not cover here.
+
+The upshot will be as follows.
+Given a sequence of internal sets $(A_n)_{n \in \mathbb{N}}$, and a list of internal properties $(P_n)$ such that $A_m$ satisfies property $P_n$ for all $m \geq n$, we can extend the sequence $(A_n)$ to a hypersequence.
+For $N$ an infinite hypernatural, $A_N$ will then have property $P_i$ for all finite naturals $i$.
+That is, we will have constructed an object which has all these properties simultaneously.
+
+This construction is akin in spirit to taking an intersection of nested sets with some property, to obtain an object which has all the properties.
+
+\subsection{Alternative characterisation of Loeb measurability} \label{sec:alternativeloeb}
+
+Recall the classical fact that a subset of $[0,1]$ is Lebesgue measurable if and only if it can be approximated arbitrarily well by finite unions of intervals, and its complement can also be approximated arbitrarily well by finite unions of intervals.
+(See the proof of Theorem 4.3a in \cite{williamson}, for instance.)
+
+With this in mind, we prove the following characterisation, from Section 16.6 of Goldblatt \cite{goldblatt}.
+The goal here is to show that Lebesgue measure is a (rather natural) example of a Loeb measure.
+
+\
+
+\begin{defn}[$\mu$-approximability] \label{defn:muapprox}
+Let $\mathcal{A}$ be an internal ring of subsets of $S \subseteq \hyp$, and $\mu: \mathcal{A} \to \hyp^{\geq 0} \cup \{ \infty \}$ a finitely-additive function.
+
+We say that $B \subseteq S$ is \emph{$\mu$-approximable} if, for every $\varepsilon \in \mathbb{R}^{>0}$, there are sets $C_{\varepsilon}, D_{\varepsilon} \in \mathcal{A}$ with $\mu_L(D_{\varepsilon} \setminus C_{\varepsilon}) < \varepsilon$ and $C_{\varepsilon} \subseteq B \subseteq D_{\varepsilon}$.
+\end{defn}
+
+\
+
+\begin{lemma} \label{lemma:approx} We can approximate any $\mu$-approximable set $B \subseteq \hyp$ by a member of $\mathcal{A}$, in the following sense: there is some $A \in \mathcal{A}$ such that the symmetric difference $A \symdiff B$ has $\mu_L^+$-measure zero.
+\end{lemma}
+\begin{proof}
+Take a sequence of nested $\frac{1}{n}$-approximations $C_{n} \subseteq B \subseteq D_n$, and extend each of $\langle C_n : n \in \mathbb{N} \rangle$ and $\langle D_n: n \in \mathbb{N} \rangle$ to an internal hypersequence indexed by the hypernaturals.
+(This is justified by the remark on sequential comprehensiveness.)
+
+Let the hypersequences be $\langle C_n : n \in \hyp[\mathbb{N}] \rangle$ and $\langle D_n : n \in \hyp[\mathbb{N}] \rangle$; for finite $n$ we have $C_n \subseteq B \subseteq D_n$, though for infinite $n$ we cannot expect this, since that statement mentions the possibly-external set $B$ and so does not transfer.
+Then if we fix any (necessarily finite) $k \in \mathbb{N}$, it is an internal true statement that for every $n \in \hyp[\mathbb{N}]$, if $n \leq k$ then $C_n \subseteq D_k \subseteq D_n$ (by nestedness).
+
+Since all finite naturals $k$ satisfy that property, it must be the case that some infinite hypernatural $K$ does too (because otherwise this would be a property that distinguished between finite and infinite naturals, contradicting Theorem \ref{thm:infinite_not_internal}): there is $K$ such that for all $n \in \hyp[\mathbb{N}]$, $$n \leq K \Rightarrow C_n \subseteq D_K \subseteq D_n$$
+
+Then $D_K$ is our desired member of $\mathcal{A}$ such that $\mu_L^+(D_K \symdiff B) = 0$.
+ +Indeed, $$D_K \symdiff B = (D_K \setminus B) \cup (B \setminus D_K) \subseteq D_n \setminus C_n$$ for all $n \in \mathbb{N}$. +The right-hand side has $\mu_L^+$-measure less than $\frac{1}{n}$; +so we obtain that $\mu_L^+(D_K \symdiff B)$ is a standard real which is less than all $\frac{1}{n}$. + +That is, $\mu_L^+(D_K \symdiff B) = 0$. +\end{proof} + +In fact, there is a very concrete definition of the Loeb measure $\mu_L^+$ of a set $B \subseteq S \subseteq \hyp$ which does not appear in the ring of sets $\mathcal{A} \subseteq \powerset(S)$. +This concreteness comes at the cost of working with arbitrary $\varepsilon > 0$, but it will turn out to be the key ingredient allowing us to move freely between Loeb measure and Lebesgue measure. +Morally speaking, Lebesgue measure is a statement about ``those sets we may approximate by nice sets'', and approximation is easiest with an $\varepsilon$-related treatment; this is the motivation for the following lemma about Loeb measure. + +\ + +\begin{lemma} \label{lemma:loebsup} Suppose $B \subset \hyp$ is Loeb measurable with finite Loeb measure. +Then $$\mu_L^+(B) = \inf \{ \mu_L(A) : A \in \mathcal{A}, B \subseteq A \} = \sup \{ \mu_L(A) : A \in \mathcal{A}, A \subseteq B \} $$ +\end{lemma} +\begin{proof} +We will assume the first equality; its proof requires manipulating the specific construction of the measure $\mu_L^+$ according to the Carath\'eodory extension theorem. +This is not particularly difficult, but it requires some ``grubby details''. +The proof may be found\footnote{Be warned that Goldblatt abuses the notation $\mu_L(B)$ to mean $\mu_L^+(B)$ when $B$ is Loeb measurable.} as Lemma 16.5.1 of Goldblatt \cite{goldblatt}. + +To prove the second equality, then, we will show that $$\mu_L^+(B) = \sup \{ \mu_L(A) : A \in \mathcal{A}, A \subseteq B \}$$ + +Let $\varepsilon \in \mathbb{R}^{>0}$. +We need to find $A_{\varepsilon} \in \mathcal{A}$ with $A_{\varepsilon} \subseteq B$ and $$\mu_L^+(B) < \mu_L(A_{\varepsilon}) + \varepsilon$$ + +Now, we stipulated that $B$ had finite Loeb measure, so (by the first part of the lemma) $$\mu_L^+(B) = \inf \{ \mu_L(A) : A \in \mathcal{A}, B \subseteq A \} < \infty$$ +That is, there is some $A \in \mathcal{A}$ with $B \subseteq A$ and $\mu_L(A) < \infty$. + +We can therefore use the first part of the lemma again, applied to $A \setminus B$. +(Drawing Venn diagrams will elucidate this section.) + +Since $\mu_L(A)$ is finite, we have $$\mu_L^+(A \setminus B) = \mu_L(A) - \mu_L^+(B)$$ +which is again finite, so by the first part of the lemma, we can approximate it: let $C \in \mathcal{A}$ be such that $A \setminus B \subseteq C$ and $$\mu_L(C) < \mu_L^+(A \setminus B) + \varepsilon$$ + +Now, $A \setminus C$ is a set-difference of members of $\mathcal{A}$, so it lies in $\mathcal{A}$; it is also a subset of $B$, since $A \setminus B \subseteq C$ so $A \setminus C \subseteq A \setminus (A \setminus B) \subseteq B$. + +Then $$C \supseteq (A \setminus B) \disjointunion (B \setminus [A \setminus C])$$ (where $\disjointunion$ is a disjoint union), and each of those terms is Loeb measurable, so $$\mu_L(C) \geq \mu_L^+(A \setminus B) + \mu_L^+(B \setminus [A \setminus C])$$ + +We chose $C$ such that $\mu_L(C) < \mu_L^+(A \setminus B) + \varepsilon$. +Therefore $$\mu_L^+(B \setminus [A \setminus C]) < \varepsilon$$ +so $$\mu_L^+(B) < \mu_L^+(A \setminus C) + \varepsilon$$ + +Finally, we have already noted that $A \setminus C$ lies in $\mathcal{A}$, so the proof is complete: set $A_{\varepsilon} = A \setminus C$. 
+\end{proof}
+
+\begin{thm}[Alternative definition of Loeb measurability] \label{thm:alternativeloeb} Let $B \subseteq \hyp$.
+\
+\begin{enumerate}
+\item \label{thm:muapprox} Suppose $B$ is $\mu$-approximable. Then $B$ is Loeb measurable.
+\item \label{thm:meas} Suppose $B$ is Loeb measurable with $\mu_L^+(B) \not = \infty$. Then $B$ is $\mu$-approximable.
+\end{enumerate}
+\end{thm}
+
+\begin{proof}
+(\ref{thm:muapprox}): We sketch this direction.
+Suppose $B$ is $\mu$-approximable.
+Recall that $\mu_L^+$ is the measure obtained by extending $\mu_L$ to a measure on $\sigma(\mathcal{A})$.
+
+Then we can find a set $A \in \mathcal{A}$ such that $\mu_L^+ (A \symdiff B) = 0$ (for $\symdiff$ the symmetric difference), as guaranteed by Lemma \ref{lemma:approx}.
+
+We now need to show that $B$ itself is Loeb measurable: that is, it is measurable in the sense of Definition \ref{defn:measurable} with respect to $\mu_L^+$.
+
+That is, for any $E \in \mathcal{A}$, we need $B$ to split $E$ additively with respect to $\mu_L^+$: we need $$\mu_L^+(E) = \mu_L^+(E \cap B) + \mu_L^+(E \setminus B)$$
+
+Recall from Definition \ref{defn:measurable} that every member of $\mathcal{A}$ is automatically $\mu_L^+$-measurable; so the equality holds if we replace $B$ with $A$ throughout.
+But we have defined $A$ to be ``almost equal'' to $B$ in the sense of Lemma \ref{lemma:approx}, and so by a simple argument (elucidated by a Venn diagram), the equality holds with $B$ as well.
+(For full details, see Lemma 16.6.3 of \cite{goldblatt}; what remains of the argument is mere unenlightening algebra.
+The essence is that ``$A$ is extremely close to $B$ from the point of view of $\mu_L^+$''.)
+
+(\ref{thm:meas}):
+Let $\varepsilon \in \mathbb{R}^{>0}$.
+We need to show that we can find approximating sets $C_{\varepsilon}, D_{\varepsilon} \in \mathcal{A}$ with $$\text{$\mu_L(D_{\varepsilon} \setminus C_{\varepsilon}) < \varepsilon$ and $C_{\varepsilon} \subseteq B \subseteq D_{\varepsilon}$}$$
+
+But this is the content of Lemma \ref{lemma:loebsup}: $$\mu_L^+(B) = \inf \{ \mu_L(A) : A \in \mathcal{A}, B \subseteq A \} = \sup \{ \mu_L(A) : A \in \mathcal{A}, A \subseteq B \}$$
+
+so we can find $C_{\varepsilon} \in \mathcal{A}$ with $C_{\varepsilon} \subseteq B$ such that $$\mu_L(C_{\varepsilon}) \geq \mu_L^+(B) - \frac{\varepsilon}{2}$$ and we can similarly find $D_{\varepsilon} \supseteq B$ such that $$\mu_L(D_{\varepsilon}) \leq \mu_L^+(B) + \frac{\varepsilon}{2}$$
+from which the result follows immediately.
+\end{proof}
+
+\subsection{Lebesgue measure via Loeb measure}
+It turns out that Lebesgue measure can be defined in a very natural way as a Loeb measure, by ``assigning a weight to an infinitesimal-width lattice on $\mathbb{R}$'', as in Section 16.8 of Goldblatt \cite{goldblatt}.
+
+\
+
+\begin{defn}[Loeb measure defining the Lebesgue measure] \label{defn:loeblebesgue}
+Fix an infinite hypernatural $N$, and define a lattice $$S = \left \{ \frac{k}{N} : k \in \hyp[\mathbb{Z}], -N^2 \leq k \leq N^2 \right\}$$
+
+Define $\powerset_I(S)$ to be the collection of internal subsets of $S$, so each of its members is hyperfinite (indeed, of internal cardinality at most $2 N^2 + 1$).
+It is an internal algebra, and $\mu: \powerset_I(S) \to \hyp^{\geq 0}$ given by $$\mu(A) = \frac{|A|}{N}$$ defines a finitely additive function suitable for creating a Loeb measure.
+
+Specifically, $$\mu_L(A) = \st \left( \frac{|A|}{N} \right)$$ if $\frac{|A|}{N}$ is finite; and $\mu_L(A)$ takes the literal value $\infty$ otherwise.
+$\mu_L^+$ is the measure on $\sigma(\powerset_I(S))$ whose existence is guaranteed by the Carath\'eodory extension theorem.
+
+\end{defn}
+
+\
+
+\begin{remark}
+Notice that $\mu_L^+$ is a measure on $\sigma(\powerset_I(S))$, not on $\powerset_I(S)$.
+Since $\sigma(\powerset_I(S))$ may contain some non-internal subsets of $S$, this means that while $\mu_L$ is restricted only to internal sets, $\mu_L^+$ may be able to measure some external sets as well.
+An example is the set of finite elements of $S$, which is given by $$\bigcup_{n \in \mathbb{N}} \hyp[(] \unaryminus n, n) \cap S$$
+This is a countable union of measurable sets, so it is measurable; but it cannot be specified internally, the union being indexed by the external set $\mathbb{N}$.
+\end{remark}
+
+\
+
+Now we wish to show that, in some sense, the above Loeb measure coincides with the Lebesgue measure $\lambda$ on $\mathbb{R}$.
+
+Since $\mu_L$ is a function on subsets of the lattice $S \subset \hyp$, one might imagine the following procedure for finding the ``Loeb measure'' of a set $B \subseteq \mathbb{R}$ (recalling that, strictly speaking, only subsets of $\hyp$ can have Loeb measure):
+\begin{enumerate}
+\item Transfer $B$ to the hyperreals;
+\item Intersect $\hyp[B]$ with $S$;
+\item Take the Loeb measure of the resulting set.
+\end{enumerate}
+
+However, this procedure fails to give the desired answer for the set $\mathbb{Q}$, which we know to have Lebesgue measure $0$.
+Indeed, $\hyp[\mathbb{Q}]$ contains $S$, so it must have infinite Loeb measure under this (faulty) scheme, rather than the $0$ we would like if our measure resembles Lebesgue measure.
+Additionally, for an arbitrary $B \subseteq \mathbb{R}$ we have little control over the transferred set $\hyp[B]$, since the transfer principle only lets us reason about sets which we can mention in a first-order way.
+
+In keeping with the idea of ``approximate with an infinitesimal mesh'' (as opposed to ``project onto an infinitesimal mesh''), we instead aim to show that $$\lambda(B) = \mu_L^+(\app{B})$$
+where $\app{B}$, the ``approximation of $B$'', is defined\footnote{Goldblatt uses notation equivalent to $\st^{-1}(\cdot)$ for $\app{\cdot}$ in Section 16.8 of \cite{goldblatt}.} to be $$\app{B} = \{ s \in S: \text{$s$ is finite and $\st(s) \in B$} \}$$
+
+\
+
+\begin{defn}Given $B \subseteq \mathbb{R}$, we will say $B$ is \emph{``Loeb measurable''} (with quotation marks) if $\app{B}$ is Loeb measurable, with respect to $\mu_L^+$.
+We will say $B$ has ``Loeb measure'' $m$ (with quotation marks) if $\app{B}$ has Loeb measure $m$.
+
+This notation is not standard.
+\end{defn}
+
+\begin{remark}Notice that $\app{B}$ is not necessarily an internal set, because it relies on the predicate ``$s$ is finite''.
+In fact, $\app{B}$ might not even be Loeb measurable.
+This happens, for instance, whenever $B$ is not Lebesgue measurable.
+We will prove this as Theorem \ref{thm:loebimplieslebesgue}.
+\end{remark}
+
+\begin{example}
+In the case $B = \mathbb{R}$, we have $$\app{B} = \{ s \in S: \text{$s$ is finite} \}$$
+which is definitely not an internal set.
+Nonetheless, $\app{B}$ is still Loeb measurable, because it is a countable (albeit non-internal) union (in $\hyp$) of Loeb measurable sets:
+$$\bigcup_{n \in \mathbb{N}} (S \cap \hyp[(] \unaryminus n, n))$$
+Each of the terms in the union is an internal subset of the lattice $S$, and so is Loeb measurable.
+\end{example}
+
+\begin{example}[The Cantor set]
+Throughout this example, we will use the notation $0.a_1 a_2 \dots$ to denote ternary expansion.
+Recall that the Cantor set is defined as $$\mathcal{C} = \{x \in [0,1] : \text{$x$ contains no $1$ in its ternary expansion} \}$$
+It is well-known to have Lebesgue measure $0$ despite being uncountable.
+
+What is $\app{\mathcal{C}}$?
+
+It is an internal fact of $\mathbb{R}$ that every real in $[0,1]$ has a ternary expansion.
+Therefore it is true also in $\hyp$: every hyperreal in $\hyp[[]0,1]$ has an expansion of the form
+$$0.(a_1 a_2 \dots a_n a_{n+1} \dots)(\dots a_{M-1} a_M a_{M+1} \dots)\dots $$
+where $M$ is an example of an infinite hypernatural number, and (to ensure uniqueness of the expansion) we choose the $a_i$ so that it is never the case that from some point on, all the $a_j$ are equal to $2$.
+
+The standard part of this number $x$ is precisely $$0.a_1 a_2 \dots$$
+the ``standard part'' of its base-$3$ expansion, where we truncate any infinite-index places.
+
+So, a hyperreal $s \in S$ lies in $\app{\mathcal{C}}$ if and only if the ``standard part'' of its base-$3$ expansion consists only of the base-$3$ digits $0$ and $2$.
+
+But how many of these are there?
+Recalling that $N$ is the hypernatural denominator of every element of our lattice $S \subseteq \hyp$, we can pick $N = 3^P$ (where $P$ is hypernatural) in such a way that all the elements $\frac{k}{N} \in S$ have hyper-terminating base-$3$ expansion:
+$$0.(a_1 a_2 \dots)( \dots a_{P-1} a_P)$$
+
+Then $$\app{\mathcal{C}} = \{ (0.a_1 a_2 \dots) (\dots a_{P-1} a_P) : \text{$a_i \not = 1$ for each $i \in \mathbb{N}$} \}$$
+This set is not \emph{a priori} internal, because we built it using the ``$i \in \mathbb{N}$'' clause.
+But it is the following (external) nested intersection of Loeb measurable sets:
+$$\app{\mathcal{C}} = \bigcap_{n \in \mathbb{N}} \{ (0.a_1 a_2 \dots) (\dots a_{P-1} a_P) : \text{$a_i \not = 1$ for each $i \leq n$}\}$$
+
+Being a countable intersection of Loeb measurable sets, it is measurable according to the extension $\mu_L^+$, and its measure is the limit of the measures of the components.
+
+The component $\{ (0.a_1 a_2 \dots) (\dots a_{P-1} a_P) : \text{$a_i \not = 1$ for each $i \leq n$}\}$ has hyperfinite size $$2^n 3^{P-n} = \left(\frac{2}{3} \right)^n 3^P$$
+where we note that since every such ternary expansion is hyper-terminating, it is auto\-matically ``acceptable'' in that it is not of the form $0.a_1 a_2 \dots a_{K-1} a_{K} 2222\dots$ for any (finite or infinite) hypernatural $K$.
+
+So the ``Loeb measure'' of $\mathcal{C}$ is $$\mu_L^+(\app{\mathcal{C}}) = \lim_{n \to \infty} \st \left( \frac{2^n 3^{P-n}}{3^P} \right) = \lim_{n \to \infty} \left( \frac{2}{3} \right)^n = 0$$
+\end{example}
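+
+At a finite level, the count just used can be sanity-checked mechanically.
+The following Python fragment is purely illustrative (and no substitute for the hyperfinite argument): for a finite lattice of denominator $3^P$, it confirms that exactly $2^n 3^{P-n}$ of the points $k/3^P$ in $[0,1)$ have their first $n$ ternary digits avoiding the digit $1$.
+\begin{verbatim}
+# Count k in {0, ..., 3^P - 1} (the lattice points k/3^P in [0,1))
+# whose first n ternary digits are all different from 1.
+def cantor_count(P, n):
+    # k // 3**(P - j - 1) % 3 is the j-th ternary digit of k/3^P.
+    return sum(
+        all(k // 3 ** (P - j - 1) % 3 != 1 for j in range(n))
+        for k in range(3 ** P)
+    )
+
+P = 7
+for n in range(P + 1):
+    assert cantor_count(P, n) == 2 ** n * 3 ** (P - n)
+\end{verbatim}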
+
+\
+
+\begin{thm} \label{thm:lebesgueimpliesloeb} Lebesgue measurable sets are ``Loeb measurable''. That is,
+\begin{enumerate}
+\item The ``Loeb measure'' of $(a, b) \subset \mathbb{R}$ exists, and is $b-a$.
+\item More generally, if $B \subseteq \mathbb{R}$ is Lebesgue measurable, then it is ``Loeb measurable'', and its ``Loeb measure'' is equal to its Lebesgue measure.
+\end{enumerate}
+\end{thm}
+\begin{proof}
+(1): $$\app{(a, b)} = \{ s \in S : a < \st(s) < b \} = \bigcup_{n \in \mathbb{N}^{\geq 1}} \left[S \cap \hyp[(]a+\frac{1}{n}, b-\frac{1}{n}) \right]$$
+This is a nested countable union of internal sets, so its Loeb measure exists and is equal to the limit of the Loeb measures of the individual $S \cap \hyp[(]a+\frac{1}{n}, b-\frac{1}{n})$.
+
+But $S \cap \hyp[(]a+\frac{1}{n}, b-\frac{1}{n})$ is hyperfinite, because it is an internally specified subset of a hyperfinite set; so it has greatest and least elements $g$ and $l$ respectively, and it is easy to see that $g \near b-\frac{1}{n}$ and $l \near a+\frac{1}{n}$.
+
+Therefore $$S \cap \hyp[(]a+\frac{1}{n}, b-\frac{1}{n}) = \left\{ \frac{K}{N}, \frac{K+1}{N}, \dots, \frac{L}{N} \right\}$$ for some $K, L$ hypernaturals, where $\frac{K}{N} = l$ and $\frac{L}{N} = g$.
+The internal cardinality of this set is the hypernatural $L-K+1$, so the Loeb measure is $$\st \left( \frac{|S \cap \hyp[(]a+\frac{1}{n}, b-\frac{1}{n})|}{N} \right) = \st \left( \frac{L-K+1}{N} \right) = \st \left( g-l + \frac{1}{N} \right) = b-a-\frac{2}{n}$$
+
+By taking the limit as $n \to \infty$, we obtain $\mu_L^+ \left( \app{(a,b)} \right) = b-a$.
+
+(2): This section of the proof will involve working with epsilons, as is characteristic of showing that standard definitions are equivalent to non-standard ones.
+
+Note that $\app{\emptyset} = \emptyset$, so the ``Loeb measure'' of $\emptyset$ coincides with the Lebesgue measure.
+
+The first part of this theorem states that the ``Loeb measure'' agrees with Lebesgue measure on a basis of the Borel $\sigma$-algebra on $\mathbb{R}$.
+There is a uniqueness theorem for measures, which forces the ``Loeb measure'' and the Lebesgue measure to agree on every Borel set (since they already agree on a basis of the $\sigma$-algebra).
+This is the content of Lemma 16.4.1 of \cite{goldblatt}.
+
+Now, every Lebesgue measurable set $B$ can be approximated by Borel sets in an arbitrarily fine way: for every $\varepsilon$ there are Borel sets $C_{\varepsilon}$ and $D_{\varepsilon}$ such that $$C_{\varepsilon} \subseteq B \subseteq D_{\varepsilon}$$
+with $\lambda(B \setminus C_{\varepsilon}) < \varepsilon$, and similarly $\lambda(D_{\varepsilon} \setminus B) < \varepsilon$.
+(Recall this fact from the preamble to Section \ref{sec:alternativeloeb} of this essay, as the alternative characterisation of Lebesgue measurability.)
+
+Since $C_{\varepsilon}$ and $D_{\varepsilon}$ are Borel, they are ``Loeb measurable''.
+Certainly $$\app{C_{\varepsilon}} \subseteq \app{B} \subseteq \app{D_{\varepsilon}}$$
+but $C_{\varepsilon}$ has $\lambda(C_{\varepsilon}) = \mu_L^+(\app{C_{\varepsilon}})$ by the fact that $\lambda$ and $\mu_L^+$ agree on Borel sets; similarly $\lambda(D_{\varepsilon}) = \mu_L^+(\app{D_{\varepsilon}})$.
+
+Therefore $\app{B}$ is $\mu$-approxim\-able (recall Definition \ref{defn:muapprox})---strictly speaking, by taking a $\mu$-approximation of $C_{\varepsilon}$ to be the smaller set and a $\mu$-approximation of $D_{\varepsilon}$ to be the larger set, as per Lemma \ref{lemma:approx}---so by Theorem \ref{thm:alternativeloeb}, $B$ is ``Loeb measurable''.
+
+Its ``Loeb measure'' is sandwiched between $\lambda(C_{\varepsilon})$ and $\lambda(D_{\varepsilon})$ for all $\varepsilon \in \mathbb{R}^{>0}$, and the sets $C_{\varepsilon}$ and $D_{\varepsilon}$ are at most $2 \varepsilon$ apart in Lebesgue measure.
+Therefore the ``Loeb measure'' of $B$ is equal to the limit of the $\lambda(C_{\varepsilon})$ (and also equal to the limit of the $\lambda(D_{\varepsilon})$), which is $\lambda(B)$.
+\end{proof}
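+
+As a quick finite plausibility check of the lattice count in part (1), here is a Python fragment with a large but finite stand-in for $N$ (this is only an illustration, not part of the proof):
+\begin{verbatim}
+import math
+
+# Lattice points k/N strictly inside (a + 1/n, b - 1/n): there are
+# L - K + 1 of them, and dividing by N recovers the interval's length.
+N = 10 ** 6
+a, b = -0.3, 1.2
+for n in (10, 100, 1000):
+    lo, hi = a + 1 / n, b - 1 / n
+    K = math.floor(lo * N) + 1   # least k with k/N > lo
+    L = math.ceil(hi * N) - 1    # greatest k with k/N < hi
+    print((L - K + 1) / N, b - a - 2 / n)   # the two agree closely
+\end{verbatim}
+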
+% Apologies for the awful choice of hyphenation in $\mu$-approximability; it seems to be the least bad option.
+We now prove the other direction of the equivalence, using Theorem \ref{thm:alternativeloeb} on $\mu$-approxima\-bility so that we may ensure we are always dealing concretely with ``nice'' sets.
+Again the alternative characterisation of Lebesgue measurability (from the preamble to Section \ref{sec:alternativeloeb}) will come into play: Lebesgue measurable sets are those which may be approximated arbitrarily well by Borel sets.
+
+First, we require a technical lemma, which is Theorem 11.13.1 of Goldblatt \cite{goldblatt}.
+It is technical in the sense that it is potentially sensitive to the explicit construction of the hyperreals; it is a fact of the ultrapower construction.
+If we wished to remain agnostic about the construction of the hyperreals, it would be necessary to insist that we were working in a \emph{countably saturated} system---a requirement which is satisfied by the ultrapower construction---which is to say that the intersection of any decreasing sequence of nonempty internal sets is nonempty.
+Goldblatt (in Theorem 11.10.1 of \cite{goldblatt}) calls this a ``delicate'' fact about the ultrapower; we will not prove it here.
+
+\
+
+\begin{lemma} \label{lemma:internalclosed} Let $X$ be an internal subset of $\hyp$. Then $\{ \st(x): x \in X \}$ is closed as a subset of $\mathbb{R}$.
+\end{lemma}
+\begin{proof}
+Let $r \in \mathbb{R}$ be a limit point of $\{ \st(x) : x \in X\}$; say $r$ is the limit of the sequence $$(\st(x_i))_{i \in \mathbb{N}}$$
+We need $r = \st(y)$ for some $y \in X$.
+
+By omitting some terms of the sequence if necessary, we may pick each $x_i$ so that $$\hyp[|]r-x_i| < \frac{1}{i}$$
+That is, $$x_i \in X \cap \hyp[(]r-\frac{1}{i}, r+\frac{1}{i})$$
+
+Given countable saturation, there is some $y \in X$ lying in all the $X \cap \hyp[(]r-\frac{1}{n}, r+\frac{1}{n})$; then $\st(y) = r$ as required, since for all $n \in \mathbb{N}$, we have $$|r-y| < \frac{1}{n}$$
+\end{proof}
+
+\
+
+\begin{thm} \label{thm:loebimplieslebesgue}
+Let $B \subseteq \mathbb{R}$ be ``Loeb measurable'' (that is, $\app{B}$ is Loeb measurable).
+Then $B$ is Lebesgue measurable, and $\lambda(B)$ is equal to $\mu_L^+(\app{B})$.
+\end{thm}
+\begin{proof}
+Let $B$ be ``Loeb measurable''.
+It is enough to show that $B$ is Lebesgue measurable; then Theorem \ref{thm:lebesgueimpliesloeb} tells us that its ``Loeb measure'' is equal to its Lebesgue measure.
+
+There are two cases: $B$'s ``Loeb measure'' is either finite or the literal value $\infty$.
+
+If it is finite, then by Theorem \ref{thm:alternativeloeb}, $\app{B}$ is $\mu$-approximable.
+(Recall Definitions \ref{defn:muapprox} and \ref{defn:loeblebesgue}: $\mathcal{A} = \powerset_I(S)$ is the collection of internal subsets of the lattice $S$, and $\app{B}$ has the property that for every $\varepsilon \in \mathbb{R}^{>0}$, there are sets $C, D \in \powerset_I(S)$ such that $\mu_L(D \setminus C) < \varepsilon$ and $C \subseteq \app{B} \subseteq D$.)
+
+Let $\varepsilon \in \mathbb{R}^{> 0}$, and pick $\hyp[C], \hyp[D] \in \mathcal{A}$ approximating $\app{B}$ to within $\varepsilon$.
+(We label them $\hyp[C]$ and $\hyp[D]$ as a cue to the fact that these are internal subsets of the hyperreal lattice.)
+
+Now, we would like to use $\hyp[C]$ to build a set of reals which approximates $B$ from below in the Lebesgue sense.
+There are two possible sets to take: the set of all standard parts of elements of $\hyp[C]$, or the set of standard reals which themselves lie in $\hyp[C]$.
+(These are in general not necessarily the same set: consider $\hyp[(]0, 1)$, which contains an infinitesimal $\varepsilon > 0$ whose standard part is $0$, even though $0$ does not lie in $(0, 1)$.)
+It turns out that the correct set is the set $$C_{\varepsilon} := \{ \st(c): c \in \hyp[C] \}$$ + +By Lemma \ref{lemma:internalclosed}, $C_{\varepsilon}$ is closed; that makes it Borel, as is required for the alternative definition of Lebesgue measurability. + +Similarly the set $D_{\varepsilon}$ is open and hence Borel, where $$D_{\varepsilon} := \mathbb{R} \setminus \{ \st(d): d \in S \setminus \hyp[D] \}$$ +(This definition is a little cunning; it is designed to maximise the symmetry between $C_{\varepsilon}$ and $D_{\varepsilon}$. +It reflects the idea of ``cover the set and its complement'' from the preamble before Definition \ref{defn:muapprox} of $\mu$-approximability.) + +We have $$C_{\varepsilon} \subseteq \{ \st(b) : b \in \app{B} \} = B \subseteq D_{\varepsilon}$$ +by taking standard parts of the true-by-definition $$\hyp[C] \subseteq \app{B} \subseteq \hyp[D]$$ + +We just need $\lambda(D_{\varepsilon}) - \lambda(C_{\varepsilon})$ to be small. +But we have just proved (in Theorem \ref{thm:lebesgueimpliesloeb}) that Lebesgue measurable sets have ``Loeb measure'' equal to their Lebesgue measure, and we already know $C_{\varepsilon}$ and $D_{\varepsilon}$ are Borel and hence Lebesgue measurable, +so $$\lambda(D_{\varepsilon}) - \lambda(C_{\varepsilon}) = \mu_L^+(\app{D_{\varepsilon}}) - \mu_L^+(\app{C_{\varepsilon}})$$ + +Finally, $\mu_L^+(\hyp[C]) \leq \mu_L^+(\app{C_{\varepsilon}})$ because $\hyp[C] \subseteq \app{C_{\varepsilon}}$. +Indeed, to recap the definitions, $$\text{$\hyp[C] \in \powerset_I(S)$ and $C_{\varepsilon} = \{ \st(c) : c \in \hyp[C] \}$}$$ +so $c \in \hyp[C]$ implies $\st(c) \in C_{\varepsilon}$ which means $c \in \{s \in S: \st(s) \in C_{\varepsilon} \} = \app{C_{\varepsilon}}$. + +And likewise $\mu_L^+(\app{D_{\varepsilon}}) \leq \mu_L^+(\hyp[D])$ because $\app{D_{\varepsilon}} \subseteq \hyp[D]$. +Indeed, if $s \in \app{D_{\varepsilon}}$ then $\st(s) \in D_{\varepsilon}$ so $$\st(s) \in \mathbb{R} \setminus \{ \st(d) : d \in S \setminus \hyp[D]\}$$ +That is, $s \in \hyp[D]$. + +Therefore $$\lambda(D_{\varepsilon}) - \lambda(C_{\varepsilon}) \leq \mu_L(\hyp[D]) - \mu_L(\hyp[C]) < 2 \varepsilon$$ +so we have shown that $B$ may be approximated arbitrarily well by Borel sets, so it is Lebesgue measurable. +This completes the first case. + +If instead $B$'s ``Loeb measure'' is the literal $\infty$, then we may express $B \subseteq \mathbb{R}$ as the following union: +$$\bigcup_{n \in \mathbb{N}} B \cap (-n, n)$$ +We can show that $B$ itself is Lebesgue measurable by showing that each $B \cap (-n, n)$ is Lebesgue measurable. + +But note that $$\app{B \cap (-n, n)} = \app{B} \cap \app{(-n, n)}$$ +so that set is Loeb measurable, being the intersection of two Loeb measurable sets. +It has finite Loeb measure, since $$\mu_L^+(\app{B \cap (-n, n)}) \leq \mu_L^+(\app{(-n, n)}) = 2n$$ +(using Theorem \ref{thm:lebesgueimpliesloeb} to deduce that the ``Loeb measure'' of $(-n, n)$ is equal to its Lebesgue measure). + +Therefore we are done by the first case, since $B \cap (-n, n)$ is a ``Loeb measurable'' set with finite ``Loeb measure''. +\end{proof} + +\begin{remark}The following remark is more in a handwaving motivational vein. +Very loosely, the lattice $S$ can be viewed as a hyperfinite probability space, and the function $\mu$ on it can be viewed as simply a constant (infinite) multiple of a probability measure. +(A probability measure would have to have denominator $2N^2+1$ rather than $N$, to make the total probability $1$ rather than infinite.) 
+Then intuitively every point of a set $B \subset \mathbb{R}$ is assigned \emph{infinitesimal} measure, rather than the absolute zero measure assigned by the standard theory. +This is a powerful point in favour of the non-standard approach to measure theory: now it is only literally impossible events that are assigned literally zero measure, and all other events are assigned some positive, possibly infinitesimal, measure. +(Of course, on taking standard parts, producing $\mu_L$ and therefore bringing the results back into the realm of standard probability theory, we recover the possibility that a non-impossible event may have zero measure.) +\end{remark} + +\section{Brownian motion} +Our primary reference for this section is Hurd and Loeb \cite{hurdloeb}, where we use Chapter IV.6. + +Brownian motion is a model of the motion of a small particle (such as a speck of pollen) suspended in a fluid (such as stationary air). +It is physically observable with a microscope, and it occurs because molecules of the surrounding fluid collide with the particle in a random way, causing random changes of both course and speed. + +\ + +\begin{defn}[Brownian motion] \label{defn:brownian} +Consider the random position $X_t$ of a particle on the real line at time $t$, where $t$ varies between $0$ and $1$. +Let $\Omega$ be the state space of the variables $X_t$. +We say the $[0,1]$-indexed collection of random variables $\langle X_t : 0 \leq t \leq 1 \rangle$ is a \emph{Brownian motion} if: +\begin{enumerate} +\item $X_0 = 0$. That is, the particle starts at the origin. +\item Given any sequence of nonempty intervals $$[s_1, t_1], [s_2, t_2], \dots, [s_n, t_n]$$ +where $s_1 < t_1 \leq s_2 < t_2 < \dots \leq s_n < t_n$, +we have the random variables $$X_{t_1} - X_{s_1}, \dots, X_{t_n} - X_{s_n}$$ all independent. +That is, the particle's overall movement during any time period is not affected by its overall movement during any other time period. +\item \label{item:normal} If $s < t$, then $$\mathbb{P}( \{ \omega \in \Omega: X_t(\omega) - X_s(\omega) \leq \alpha \} ) = \gaussian \left(\frac{\alpha}{\sqrt{t-s}} \right)$$ +for $\gaussian$ the Gaussian integral $$\gaussian(x) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^x e^{-u^2/2} du$$ +That is, if the particle's position at time $s$ is known, then its position at time $t$ is normally distributed. +\end{enumerate} +\end{defn} + +\ + +A non-standard way to obtain a Brownian motion is derived from considering a random walk. + +Let $\Omega^{(n)}$ be the space of all sequences $\omega = (\omega_1, \omega_2, \dots, \omega_n)$, where each $\omega_i = \pm 1$. +Let a particle move by a distance of $1/\sqrt{n}$ every time-step $t_k = k/n$, in the direction indicated by $\omega_k$. + +Then the position of the particle at time $t$ following walk-sequence $\omega$ is given by +\begin{equation} \label{eqn:brownian} +\chi(t, \omega) = \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor n t \rfloor} \omega_i +\end{equation} + +The reason for the distance per step being $1/\sqrt{n}$ is so that the resulting walk has the right variance to satisfy the normal distribution requirement that is item (\ref{item:normal}) in Definition \ref{defn:brownian}; this is in anticipation of $\langle \st(\chi(t, \cdot)): t \in [0,1] \rangle$ being a Brownian motion when we let $n$, the length of the sequences $\omega$, be an infinite hypernatural. + +\ + +\begin{thm} \label{thm:brownian1} +Let $\chi$ be defined as in Equation \ref{eqn:brownian}. 
+Let $X_0 = \st(\chi(0, \cdot))$, a random variable on state space $\Omega^{(N)}$, where $N$ is an infinite hypernatural.
+Then $X_0$ is the constant $0$.
+\end{thm}
+\begin{proof}
+This is immediate: the sum defining $X_0$ is the empty sum.
+\end{proof}
+
+\begin{thm} \label{thm:brownian2}
+Let $\chi$ be defined as in Equation \ref{eqn:brownian}.
+Let $X_t := \st(\chi(t, \cdot))$ be $[0,1]$-indexed random variables on state space $\Omega^{(N)}$.
+Then for any $s_1 < t_1 \leq s_2 < t_2$, the random variables $$X_{t_1} - X_{s_1}, X_{t_2} - X_{s_2}$$ are independent.
+\end{thm}
+\begin{proof}
+$$X_{t_1}(\omega) - X_{s_1}(\omega) = \st( \chi(t_1, \omega) - \chi(s_1, \omega) ) = \st \left( \frac{1}{\sqrt{N}} \sum_{i=\lfloor N s_1 \rfloor + 1}^{\lfloor N t_1 \rfloor} \omega_i \right)$$
+
+$$X_{t_2}(\omega) - X_{s_2}(\omega) = \st( \chi(t_2, \omega) - \chi(s_2, \omega) ) = \st \left( \frac{1}{\sqrt{N}} \sum_{i=\lfloor N s_2 \rfloor + 1}^{\lfloor N t_2 \rfloor} \omega_i \right)$$
+
+The two random variables are therefore clearly independent: their values are defined by sums ranging over disjoint sections of the input $\omega$.
+\end{proof}
+
+This proof easily generalises to the second property of Definition \ref{defn:brownian} with $n$ rather than two almost-disjoint intervals: in each interval, the sum is being taken over disjoint regions of the input $\omega$.
+
+\
+
+To prove the third property of Definition \ref{defn:brownian}, we will need a non-standard analogue of the Central Limit Theorem.
+
+\
+
+\begin{defn}[$*$-independence of random variables] \label{defn:independence}
+Let $(X_n)_{n \in \hyp[\mathbb{N}]}$ be an internal sequence of random variables.
+We say they are \emph{$*$-independent} if, given any internal $M$-subtuple $(X_{n_i})_{i = 1}^M$, and given any internal $M$-tuple $(\alpha_i)_{i=1}^M$ of hyperreals, we have $$\mathbb{P}\left( \{ \omega \in \Omega: X_{n_1}(\omega) < \alpha_1, \dots, X_{n_M}(\omega) < \alpha_M \} \right) = \prod_{i=1}^M \mathbb{P}( \{ \omega \in \Omega: X_{n_i}(\omega) < \alpha_i \} )$$
+That is, ``all hyperfinite subcollections are independent''.
+\end{defn}
+
+\pagebreak
+
+\begin{lemma}[Non-standard Central Limit Theorem] \label{thm:clt}
+Let $(X_n)_{n \in \hyp[\mathbb{N}]}$ be an internal $*$-independent identically distributed sequence of random variables.
+Suppose the mean of each random variable is $0$ and the variance of each is $1$.
+Then for any infinite hypernatural $M$ and any $\alpha \in \hyp$, we have $$\mathbb{P}\left(\left\{ \omega \in \Omega: \frac{1}{\sqrt{M}} \sum_{n=1}^{M} X_n(\omega) \leq \alpha \right\}\right) \near \hyp[\gaussian](\alpha)$$
+where we recall that $\gaussian$ is the cumulative distribution function of the normal distribution, as in condition 3 of Definition \ref{defn:brownian}.
+\end{lemma}
+
+\
+
+We join Hurd and Loeb in omitting the proof of this lemma; it is a fairly short but rather unenlightening consequence of the transfer principle applied to the standard Central Limit Theorem.
+The proof may be found as Theorem 21 from Anderson \cite{anderson}.
+
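+
+Before the formal verification, the third property can be sanity-checked numerically for a large but finite $n$.
+The following Python snippet is illustrative only (a simulation, certainly not a proof): it estimates the probability in question and compares it with $\gaussian(\alpha/\sqrt{t-s})$.
+\begin{verbatim}
+import math
+import random
+
+def chi(t, omega, n):
+    # Equation (1): the walk's position at time t, with steps 1/sqrt(n).
+    return sum(omega[: int(n * t)]) / math.sqrt(n)
+
+def Phi(x):
+    # Cumulative distribution function of the standard normal.
+    return 0.5 * (1 + math.erf(x / math.sqrt(2)))
+
+n, trials = 2000, 5000
+s, t, alpha = 0.25, 0.75, 0.3
+hits = 0
+for _ in range(trials):
+    omega = [random.choice((-1, 1)) for _ in range(n)]
+    if chi(t, omega, n) - chi(s, omega, n) <= alpha:
+        hits += 1
+
+print(hits / trials)                   # empirical probability ...
+print(Phi(alpha / math.sqrt(t - s)))   # ... roughly equals this
+\end{verbatim}
+
+\
+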
+\begin{thm} \label{thm:brownian3}
+Property \ref{item:normal} of Definition \ref{defn:brownian} holds for the hyperfinite random walk.
+That is, if $\chi$ is defined as in Equation \ref{eqn:brownian}, and $X_t = \st(\chi(t, \cdot))$, then $$\mathbb{P}(\{ \omega \in \Omega^{(N)} : X_t(\omega) - X_s(\omega) \leq \alpha \}) = \gaussian \left(\frac{\alpha}{\sqrt{t-s}} \right)$$
+\end{thm}
+\begin{proof}
+The left-hand side is $$\mathbb{P}\left(\left\{ \omega \in \Omega^{(N)}: \st \left( \frac{1}{\sqrt{N}} \sum_{i=\lfloor N s \rfloor+1}^{\lfloor N t \rfloor} \omega_i \right) \leq \alpha \right\} \right)$$
+
+In order to apply the non-standard Central Limit Theorem, we need to manipulate this into a form which has a sum of random variables, rather than the \emph{standard part} of a sum of random variables.
+
+We can convert standard parts of sums into simple sums by passing to a limiting process: the left-hand side is precisely
+$$\lim_{r \to \infty} \mathbb{P}\left(\left\{ \omega \in \Omega^{(N)}: \frac{1}{\sqrt{N}} \sum_{i=\lfloor N s \rfloor+1}^{\lfloor N t \rfloor} \omega_i \leq \alpha+\frac{1}{r} \right\} \right)$$
+
+Now, to get this into precisely the form of the non-standard Central Limit Theorem, we must rewrite the sum so that its indices are from $1$ to some constant.
+Letting $$T = \lfloor N t \rfloor - \lfloor N s \rfloor$$ we have
+$$\lim_{r \to \infty} \mathbb{P} \left( \left \{ \omega \in \Omega^{(N)} : \frac{1}{\sqrt{T}} \sum_{i=1}^{T} \omega_{i+\lfloor Ns \rfloor} \leq \left(\alpha + \frac{1}{r} \right) \frac{\sqrt{N}}{\sqrt{T}} \right \} \right)$$
+
+Let us assume for the moment that the collection of $Y_i(\omega) := \omega_{i+\lfloor Ns \rfloor}$ is $*$-independent.
+Certainly they all have mean $0$, variance $1$, and are identically distributed.
+
+By the non-standard Central Limit Theorem, each term of the limit is infinitesimally close to $$\hyp[\gaussian]\left( \left( \alpha+\frac{1}{r} \right) \frac{\sqrt{N}}{\sqrt{T}}\right) $$
+
+But $\gaussian$ is uniformly continuous because it is increasing, bounded and continuous.
+(Recall the non-standard definition of uniform continuity from Section \ref{sec:uniform}: infinitesimally perturbing $x$ induces only an infinitesimal perturbation in $\hyp[\gaussian](x)$.)
+Therefore $$\st \hyp[\gaussian]\left( \left( \alpha+\frac{1}{r} \right) \frac{\sqrt{N}}{\sqrt{T}}\right) = \gaussian \left(\st \left( \left( \alpha+\frac{1}{r} \right) \frac{\sqrt{N}}{\sqrt{T}} \right) \right)$$
+
+Recall that $$T = \lfloor N t \rfloor - \lfloor N s \rfloor$$
+so $$\frac{\sqrt{N}}{\sqrt{T}} = \frac{\sqrt{N}}{\sqrt{\lfloor N t \rfloor - \lfloor N s \rfloor}} \near \frac{1}{\sqrt{t-s}}$$
+because $$\lim_{n \to \infty} \frac{n}{\lfloor n t \rfloor - \lfloor n s \rfloor} = \frac{1}{t-s}$$
+
+Therefore each term of our limit is infinitesimally close to $$\gaussian \left( \left( \alpha+\frac{1}{r} \right) \st \left(\frac{\sqrt{N}}{\sqrt{T}} \right) \right) = \gaussian \left(\left( \alpha+\frac{1}{r} \right) \frac{1}{\sqrt{t-s}}\right)$$
+
+Taking the limit as $r \to \infty$, we obtain $$\gaussian \left(\frac{\alpha}{\sqrt{t-s}} \right)$$
+exactly as required.
+
+We still need to show that the collection of $Y_i(\omega) := \omega_{i+\lfloor Ns \rfloor}$ is $*$-independent, so as to justify the use of the non-standard Central Limit Theorem and complete the proof.
+But this is immediate: each $Y_i$ inspects a different part of the input.
+\end{proof}
+
+Together, Theorems \ref{thm:brownian1}, \ref{thm:brownian2} and \ref{thm:brownian3} prove that the non-standard ``random walk'' approach yields a Brownian motion as defined in Definition \ref{defn:brownian}.
+
+\begin{remark}
+In fact, it is possible to prove that almost all of these Brownian motions are continuous: for almost all fixed $\omega$, it is the case that $t \mapsto X_t(\omega)$ is continuous.
+This is Theorem 6.13 of Hurd and Loeb \cite{hurdloeb}.
+
+That is, this scheme of creating Brownian motions almost always creates ``physically realistic'' motions, in the sense that the paths are continuous.
+\end{remark}
+
+\begin{thebibliography}{9}
+
+\bibitem{robinson}
+  Abraham Robinson,
+  \emph{Non-standard analysis},
+  Princeton Landmarks in Mathematics
+
+\bibitem{petry}
+  Andr\'{e} P\'{e}try,
+  \emph{Analyse Infinit\'{e}simale: une pr\'{e}sentation non standard}, first edition,
+  C\'{e}fal
+
+\bibitem{davis}
+  Isaac Davis,
+  \emph{An Introduction to Nonstandard Analysis}, \\
+  \url{www.math.uchicago.edu/~may/VIGRE/VIGRE2009/REUPapers/Davis.pdf}
+
+\bibitem{goldblatt}
+  Robert Goldblatt,
+  \emph{Lectures on the Hyperreals: an Introduction to Nonstandard Analysis},
+  Springer, Graduate Texts in Mathematics
+
+\bibitem{spivak}
+  Michael Spivak,
+  \emph{Calculus}, third edition,
+  Cambridge University Press
+
+\bibitem{hurdloeb}
+  A. E. Hurd and P. A. Loeb,
+  \emph{An Introduction to Nonstandard Real Analysis},
+  Academic Press Inc
+
+\bibitem{halmos}
+  Paul R Halmos,
+  \emph{Measure Theory},
+  Springer, Graduate Texts in Mathematics
+
+\bibitem{williamson}
+  J. H. Williamson,
+  \emph{Lebesgue Integration},
+  Courier Corporation
+
+\bibitem{anderson}
+  Robert M. Anderson,
+  \emph{A non-standard representation for Brownian motion and It\^o integration},
+  Israel Journal of Mathematics, Vol.
25, 1976 + +\end{thebibliography} +\end{document} \ No newline at end of file diff --git a/ParametricBoundedLoeb2016.tex b/ParametricBoundedLoeb2016.tex new file mode 100644 index 0000000..66004bc --- /dev/null +++ b/ParametricBoundedLoeb2016.tex @@ -0,0 +1,154 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{mdframed} +\usepackage{hyperref} +\usepackage{lmodern} + +% Reproducible builds +\pdfinfoomitdate=1 +\pdftrailerid{} +\pdfsuppressptexinfo=-1 + +\newmdtheoremenv{thm}{Theorem}[section] + +\newcommand{\prov}{\square} +\newcommand{\encode}[1]{\ulcorner #1 \urcorner} +\newcommand{\lob}{L\"ob's Theorem} + +\title{Parametric bounded version of L\"ob's theorem} +\author{Patrick Stevens} +\date{24th July 2016} + +\begin{document} + +\maketitle +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/ParametricBoundedLoeb2016/ParametricBoundedLoeb2016.pdf} \end{center} +\normalsize + +\section{Introduction} +I was recently made aware of a preprint\cite{critch} of a paper which proves a bounded version of \lob. + +\ + +\begin{thm}[Parametric Bounded L\"ob] +If $\prov A$ is the operator ``there exists a proof of $A$ in Peano arithmetic'' +and $\prov_k A$ is the operator ``there exists a proof of $A$ in $k$ or fewer lines in Peano arithmetic'', +then for every formula $p$ of one free variable in the language of PA, and every computable $f: \mathbb{N} \to \mathbb{N}$ +which grows sufficiently fast, it is true that +$$(\exists \hat{k})[\color{red}(\vdash [\forall k][\color{green}\prov_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k > \hat{k}][p(k)]) \color{black}]$$ +\end{thm} + +\ + +(Colour is used only to emphasise logical chunks of the formula.) + +The paper gives plenty of motivation about why this result should be interesting and useful: +section 6 of the paper, for instance, is an application to the one-shot Prisoner's Dilemma +played between agents who have access to each other's source code. +However, I believe that while the theorem may be true and the proof may be correct, +its application may not be as straightforward as the paper suggests. + +\section{Background} + +\begin{thm}[\lob] +Suppose $\prov \encode{A}$ denotes ``the formula $A$ with G\"odel number $\encode{A}$ is provable''. +If $$\mathrm{PA} \vdash (\prov \encode{P} \to P)$$ +then +$$\mathrm{PA} \vdash P$$ +\end{thm} + +\ + +\lob{} is at heart a statement about the incompatibility of the interpretation of the box as ``provable'' +with the intuitively plausible deduction rule that $\prov \encode{P} \to P$. +(``If we have a proof of $P$, then we can deduce $P$!'') +The Critch paper has an example in Section 1.4 where $P$ is the Riemann hypothesis. + +\section{Problem with the paper} + +Suppose $\mathcal{M}$ is a model of Peano arithmetic, in which our agent is working. +It is a fact of first-order logic (through the L\"owenheim-Skolem theorem) that there is no first-order way of distinguishing any particular model of PA. +Therefore the model of PA could be non-standard; this is not something a first-order reasoning agent could determine. + +If the agent is working with a non-standard model of PA, then all the theorems of the Critch paper may well go through. +However, they become substantially less useful, as follows. + +Let us write $M$ for the underlying class (or set) of the model $\mathcal{M}$ of PA. 
Then the statement $$(\exists \hat{k})[\color{red}(\vdash [\forall k][\color{green}\prov_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k > \hat{k}][p(k)]) \color{black}]$$
+when relativised to the model $\mathcal{M}$ becomes
+$$(\exists \hat{k} \in M)[\color{red}(\vdash [\forall k \in M][\color{green}\prov^{\mathcal{M}}_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k \in M^{>\hat{k}}][p(k)]) \color{black}]$$
+where $\prov^{\mathcal{M}}_{f(k)}$ is now shorthand for ``there is a proof-object $P$ in $M$ such that $P$ encodes an $M$-proof of $p(k)$ which is fewer than $f(k)$ lines long''.
+
+Notice that the quantifiers have been restricted to $M$; in particular, $\hat{k}$ might be a non-standard natural number.
+Likewise, the ``there is a proof'' predicate is now ``there is an object which $M$ unpacks into a proof''; but such objects may be non-standard naturals themselves, and unpack into non-standard proofs (which $\mathcal{M}$ still believes are proofs, because it doesn't know the difference between ``standard'' and ``non-standard'').
+
+\subsection{Aside: non-standard proof objects}
+
+What is a non-standard proof object?
+Let's imagine we have some specific statements $a_i$ for each natural $i$ such that $a_i \to a_{i+1}$ for each $i$, and such that $a_0$ is an axiom of PA.
+I'm using $a_i$ only for shorthand; the reader should imagine I had some specific statements and specific proofs of $a_i \to a_{i+1}$.
+
+Consider the following proof of $a_2$:
+\begin{enumerate}
+\item $a_0$ (axiom)
+\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
+\item $a_2$ (by writing out the proof of $a_1 \to a_2$ above this line)
+\end{enumerate}
+
+If we take a simple G\"odel numbering scheme, namely ``take the number to be an ASCII string in base $256$'',
+it's easy to see that this proof has a G\"odel number.
+After all, we're imagining that I have specific proofs of $a_i \to a_{i+1}$, so I could just write them in.
+Then you're reading this document which was originally encoded as ASCII, so the G\"odel numbering scheme must have worked.
+
+Similarly, there is a G\"odel number corresponding to the following:
+\begin{enumerate}
+\item $a_0$ (axiom)
+\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
+\item \dots
+\item $a_k$ (by writing out the proof of $a_{k-1} \to a_k$ above this line)
+\end{enumerate}
+
+Now, suppose we're working in a non-standard model, and fix non-standard $K$.
+Then there is a (probably non-standard) natural $L$ corresponding to the following proof:
+\begin{enumerate}
+\item $a_0$ (axiom)
+\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
+\item \dots
+\item $a_K$ (by writing out the proof of $a_{K-1} \to a_K$ above this line)
+\end{enumerate}
+
+Now, this is not a ``proof'' in our intuitive sense of the word, because from our perspective it's infinitely long.
+However, the model still thinks this is a proof, and that it's coded by the (non-standard) natural $L$.
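+
+To make the numbering scheme completely concrete: ``convert from ASCII and then interpret as a number in base $256$'' is, in Python, precisely \texttt{int.from\_bytes}.
+The following fragment (with a schematic three-line proof text) computes a G\"odel number and checks that the proof can be recovered from it; it is an illustration only.
+\begin{verbatim}
+# An ASCII string, read as a number in base 256.
+proof = """a0 (axiom)
+a1 (by the proof of a0 -> a1)
+a2 (by the proof of a1 -> a2)"""
+
+g = int.from_bytes(proof.encode("ascii"), "big")   # the Goedel number
+w = (g.bit_length() + 7) // 8                      # width in bytes
+assert g.to_bytes(w, "big").decode("ascii") == proof
+\end{verbatim}
+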
+\subsection{Implication for PBL}
+
+So the model $\mathcal{M}$ believes there is a natural $\hat{k}$ such that \dots
+But if that natural is non-standard (and remember that this is not something the model can determine without breaking into second-order logic!)
+then PBL doesn't really help us.
+It simply tells us that all sufficiently-large non-standard naturals have a certain property; but that doesn't necessarily mean any standard naturals have that property.
+And the application to the Prisoner's Dilemma in Critch's paper requires a standard finite $\hat{k}$.
+
+If we, constructing the agent Fairbot, could somehow guarantee that it would be working within the standard model of PA, then all would be well.
+However, we can't do that within first-order logic.
+It could be the case that when constructing Fairbot, the only sufficiently-large naturals turn out to be non-standard.
+When we eventually come to run $\mathrm{Fairbot}_k(\mathrm{Fairbot}_k)$, it could therefore be that it will take nonstandardly-many proof steps to discover the ``(cooperate, cooperate)'' outcome.
+In practice, therefore, the agents would not find that outcome: we can only run them for standardly-many steps, and all non-standard naturals look infinite to us.
+
+
+\section{Acknowledgements}
+My thanks are due to Mi\"etek Bak (who persuaded me that there might be a problem with the article)
+and to John Aspden (who very capably forced Mi\"etek to clarify his objection until I finally understood it).
+As ever, any mistakes in this article are due only to me.
+
+\begin{thebibliography}{9}
+
+\bibitem{critch}
+  Andrew Critch,
+  \emph{Parametric Bounded L\"ob's Theorem and Robust Cooperation of Bounded Agents},
+  \url{http://arxiv.org/abs/1602.04184v4}
+
+\end{thebibliography}
+
+\end{document}
\ No newline at end of file
diff --git a/README.md b/README.md
index 9580e73..e096dd7 100644
--- a/README.md
+++ b/README.md
@@ -7,5 +7,5 @@ It is intended to be read in as a subtree to its [build pipeline](https://github
 
 ## The licence
 
-The `RepresentableFunctors.tex` file is CC BY-SA 4.0. The `pdf-targets.txt` file is licensed [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
+All `.tex` files are CC BY-SA 4.0.
\ No newline at end of file
diff --git a/Tennenbaum.tex b/Tennenbaum.tex
new file mode 100644
index 0000000..f7aca5c
--- /dev/null
+++ b/Tennenbaum.tex
@@ -0,0 +1,160 @@
+\documentclass[11pt]{amsart}
+\usepackage{geometry}
+\geometry{a4paper}
+\usepackage{graphicx}
+\usepackage{amssymb}
+\usepackage{mdframed}
+\usepackage{hyperref}
+\usepackage{lmodern}
+
+% Reproducible builds
+\pdfinfoomitdate=1
+\pdftrailerid{}
+\pdfsuppressptexinfo=-1
+
+\newmdtheoremenv{thm}{Theorem}[section]
+
+\theoremstyle{remark}
+\newtheorem*{note}{Notation}
+\newtheorem*{rmk}{Remark}
+\newtheorem*{example}{Example}
+
+\title{Tennenbaum's Theorem}
+\author{Patrick Stevens}
+\date{27th April 2016}
+
+\begin{document}
+
+\maketitle
+\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/Tennenbaum/Tennenbaum.pdf} \end{center}
+\normalsize
+
+\section{Introduction}
+
+\begin{thm}[Tennenbaum's Theorem]
+Let $\mathfrak{M}$ be a countable non-standard model of Peano arithmetic, whose carrier set is $\mathbb{N}$.
+Then it is not the case that $+$ and $\times$ have decidable graphs in the model.
+\end{thm}
+
+\
+
+\begin{note}
+We will use the notation $\{ e\}$ to represent the $e$th Turing machine.
+$e$ is considered only to be a standard integer here.
+For example, we might view the G\"odel numbering scheme as being ``convert from ASCII and then interpret as a Python program''.
+\end{note}
+
+\begin{rmk}
+How might our standard Turing machine refer to a nonstandard integer?
+The ground set of our nonstandard model is $\mathbb{N}$: every nonstandard integer has a standard one which represents it in $\mathbb{N}$.
+Perhaps $4 \in \mathbb{N}$ is the object that the nonstandard model $\mathfrak{M}$ thinks is the number $7$, for instance.
+So the way a Turing machine would refer to the number $7$-in-the-model is to use $4$ in its source code.
+\end{rmk}
+
+What does it mean for $+$ to have a decidable graph?
+Simply that there is some (standard) natural $n$ such that,
+when we unpack $n$ into instructions for running a Turing machine,
+we obtain a machine that takes three naturals (that is, standard naturals) $a, b, c$ and outputs $1$ iff, when we take the referents $a', b', c'$ of $a, b, c$ in the model $\mathfrak{M}$, it is true that $a' +_{\mathfrak{M}} b' = c'$.
+
+\begin{example}
+A strictly standard-length program may halt in nonstandard time, when interpreted in a nonstandard model.
+Indeed, fix some nonstandard ``infinite'' $n$ (i.e. $n$ is not a standard natural).
+Then the following program halts after $n$ steps.
+\begin{verbatim}
+ans := 0;
+for i = 1 to n:
+  ans := ans + 1;
+end
+HALT with output ans;
+\end{verbatim}
+\end{example}
+
+\section{Overview of the proof}
+
+The proof \emph{est omnis divisa in partes tres}.
+
+\begin{enumerate}
+\item In any model, there is some pair of semidecidable but recursively inseparable sets.
+\item We can use these to create an undecidable set of true standard naturals which can, in some sense, be coded up into a (nonstandard) natural in our model.
+\item If $+$ and $\times$ were decidable, then the coding process would produce an object which would let us decide the undecidable set; contradiction.
+\end{enumerate}
+
+\section{Existence of recursively inseparable sets}
+This is fairly easy. Take $A = \{ e : \{ e \}(e) \downarrow = 0 \}$ and $B = \{ e: \{e \}(e) \downarrow > 0 \}$, where $\downarrow=$ means ``halts and is equal to'', and $\downarrow >$ means ``halts and is greater than''.
+Recall that $e$ must be standard.
+
+Now, suppose there were a (standard) integer $n$ such that $\{ n \}$ were the indicator function on set $X$, where $X \cap B = \emptyset$ and $A \subseteq X$.
+Then what is $\{n\}(n)$?
+If it were $0$, then $n$ is not in $X$, so $n$ is not in $A$ and so $\{n\}(n)$ doesn't halt at $0$.
+That's a contradiction.
+If it were $1$, then $n$ is in $X$ and hence is not in $B$, so $\{n\}(n)$ doesn't halt at something bigger than $0$; again a contradiction.
+
+So we have produced a pair of sets which are both semidecidable but are recursively inseparable, in the sense that no standard integer $n$ has $\{n\}$ deciding a superset $X$ of $A$ where $X \cap B = \emptyset$.
+(This is independent of the model of PA we were considering; it's purely happening over the ground set.)
+
+\section{Coding sets of naturals as naturals}
+We can take any set of (possibly nonstandard) naturals and code it as a (possibly nonstandard) natural, as follows.
+Given $\{ n_i : i \in I \}$, code it as $\sum_{i \in I} 2^{n_i}$.
+If $+$ and $\times$ are decidable, then this is a decidable coding scheme.
+(The preceding line is going to be where our contradiction arises, right at the end of the proof!)
+
+Notice that if $I$ is ``standard-infinite'' (that is, it contains nonstandardly-many elements) then the resulting code is nonstandard.
+The same is true if any $n_i$ is strictly nonstandard.
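+
+In the standard world this coding is completely concrete; here is a short Python illustration (nothing about it is specific to the proof) of the scheme, together with the bit-test that we will use at the very end:
+\begin{verbatim}
+# Code a set of naturals as a single natural, one bit per element.
+def code(ns):
+    return sum(2 ** n for n in ns)
+
+# n is in the coded set iff the n-th binary digit of the code is 1.
+def member(p, n):
+    return (p >> n) & 1 == 1
+
+p = code({0, 2, 5})                      # 2^0 + 2^2 + 2^5 = 37
+assert member(p, 2) and not member(p, 1)
+\end{verbatim}
+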
+\section{Undecidable set in \texorpdfstring{$\mathfrak{M}$}{M}}
+Take our pair of recursively inseparable semidecidable sets: $\mathfrak{A}$ and $\mathfrak{B}$.
+(We constructed them explicitly earlier, but now we don't care what they are.)
+Recalling a theorem that being semidecidable is equivalent to being a projection of a decidable set,
+write $A$ for a decidable set such that $(\exists y)[\langle n, y \rangle \in A]$ if and only if $n \in \mathfrak{A}$,
+and similarly for $B$.
+(The quantifiers range over $\mathbb{N}$, because $A$ and $B$ consist only of standard naturals, being subsets of the ground set.)
+
+By their recursive-inseparability, they are in particular disjoint, so we have $$(\forall n)[(\exists x)(\langle n, x \rangle \in A) \to \neg (\exists y)(\langle n, y \rangle \in B)]$$
+where the quantifiers all range over $\mathbb{N}$.
+Equivalently, $$(\forall n)(\forall x)(\forall y)(\neg \langle n,x \rangle \in A \vee \neg \langle n,y \rangle \in B)$$
+If we bound the quantifiers by any standard $m = SS\dots S(0)$ (which we explicitly write out, so it's absolute between all models of PA), we obtain an expression which our nonstandard model believes, because the expression is absolute for PA:
+$$(\forall n < m)(\forall x < m)(\forall y < m)(\neg \langle n,x \rangle \in A \vee \neg \langle n,y \rangle \in B)$$
+
+This is true for every standard $m$, and so it must be true for some nonstandard $m$ by overspill, since $\mathfrak{M}$ doesn't know how to distinguish between standard and nonstandard elements.
+If the property were only ever true for standard $m$, then checking the property would let $\mathfrak{M}$ pick out exactly its standard elements, which no internal property can do.
+
+Let $e$ be strictly nonstandard such that
+\begin{equation} \label{eqn:prop}
+\mathfrak{M} \vDash (\forall n < e)(\forall x < e)(\forall y < e)(\langle n,x \rangle \not \in A \vee \langle n,y \rangle \not \in B)
+\end{equation}
+where we note that this time $e$ is not written out explicitly as $SS\dots S(0)$ because it's too big to do that with.
+
+Finally, we define our undecidable set $X \subseteq \mathbb{N}$ of \emph{standard} naturals to be those standard naturals $x$ such that $$\mathfrak{M} \vDash (\exists y < e) (\langle x, y \rangle \in A)$$
+This is undecidable in the standard sense: there are no standard $m$ such that $\{m \}$ is the indicator function of $X$.
+Indeed, I claim that $X$ separates $\mathfrak{A}$ and $\mathfrak{B}$.
+(Recall that all members of $X$, $\mathfrak{A}$ and $\mathfrak{B}$ are standard.)
+
+\begin{itemize}
+\item If $a \in \mathfrak{A}$ then there is some standard natural $n$ such that $\langle a, n \rangle \in A$;
+and $n$ is certainly less than the nonstandard $e$.
+Hence $a \in X$.
+\item If $b \in \mathfrak{B}$, then there is standard $n$ such that $\langle b, n \rangle \in B$.
+Then $n < e$, so by (\ref{eqn:prop}) we have $\langle b, x \rangle \not \in A$ for all $x < e$.
+That is, $b \not \in X$.
+\end{itemize}
+
+\section{Coding up \texorpdfstring{$X$}{X}}
+Now if we code up $X$, which is undecidable, using our coding scheme $$\{ n_i : i \in I \} \mapsto \sum_{i \in I} 2^{n_i}$$
+we obtain some nonstandard natural; say $p = \sum_{x \in X} 2^x$.
+Supposing the $+$ and $\times$ relations to be decidable, this coding is decidable.
+Remember that $X$ is a set of standard naturals which is undecidable: no standard Turing machine decides $X$.
+
+But here is a procedure to determine whether a standard element $i \in \mathbb{N}$ is in $X$ or not:
+
+\begin{enumerate}
+\item Take the $i$th bit of $p$. (This is decidable because $+$ and $\times$ are.)
+\item Return ``not in $X$'' if the $i$th bit is $0$.
+\item Otherwise return ``is in $X$''.
+\end{enumerate}
+
+This contradicts the undecidability of $X$.
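+
+To see in slightly more detail how the procedure would go, here is a sketch in Python of the machine we have just claimed exists.
+The oracles \texttt{plus} and \texttt{times} stand for the supposed deciders of the graphs of $+$ and $\times$; they are stubbed out below using the standard model, purely so that the sketch runs (for a nonstandard model, Tennenbaum's theorem is exactly the statement that they cannot be implemented).
+Everything else is recovered from the oracles by unbounded search over the carrier set $\mathbb{N}$, and each search halts because the model proves that the relevant witness exists.
+\begin{verbatim}
+# Hypothetical oracles: deciders for the graphs of the model's + and *.
+def plus(a, b, c):  return a + b == c
+def times(a, b, c): return a * b == c
+
+def find(pred):
+    # Unbounded search over the carrier set; halts if a witness exists.
+    n = 0
+    while not pred(n):
+        n += 1
+    return n
+
+def add(a, b): return find(lambda c: plus(a, b, c))
+
+zero = find(lambda z: plus(z, z, z))                # only 0 has z + z = z
+one = find(lambda u: times(u, u, u) and u != zero)  # only 1 has u*u = u, u != 0
+
+def half(q):
+    # The unique (h, b) with q = h + h + b and b in {0, 1}; the search
+    # halts because the model proves every element is even or odd.
+    h = 0
+    while True:
+        d = add(h, h)
+        if d == q:
+            return h, 0
+        if add(d, one) == q:
+            return h, 1
+        h += 1
+
+def bit(p, i):
+    # The i-th binary digit of p, for standard i: take halves i+1 times.
+    b = 0
+    for _ in range(i + 1):
+        p, b = half(p)
+    return b
+
+# Deciding X from its code p: i is in X iff bit(p, i) == 1.
+print([i for i in range(8) if bit(0b100101, i) == 1])   # [0, 2, 5]
+\end{verbatim}
+The point, of course, is that \texttt{plus} and \texttt{times} cannot actually be implemented for a nonstandard model; the sketch only illustrates that nothing \emph{else} in the procedure is computationally problematic.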
+ +\section{Acknowledgements} +The structure of the proof is from Dr Thomas Forster's lecture notes on Computability and Logic from Part III of the Cambridge Maths Tripos, lectured in 2016. + +\end{document} \ No newline at end of file diff --git a/TokyoEntrance2016.tex b/TokyoEntrance2016.tex new file mode 100644 index 0000000..534c296 --- /dev/null +++ b/TokyoEntrance2016.tex @@ -0,0 +1,99 @@ +\documentclass[11pt]{amsart} +\usepackage{geometry} +\geometry{a4paper} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{epstopdf} +\usepackage{hyperref} +\usepackage{lmodern} + +% Reproducible builds +\pdfinfoomitdate=1 +\pdftrailerid{} +\pdfsuppressptexinfo=-1 + +\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png} + +\title{Tokyo 2016 Graduate School Entrance Exam} +\author{Patrick Stevens} +\date{16th March, 2017} + +\begin{document} +\maketitle + +\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/TokyoEntrance2016/TokyoEntrance2016.pdf} \end{center} +\normalsize + +\section{Question 2} +\subsection{Prove \texorpdfstring{that $S = 2\pi \int_{-1}^1 F(y, y') \ \mathrm{d}x$}{a certain integral form for S}} +The surface may be parametrised as $$S(x, \theta) = (x, y(x) \cos(\theta), y(x) \sin(\theta))$$ +where $\theta \in [0, 2\pi)$ and $x \in [-1,1]$. + +Hence $$\dfrac{\partial S}{\partial x} = (1, y'(x) \cos(\theta), y'(x) \sin(\theta))$$ and $$\dfrac{\partial S}{\partial \theta} = (0, -y(x) \sin(\theta), y(x) \cos(\theta))$$ +so the surface element $$\mathrm{d}\Sigma = \left| \left(1, y'(x) \cos(\theta), y'(x) \sin(\theta) \right) \times (0, -y(x) \sin(\theta), y(x) \cos(\theta)) \right| \mathrm{d}x \mathrm{d}\theta$$ +i.e. $$y \sqrt{1+(y')^2}$$ + +The integral is therefore $$\int_{0}^{2 \pi} \int_{-1}^1 y \sqrt{1+(y')^2} \mathrm{d}x \mathrm{d}\theta$$ +as required. + +\subsection{Prove the first integral of the Euler-Lagrange equation} +We know the Euler-Lagrange equation $$\dfrac{\partial F}{\partial y} = \dfrac{\mathrm{d}}{\mathrm{d}x} \dfrac{\partial F}{\partial y'}$$ + +Now, $$\frac{\mathrm{d}F}{\mathrm{d}{x}} = \dfrac{\partial F}{\partial y} \dfrac{\mathrm{d}y}{\mathrm{d}x} + \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y'}{\mathrm{d}x}$$ +so substituting Euler-Lagrange into this: +$$\frac{\mathrm{d}F}{\mathrm{d}{x}} = \dfrac{\mathrm{d}}{\mathrm{d}x} \left(\dfrac{\partial F}{\partial y'}\right) \dfrac{\mathrm{d}y}{\mathrm{d}x} + \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y'}{\mathrm{d}x}$$ + +Notice the right-hand side is just what we get by applying the product rule: it is $$\dfrac{\mathrm{d}}{\mathrm{d}x} \left( \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y}{\mathrm{d}x} \right)$$ + +The result follows now by simply integrating both sides with respect to $x$. + +\subsection{Solve the differential equation} + +Just substitute $F(y,y') = y \sqrt{1+(y')^2}$: +$$y\sqrt{1+(y')^2} - y' \left[ \frac{1}{2} y (1+(y')^2)^{-1/2} \cdot 2 y'\right] = c$$ +which can be simplified to $$y (1+(y')^2)^{-1/2} \left[ (1+(y')^2) -(y')^2\right] = c$$ +i.e. $$y^2 - c^2 = (c y')^2$$ + +If $c=0$ then this is trivial: $y=0$. From now on, assume $c \not = 0$; then since $y$ is known to be positive, $c > 0$. + +Invert: $$\frac{c^2}{y^2-c^2} = \left(\frac{dx}{dy}\right)^2$$ +so $$\dfrac{dx}{dy} = \pm \frac{c}{\sqrt{y^2-c^2}}$$ +which is a standard integral: $$x = \pm c \log(y+\sqrt{y^2-c^2}) + K$$ + +Also $y(-1) = 2 = y(1)$, so $$\{1,-1\} = \{c \log(2+\sqrt{4-c^2}) + K, -c \log(2+\sqrt{4-c^2}) + K\}$$ +which means $K = 0$. 
+
+Then $$\exp\left(\pm \frac{x}{c}\right) = y+\sqrt{y^2-c^2}$$
+Since $y(1) = 2$, we have $$\exp(\pm 1/c) = 2+\sqrt{4-c^2}$$ and in particular (since $c>0$) the $\pm$ on the left-hand side must be positive; this is the equation which $c$ is required to satisfy.
+
+Rearrange: $$y = \frac{c^2+\exp(2 \frac{x}{c})}{2 \exp\left(\frac{x}{c}\right)}$$
+which completes the question.
+
+\section{Question 3}
+\subsection{Part 1}
+We must put one ball into each box. Then we are distributing $n-r$ balls freely among $r$ boxes, so the answer is $$\binom{n-1}{r-1}$$
+(standard stars-and-bars result: $n-r$ balls and $r-1$ dividers give $\binom{(n-r)+(r-1)}{r-1}$).
+
+\subsection{Part 2}
+Consider the $n$ black balls laid out in a line; we are interspersing the $m$ white balls among them.
+Equivalently, we have $n+1$ boxes (represented by the gaps between black balls) and we are trying to put $m$ balls into them.
+By stars-and-bars again, the answer is $\binom{n+m}{n}$.
+
+\subsection{Part 3}
+Condition on the colour of the first ball, and write $l$ for the length of the first run.
+Then
+$$P_{n,m}(r,s) = \frac{n}{n+m} \sum_{l=1}^{n} P_{n-l,m}(r-1, s) + \frac{m}{n+m} \sum_{l=1}^m P_{n, m-l}(r, s-1)$$
+Also $P_{n,m}(0,s) = \chi[n=0] \chi[s=1]$ where $\chi$ is the indicator function, and $P_{n,m}(r,0) = \chi[m=0] \chi[r=1]$.
+
+\subsection{Part 4}
+
+\subsection{Part 5}
+If $m \leq n$, then the sum is $$\sum_{l=0}^m \binom{n}{l} \binom{m}{m-l}$$ which is the $x^m$ coefficient of the left-hand side and hence of the right-hand side.
+
+If $m > n$, then the sum is $$\sum_{l=0}^n \binom{n}{n-l} \binom{m}{l}$$ which is the $x^n$ coefficient of the left-hand side and hence of the right-hand side.
+
+For the second equation: this follows by setting $n \mapsto n-1$ in the above.
+
+\subsection{Part 6}
+
+\end{document}
\ No newline at end of file
diff --git a/YonedaWithoutTears.tex b/YonedaWithoutTears.tex
new file mode 100644
index 0000000..d9cb59c
--- /dev/null
+++ b/YonedaWithoutTears.tex
@@ -0,0 +1,256 @@
+\documentclass[11pt]{amsart}
+ \usepackage{geometry}
+ \geometry{a4paper}
+ \usepackage{graphicx}
+ \usepackage{amssymb}
+ \usepackage{epstopdf}
+ \usepackage{mdframed}
+ \usepackage{hyperref}
+
+% Reproducible builds
+\pdfinfoomitdate=1
+\pdftrailerid{}
+\pdfsuppressptexinfo=-1
+
+ \DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
+
+ \newmdtheoremenv{note}{Note}
+ \newmdtheoremenv{thm}{Theorem}
+ \newmdtheoremenv{motiv}{Motivation}
+ \newmdtheoremenv{definition}{Definition}
+
+ \newcommand{\homfrom}[1]{\mathrm{Hom\left(#1, -\right)}}
+ \newcommand{\homto}[1]{\mathrm{Hom\left(-,#1\right)}}
+ \newcommand{\Set}{\mathbf{Set}}
+ \newcommand{\Nat}{\mathrm{Nat}}
+
+ \title{Yoneda Without Tears}
+ \author{Patrick Stevens}
+
+ \begin{document}
+
+ \maketitle
+
+ \tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/YonedaWithoutTears/YonedaWithoutTears.pdf} \end{center}
+
+ \normalsize
+
+ \section{Introduction}
+ This document will assume that you are familiar with the notion of a category and a functor.
+ Ideally you will also be familiar with the idea of a natural transformation and a hom-set.
+
+ Notation for these concepts varies widely; we will fix one version of it now.
+
+ \
+
+ \begin{definition}
+ Fix a locally small category $\mathcal{C}$, and pick an object $A \in \mathcal{C}$.
+ The \emph{hom-set} $\homfrom{A}$ is defined to be the set $\{f \in \mathrm{mor} \mathcal{C} : \mathrm{dom} f = A\}$.
+ There is of course a dual: $\homto{A}$ is defined to be $\{f \in \mathrm{mor} \mathcal{C} : \mathrm{cod} f = A\}$.
+ \end{definition}
+
+ \
+
+ \label{defn:nattrans}
+ \begin{definition}
+ Let $F, G: \mathcal{C} \to \mathcal{D}$ be functors.
+ A \emph{natural transformation} from $F$ to $G$ is a selection $\alpha$, parametrised over each $X \in \mathcal{C}$, of an arrow $\alpha_X$ in $\mathcal{D}$ from $FX$ to $GX$.
+ We require this selection to be ``natural'' in that whenever $f : X \to Y$ is an arrow in $\mathcal{C}$, we have $$FX \xrightarrow{Ff} FY \xrightarrow{\alpha_Y} GY = FX \xrightarrow{\alpha_X} GX \xrightarrow{Gf} GY$$
+ \end{definition}
+
+ \
+
+ \begin{definition}
+ Let $F, G : \mathcal{C} \to \mathcal{D}$ be functors.
+ Then the set of natural transformations from $F$ to $G$ is denoted $\Nat[F, G]$.
+ \end{definition}
+
+ \
+
+ Then you should be able to parse the statement of the Yoneda lemma, though not necessarily understand it:
+
+ \
+
+ \begin{thm}[The Yoneda lemma]
+ Let $\mathcal{C}$ be a category, and let $G: \mathcal{C} \to \Set$ be a functor.
+ Let $A$ be an object of $\mathcal{C}$.
+ Then $$\Nat [\homfrom{A}, G ] \cong G A$$
+ and moreover the bijection is natural in both $G$ and $A$.
+ \end{thm}
+
+ \section{The Relevant Interpretation of ``Category''}
+
+ The main insight comes from asking the question, ``how can I better understand what $\homfrom{A}$ means?''.
+
+ There are many ways to understand what a category is, but the relevant one here is that a category is a description of a many-sorted algebraic theory with unary functions between the sorts.
+ An object is a placeholder for a type, and an arrow is a placeholder for a function from one type to another.
+
+ A category is only an abstract description of some types and their interactions.
+ It's a mistake to think of an object of a category viewed this way as ``being'' $\mathbb{N}$.
+ We only get to think of the type in this way when we instantiate the theory: when we find a model of it somewhere.
+ Since sets are the easiest places to work, we will consider set-models only.
+
+ \subsection{Examples}
+
+ \begin{itemize}
+ \item Any category at all describes a theory which has a model that has only one type which is empty: simply interpret all the type-templates (i.e. objects in the category) as being that set, and all arrows become the identity.
+ \item Any category at all describes a theory which has a model that has only one type which has one element in: simply interpret all the type-templates (i.e. objects in the category) as being that set, and all arrows again become the identity.
+ \item Any category with one object (i.e. any monoid when viewed as a category) has a model where there is only one type, and the elements of that type are the elements of the monoid (i.e. the arrows in the category). (Bear this one in mind.)
+ \item The category with three objects $A, B, C$, whose only non-identity arrows are $f, g : A \to B$, $h : B \to C$ and $k : A \to C$ (so that composition forces $hf = hg = k$), has a model where $A$ is represented by $\mathbb{Z}$, $B$ is also represented by $\mathbb{Z}$, and $C$ is represented by $\mathrm{Bool}$.
+ Indeed, take $f = |\cdot|$ the absolute value function, $g$ the negation function, and $h$ the ``is even'' function.
+ \item That category also has a model (which is not a set-model) where $A$ is the type of groups, $B$ is also the type of groups, and $C$ is the type of sets.
+ Take $f$ to be the identity, $g$ to be ``construct a group with the same elements but with multiplication reversed'', and $h$ to be ``take the underlying set of the group''.
+ \item That category has a much more boring set-model: take $A = \{1,2\}$, $B = \{5, 6\}$, and $C = \{1\}$.
+ Let $f : n \mapsto n+4$, let $g$ be the other bijection, and let $h$ be $n \mapsto 1$.
+ \end{itemize}
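+
+ As a throwaway illustration (a few lines of Python, invented for this document), one can even check mechanically that the ``boring'' set-model really is a model: the only constraint that composition imposes in this category is $hf = k = hg$.
+
+ \begin{verbatim}
+A, B, C = [1, 2], [5, 6], [1]
+f = {1: 5, 2: 6}   # n -> n + 4
+g = {1: 6, 2: 5}   # the other bijection
+h = {5: 1, 6: 1}   # the only possible map B -> C
+k = {1: 1, 2: 1}   # the unique arrow A -> C, which must equal hf and hg
+
+assert all(h[f[a]] == k[a] == h[g[a]] for a in A)
+\end{verbatim}
+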
+ In general, any category has lots and lots of models.
+
+ \subsection{What is a model anyway?}
+
+ A set-model of the theory identifies, for each type-description in the category, a set whose elements represent elements of that type.
+ It also identifies, for each description of a unary predicate on the types, a function between the types; and the models of the unary predicates have to compose in a way that is reflected in the type-description.
+
+ This is just a functor $\mathcal{C} \to \Set$!
+
+ So the first key insight is that a functor $\mathcal{C} \to \Set$ is precisely a model of the theory described by $\mathcal{C}$.
+
+ \subsection{Homomorphisms between models}
+
+ We define a certain restricted form of model homomorphism as follows.
+ (Note that this is not quite what is usually meant by a model homomorphism, and I have invented the term ``fixed model homomorphism'' to describe it.)
+
+ A \emph{fixed model homomorphism} $\alpha$ from one model $F: \mathcal{C} \to \Set$ of the theory $\mathcal{C}$ to another model $G: \mathcal{C} \to \Set$ is a choice, for each type $A$, of a function $\alpha_A$ from the type $FA$ of the model $F$ to the corresponding type $GA$ of the model $G$, in such a way that $\alpha$ respects the predicates $Ff: FA \to FB$ of the model:
+ $$FA \xrightarrow{Ff} FB \xrightarrow{\alpha_B} GB = FA \xrightarrow{\alpha_A} GA \xrightarrow{Gf} GB$$
+
+Notice that this is a model homomorphism which additionally ensures that $FA$ is always mapped to $GA$ (for any $A$), so (for example) it won't collapse all the objects $FA$ into a single object in $G$'s image unless $G$ is the trivial model.
+
+ You might recognise the definition of a fixed model homomorphism as being the definition of a natural transformation between $F$ and $G$ when viewed as functors.
+
+ So the second key insight is that a natural transformation between functors $F: \mathcal{C} \to \Set$ and $G$ is just a fixed model homomorphism between the $\Set$-models $F$ and $G$ of the theory $\mathcal{C}$.
+
+ \section{Free models}
+ Throughout mathematics, there is the notion of a free object: an object which somehow has the least possible structure while still obeying all the rules it has to obey.
+ Can we find a free model of the theory represented by the category $\mathcal{C}$?
+
+ Imagine $\mathcal{C}$ has two objects. Then any free model worth its name must have at least two types: otherwise we've definitely lost information in the model.
+ (The theory said there were two types, so our model had better have at least two types or else it's not a good model of the theory.)
+
+ Likewise, any free model worth its name had better have \emph{at most} two types, since otherwise we've added extra structure that the theory didn't specify.
+ For concreteness, take our category to be the unique category with two objects and a single arrow from one to the other (and also the identity arrows).
+ Then if our free model had three types in it, that would be very weird (somehow the model would fundamentally require introducing extra things into the universe if we ever wanted to realise the model).
+
+ So our free model had better have exactly one type for every object in the category.
+
+ Moreover, for the same reasons, every arrow in the category should have exactly one corresponding unary function in the model.
+
+ \subsection{Homomorphisms between models}
+
+ We define a certain restricted form of model homomorphism as follows.
+ (Note that this is not quite what is usually meant by a model homomorphism, and I have invented the term ``fixed model homomorphism'' to describe it.)
+
+ A \emph{fixed model homomorphism} $\alpha$ from one model $F: \mathcal{C} \to \Set$ of the theory $\mathcal{C}$ to another model $G: \mathcal{C} \to \Set$ is a family of functions, one function $\alpha_A : FA \to GA$ from each type $FA$ of the model $F$ to the corresponding type $GA$ of the model $G$, such that $\alpha$ respects the unary functions $Ff: FA \to FB$ of the model:
+ $$FA \xrightarrow{Ff} FB \xrightarrow{\alpha_B} GB = FA \xrightarrow{\alpha_A} GA \xrightarrow{Gf} GB$$
+
+ Notice that this is a model homomorphism which additionally ensures that $FA$ is always mapped into $GA$ (for any $A$), so (for example) it cannot collapse all the types of $F$ into a single type of $G$ unless $G$ is the trivial model.
+
+ You might recognise the definition of a fixed model homomorphism as being precisely the definition (Definition \ref{defn:nattrans}) of a natural transformation between $F$ and $G$ when they are viewed as functors.
+
+ So the second key insight is that a natural transformation between functors $F: \mathcal{C} \to \Set$ and $G$ is just a fixed model homomorphism between the $\Set$-models $F$ and $G$ of the theory $\mathcal{C}$.
+
+ \section{Free models}
+ Throughout mathematics, there is the notion of a free object: an object which somehow has the least possible structure while still obeying all the rules it has to obey.
+ Can we find a free model of the theory represented by the category $\mathcal{C}$?
+
+ Imagine $\mathcal{C}$ has two objects. Then any free model worth its name must have at least two types; otherwise we've definitely lost information in the model.
+ (The theory said there were two types, so our model had better have at least two types or else it's not a good model of the theory.)
+
+ Likewise, any free model worth its name had better have \emph{at most} two types, since otherwise we've added extra structure that the theory didn't specify.
+ For concreteness, take our category to be the unique category with two objects and a single arrow from one to the other (and also the identity arrows).
+ Then if our free model had three types in it, that would be very weird: somehow the model would fundamentally require introducing extra things into the universe whenever we wanted to realise it.
+
+ So our free model had better have exactly one type for every object in the category.
+
+ Moreover, for the same reasons, every arrow in the category should have exactly one corresponding unary function in the model.
+ (Any fewer, and we've lost the information that the theory contains some particular function; any more, and we've invented a function that the theory never specified.)
+
+ An excellent guess for a free model turns out to be the following (called the \emph{term model}): pick some type (i.e. some object in the category).
+ Let that type have exactly one element, and then chase through all the functions, declaring that everything which is not obviously the same as something we've already made is different.
+ (By analogy with the construction of the free group on some generators: keep piling together the generators, and declare to be different every word which you haven't obviously already made.)
+ Declare that there are no other things in the universe: if we haven't constructed some member of a type this way, then that member can't exist.
+ It's called the term model because we select a type, declare that there is a term of that type, and then see what else is forced to exist.
+
+ \subsection{Examples}
+
+ \subsubsection{Simplest example}
+
+ For example, in the category above with two objects $A$ and $B$, and a single arrow $f$ from $A$ to $B$, we could construct two different models this way.
+ The first is the $A$-related model: declare that the type $A$ has a single element $a$, and then all the other things in the universe are those constructed from $a$.
+ (That is, $\mathrm{id}(a) = a$, which we already know about; $f(a) \in B$, which we don't yet know about, so we'll note down that this is a new thing in the universe; and then $\mathrm{id}(f(a))$, which does already exist and is $f(a)$.)
+
+ The second is the $B$-related model: declare that the type $B$ has a single element $b$, and then all the other things in the universe are those constructed from $b$.
+ (That is, $\mathrm{id}(b) = b$, and then there are no more ways to create elements because there are no other arrows from $B$.)
+
+ So the two term models we have constructed are:
+ \begin{itemize}
+ \item The one based at $A$, which has the type $A$ consisting of a single element, and the type $B$ consisting of a single element.
+ \item The one based at $B$, which has the type $A$ consisting of no elements at all, and the type $B$ consisting of a single element.
+ \end{itemize}
+
+ \subsubsection{More complex example}
+
+ We'll look at an example with three types: $A, B, C$ with arrows $f, g: A \to B$ and $h : B \to C$, as well as two distinct arrows $hf, hg: A \to C$.
+ One model of this is where $A = \mathbb{N}$, $B = \mathbb{Z}$, $C = \mathrm{Bool}$, and $f$ is the obvious injection $n \mapsto n$, $g$ is the ``negate'' function $n \mapsto -n$, and $h$ is the ``is negative'' function.
+
+ Then there will be three term models: one based at $A$, one at $B$, and one at $C$.
+
+ \begin{itemize}
+ \item At $A$: we have one element $a \in A$; then two elements of $B$, namely $f(a)$ and $g(a)$; then two elements of $C$, namely $hf(a)$ and $hg(a)$.
+ \item At $B$: we have one element $b \in B$; then only one element of $C$, namely $h(b)$.
+ \item At $C$: we have one element $c \in C$ only.
+ \end{itemize}
+
+ \subsubsection{A cyclic example}
+
+ Consider the category with two objects $A, B$ only, and arrows $f: A \to B$ and $g: B \to A$ which compose so that $gf = 1_A$ but $fg \not = 1_B$.
+ Then there are two term models:
+
+ \begin{itemize}
+ \item At $A$: there is $a \in A$, then $f(a) \in B$, then $gf(a) \in A$; but that is known to be just $a$, so we're done.
+ \item At $B$: there is $b \in B$, then $g(b) \in A$, then $fg(b) \in B$, then $gfg(b) = g(b) \in A$, and we're done.
+ \end{itemize}
+
+ (Here we keep using $gf = 1_A$ to reduce longer words: for instance, $fgf = f(gf) = f$ and $gfg = (gf)g = g$, which is why no new elements ever appear.)
+
+ \subsubsection{An infinite example}
+
+ Consider the category which is the monoid $\mathbb{N}$: namely, a single object $A$ and then an arrow $f_i$ for each $i \in \mathbb{N}$ such that $f_i f_j = f_{i+j}$ (so $f_0$ is the identity).
+
+ Then there is just one term model, and its elements are $a, f_1(a), f_2(a), \dots$.
+
+ \subsection{General definition of the term model}
+
+ If you have been thinking categorically, you may have noticed that in every example above, if you simply delete the generating term from each element we listed, we end up with a collection of arrows in the category.
+ For example, in the natural numbers case, we had the elements $a, f_1(a), f_2(a), \dots$; delete the $a$ and we get $\{\mathrm{id}, f_1, f_2, \dots\}$.
+ And this, crucially, is simply a list of the arrows out of $A$.
+
+ In general, the collection of $\{\text{things which exist in the term model based at an object $A$}\}$ is isomorphic to the set of arrows out of $A$:
+ that is, it's the hom-set $\homfrom{A}$.
+
+ So the third key insight is that the term models are precisely the hom-sets of the category.
+
+ \section{The Yoneda lemma}
+
+ Now we can understand the Yoneda lemma's statement in the light of these new concepts:
+
+ \
+
+ \begin{thm}[The Yoneda lemma, in new terms]
+ Let $\mathcal{C}$ be an algebraic theory with unary functions, and let $G: \mathcal{C} \to \Set$ be a model of that theory.
+ Let $A$ be a type in $\mathcal{C}$.
+ Then the collection of fixed model homomorphisms from the term model based at $A$ into the model $G$ is isomorphic to the set of things of type $A$ in the model $G$.
+ Moreover, the isomorphism is natural in both the type we chose and the model we chose.
+ \end{thm}
+
+ \
+
+ After a little thought, the existence of the bijection becomes obvious.
+ Indeed, a homomorphism from the term model based at $A$ into any other model $G$ is exactly determined by where $a$ goes: nothing exists in the term model except things which are derived by applying arrows to $a$,
+ so if we've decided where $a$ goes then we've decided where everything in the term model goes.
+ Hence for every fixed model homomorphism from $\homfrom{A}$ to $G$, we can canonically define a member of the concrete type $GA$, namely ``where did $a$ end up''.
+ Conversely, if we're given an element $x \in GA$ of the instantiation of the type $A$ in the model $G$, we can canonically define a fixed model homomorphism from $\homfrom{A}$ to $G$ by sending our abstract term $a$ to $x$, and letting all the rest of the elements of the term model get pulled along with it.
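+
+ In symbols, then (a sketch; the notation $\alpha^x$ is ours): the bijection sends a fixed model homomorphism $\alpha : \homfrom{A} \to G$ to the element $\alpha_A(a) \in GA$, where $a = \mathrm{id}_A$ is the generating term; and it sends an element $x \in GA$ to the homomorphism $\alpha^x$ defined by
+ $$\alpha^x_X(h(a)) = (Gh)(x) \qquad \text{for each arrow } h : A \to X.$$
+ The first direction records ``where did $a$ end up''; the second is exactly the pulling-along just described.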
+
+ \subsection{Naturality}
+
+ I'm afraid I don't know of a good way to think about naturality other than just to draw out the diagrams and show they commute; but they're both easy, and a sketch follows.
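+
+ (The notations $\Phi$ and $f^*$ below are ours.) Write $\Phi_{A,G} : \Nat[\homfrom{A}, G] \to GA$ for the bijection $\alpha \mapsto \alpha_A(a)$.
+ Naturality in the model: for any fixed model homomorphism $\beta : G \to G'$,
+ $$\Phi_{A,G'}(\beta \circ \alpha) = \beta_A(\Phi_{A,G}(\alpha)),$$
+ since both sides ask where $a$ ends up if we push it through $\alpha$ and then $\beta$.
+ Naturality in the type: an arrow $f : A \to B$ induces a fixed model homomorphism $f^* : \homfrom{B} \to \homfrom{A}$ between term models (send $h(b)$ to $(hf)(a)$), and for any $\alpha : \homfrom{A} \to G$,
+ $$\Phi_{B,G}(\alpha \circ f^*) = (Gf)(\Phi_{A,G}(\alpha)).$$
+ Unwinding the definitions, both sides come out as $\alpha_B(f)$, so the square commutes.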
+
+ \section{Relation to the notion of the free model}
+
+ It turns out that the Yoneda lemma can be used to prove that the term models together form a ``free'' collection in some sense.
+ The terse way to say this is that the Yoneda embedding $A \mapsto \homfrom{A}$ is a full and faithful functor $\mathcal{C}^{\mathrm{op}} \to [\mathcal{C}, \Set]$: arrows $A \to B$ in $\mathcal{C}$ correspond exactly to fixed model homomorphisms $\homfrom{B} \to \homfrom{A}$.
+
+ In more elementary terms: the conversion from objects to models, given by taking an object and producing the term model, loses no information about the category.
+ If we select two types $A$ and $B$ in the category $\mathcal{C}$, and take a pair of different arrows $f, g : A \to B$, then these two arrows correspond to a pair of fixed model homomorphisms (natural transformations) from the term model $\homfrom{B}$ to the term model $\homfrom{A}$, given by ``send $b$ to $f(a)$'' and ``send $b$ to $g(a)$'' respectively.
+ (Note the reversal of direction: that is the $\mathcal{C}^{\mathrm{op}}$ above.)
+ Moreover, the two homomorphisms really are different from each other: they disagree on $b$, because $f(a)$ and $g(a)$ are distinct elements of the term model based at $A$.
+
+ So given a category (a specification of an algebraic theory), we can produce a specific collection of models for that theory and a specific collection of homomorphisms between the models, such that all the information about the theory can be recovered from the models.
+ There are loads more models out there, and loads more homomorphisms between those extra models, but if we restrict our attention only to the term models then we recover all the information about the original category.
+ Moreover, remove any of the models or any of the homomorphisms, and we stop being able to pick out uniquely the theory we are modelling.
+
+ That is, the collection of term models is ``free'': the term models haven't lost us any information about the theory (we can use them to recover the original category entirely), and nor do they contain any extra information (\emph{every} fixed model homomorphism between term models is required, or else we have lost some information about the category).
+
+ \section{Acknowledgements}
+
+ This entire document is derived from a MathOverflow answer by Sridhar Ramesh: \url{https://mathoverflow.net/a/15143}.
+
+
+\end{document}
\ No newline at end of file
diff --git a/pdf-targets.txt b/pdf-targets.txt
index 3e91044..76210a8 100644
--- a/pdf-targets.txt
+++ b/pdf-targets.txt
@@ -1 +1,12 @@
+static/misc/FriedbergMuchnik/FriedbergMuchnik.tex
+static/misc/ModularMachines/EmbedMMIntoTuringMachine.tex
+static/misc/MonadicityTheorems/MonadicityTheorems.tex
+static/misc/AdjointFunctorTheorems/AdjointFunctorTheorems.tex
+static/misc/Tennenbaum/Tennenbaum.tex
+static/misc/MultiplicativeDetProof/MultiplicativeDetProof.tex
+static/misc/ParametricBoundedLoeb2016/ParametricBoundedLoeb2016.tex
+static/misc/TokyoEntrance2016/TokyoEntrance2016.tex
+static/misc/NonstandardAnalysis/NonstandardAnalysisPartIII.tex
 static/misc/RepresentableFunctors/RepresentableFunctors.tex
+static/misc/YonedaWithoutTears/YonedaWithoutTears.tex
+