Add patrickstevens.co.uk PDFs

This commit is contained in:
Smaug123
2022-05-15 11:17:34 +01:00
parent f7da1815b6
commit d8cf76c2f1
12 changed files with 4154 additions and 1 deletion

AdjointFunctorTheorems.tex Normal file

@@ -0,0 +1,410 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{tikz-cd}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\newmdtheoremenv{defn}{Definition}
\newmdtheoremenv{thm}{Theorem}
\title{Motivation for the General Adjoint Functor Theorem}
\author{Patrick Stevens}
\date{24th December 2015}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/AdjointFunctorTheorems/AdjointFunctorTheorems.pdf} \end{center}
\normalsize
\emph{You should draw diagrams yourself throughout this document. It will be unreadable as mere symbols.}
\section{Primeval Adjoint Functor Theorem}
Recall the theorem (``RAPL'') that Right Adjoints Preserve Limits.
The PAFT is an attempt at a converse to this theorem.
It states that if $G: \mathcal{D} \to \mathcal{C}$ preserves limits, and $\mathcal{D}$ is small and complete, then $G$ is a right adjoint.
The problem with the PAFT is that it's actually a very weak theorem: small complete categories are preorders.
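(Sketch of why, for intuition: if $\mathcal{C}$ is small and complete but contains two distinct parallel arrows $f \neq g: A \to B$, then taking $I = \mathrm{mor}(\mathcal{C})$ we get $$\mathcal{C}\left(A, \prod_{i \in I} B\right) \cong \prod_{i \in I} \mathcal{C}(A, B),$$ a set with at least $2^{|I|}$ elements; so $\mathcal{C}$ would have strictly more than $|\mathrm{mor}(\mathcal{C})|$ arrows, which is absurd.)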
So we want to weaken that requirement of smallness and completeness.
How do we usually weaken smallness?
The next best thing is local smallness, but we're going to lose something if we just replace ``small'' by ``locally small''.
\section{General Adjoint Functor Theorem}
The question is therefore, ``what requirement do we need to add to augment local smallness?''
Recall the most ``concrete'' definition of a left adjoint to $G: \mathcal{D} \to \mathcal{C}$: it is a specification of $F: \mathcal{C} \to \mathcal{D}$ together with $\eta_A: A \to GFA$ for each $A$, such that any $g: A \to GB$ has a unique $h: FA \to B$ with $(Gh) \circ \eta_A = g$.
\[
\begin{tikzcd}
FA
\arrow[r, dashrightarrow, "h"']
& B
\\
GFA
\arrow[r, "Gh"]
& GB
\\
A
\arrow[u, "\eta_A"]
\arrow[ur, "g"']
&
\end{tikzcd}
\]
This definition is how I remember what an adjoint is, and it very closely parallels the UMP of the free group on a given set.
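For concreteness, that parallel reads: take $G$ to be the forgetful functor $U: \mathbf{Gp} \to \mathbf{Sets}$, take $FA$ to be the free group on the set $A$, and take $\eta_A: A \to UFA$ to be the inclusion of the generators; then the condition above says exactly that every function $g: A \to UB$ into the underlying set of a group $B$ extends to a unique homomorphism $h: FA \to B$ with $(Uh) \circ \eta_A = g$.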
Now, by Then A Miracle Occurs\footnote{\tiny{\url{https://web.archive.org/web/20070703151645/http://star.psy.ohio-state.edu/coglab/Miracle.html}}}, I define the following Tiny Set Condition:
\
\begin{defn}[Tiny Set Condition] A functor $G: \mathcal{D} \to \mathcal{C}$ has the TSC iff for every object $A \in \mathcal{C}$, there is $B \in \mathcal{D}$ and $\eta_A : A \to GB$ such that every $g: A \to GX$ factors as $A \xrightarrow[\eta_A]{} GB \xrightarrow[Gh]{} GX$ for some $h: B \to X$.
\[
\begin{tikzcd}
B
\arrow[r, "h"']
& X
\\
GB
\arrow[r, "Gh"]
& GX
\\
A
\arrow[u, "\eta_A"]
\arrow[ur, "g"']
&
\end{tikzcd}
\]
\end{defn}
\
Notice how closely this mirrors the definition of an adjoint: it is a very slight weakening of my favourite definition, in that we don't require $h$ to be unique.
In particular, using that definition it is a completely content-free statement that if $G$ has a left adjoint, then it satisfies the TSC: simply take $\eta_A$ to be the unit of the adjunction.
We are assuming access to RAPL, so this gives the following theorem:
\
\begin{thm} If a functor $G: \mathcal{D} \to \mathcal{C}$ has a left adjoint, then it satisfies the TSC and preserves small limits.
\end{thm}
\
\subsection{Relation to the GAFT}
\emph{This is an overview of where we are heading.}
The GAFT talks about what happens if we replace the TSC with a weaker condition:
\
\begin{defn}[Solution Set Condition] A functor $G: \mathcal{D} \to \mathcal{C}$ has the SSC iff for every object $A \in \mathcal{C}$, there is a set $\{ B_i \in \mathcal{D} : i \in I \}$ and $\{ \eta_A^i : A \to G B_i : i \in I \}$ such that every $f: A \to GB$ factors as $A \xrightarrow[\eta_A^i]{} G B_i \xrightarrow[Gh]{} GB$ for some $i \in I$ and some $h: B_i \to B$.
\end{defn}
\
That is, we relax the TSC by allowing a set of objects $B_i$ rather than insisting on a single $B$.
Then the GAFT states that:
\
\begin{thm}[General adjoint functor theorem] A functor $G: \mathcal{D} \to \mathcal{C}$ (where $\mathcal{D}$ is complete and locally small) has a left adjoint iff it satisfies the SSC and preserves small limits.
\end{thm}
\
\section{Converse to Theorem 1}
We wish to show the following:
\
\begin{thm}[Tiny Adjoint Functor Theorem] If $G: \mathcal{D} \to \mathcal{C}$ has the TSC and preserves small limits, and $\mathcal{D}$ is complete and locally small, then $G$ has a left adjoint.
\end{thm}
\
This will give us a slightly more general version of the PAFT, because we've relaxed smallness of $\mathcal{D}$ and still given an equivalent condition for being an adjoint.
To be clear, what we have done is taken the PAFT, replaced ``small'' with ``locally small'', and imposed the TSC.
The TSC is ostensibly only a slight weakening of the definition of an adjoint, so it's not too much to ask that they would be equivalent.
I will write $FA$ for the object $B$ guaranteed by the TSC; this anticipates the definition of $F$ as a functor.
\subsection{Proof}
Recall the theorem that a specification of a left adjoint is equivalent to a specification of an initial object of $(A \downarrow G)$ for each object $A \in \mathcal{C}$.
There's one obvious choice for such an initial object: $(FA, \eta_A)$.
(Note for the future that this might not actually be initial, but it's the obvious choice.)
Is this actually initial?
\[
\begin{tikzcd}[row sep=large, column sep=large]
A
\arrow[dr, "f"]
\arrow[d, "\eta_A"]
&
\\
GFA
\arrow[r, dashrightarrow]
& GX
\end{tikzcd}
\]
Well, it certainly has an arrow from it into any other $(X \in \mathcal{D}, f: A \to GX)$, because that's just the statement of the TSC: any arrow $f: A \to GX$ factors through the map $\eta_A : A \to GFA$.
Is that arrow unique?
Well, OK, maybe it isn't. The TSC didn't tell us much, after all.
But we are in a complete and locally small category, so what we can do is equalise out all the ways in which the arrow fails to be unique.
\subsubsection{Try and make the arrow unique}
Say $\{ G(h_i): i \in I \}$ is the set of arrows $GFA \to GX$ of the form $G(h)$, $h: FA \to X$, with $G(h_i) \eta_A = f$; this is a set because $\mathcal{D}$ is locally small.
\[
\begin{tikzcd}[row sep=large, column sep=large]
A
\arrow[dr, "f"]
\arrow[d, "\eta_A"]
&
\\
GFA
\arrow[r, shift left, "G(h_1)"]
\arrow[r, shift right, "G(h_2)"']
& GX
\end{tikzcd}
\]
Let $E$ be the ``industrial-strength equaliser'' of the $h_i$ in $\mathcal{D}$: an arrow $e: E \to FA$ such that $h_i e = h_j e$ for all $i, j$, universal among arrows with this property.
Since $G$ preserves limits, $Ge: GE \to GFA$ must be an industrial-strength equaliser of the $G(h_i)$.
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dr, "f"]
\arrow[d, "\eta_A"]
&
\\
GE
\arrow[r, "Ge"']
& GFA
\arrow[r, shift left, "G(h_1)"]
\arrow[r, shift right, "G(h_2)"']
& GX
\end{tikzcd}
\]
Since $\eta_A$ equalises all the $G(h_i)$, it must lift uniquely over $Ge$: say $\eta_A = G(e) \circ \overline{\eta_A}$.
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dl, "\overline{\eta_A}"']
\arrow[dr, "f"]
\arrow[d, "\eta_A"]
&
\\
GE
\arrow[r, "Ge"']
& GFA
\arrow[r, shift left, "G(h_1)"]
\arrow[r, shift right, "G(h_2)"']
& GX
\end{tikzcd}
\]
Now, in our dream world, the $G(h_i)$ would all be equal: that is, $Ge$ would be an iso.
Can we find an inverse?
The TSC tells us that there is an arrow $G(\gamma) : GFA \to GE$ such that $G(\gamma) \eta_A = \overline{\eta_A}$.
That is, $G(e) G(\gamma) \eta_A = \eta_A$.
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dl, "\overline{\eta_A}"']
\arrow[dr, "f"]
\arrow[d, "\eta_A"]
&
\\
GE
\arrow[r, shift left, "Ge"]
& GFA
\arrow[l, shift left, "G(\gamma)"]
\arrow[r, shift left, "G(h_1)"]
\arrow[r, shift right, "G(h_2)"']
& GX
\end{tikzcd}
\]
This is as far as we can go, because the TSC is weak: it tells us nothing about the arrows whose existence it guarantees, beyond the factorisation property.
\subsubsection{Try and produce an object which works}
We see that $G(e) G(\gamma)$ is a map $GFA \to GFA$, and we wanted it to be the identity.
What we could do is equalise out all the maps $GFA \to GFA$, and that would tell us that $\eta_A$ lifted over the equaliser.
This would produce an object which we really hope would actually have the uniqueness property.
Let $i: I \to FA$ be the industrial-strength equaliser in $\mathcal{D}$ of all maps $r_i: FA \to FA$ such that $G(r_i) \eta_A = \eta_A$; since $G$ preserves limits, $G(i): GI \to GFA$ is an industrial-strength equaliser of the $G(r_i)$.
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dr, "\eta_A"]
\arrow[d, "\eta_A"]
&
\\
GI
\arrow[r, "G(i)"']
& GFA
\arrow[r, shift left, "G(r_1)"]
\arrow[r, shift right, "G(r_2)"']
& GFA
\end{tikzcd}
\]
Each $G(r_i) \eta_A$ equals $\eta_A$, so $\eta_A$ equalises all the $G(r_i)$; hence $\eta_A$ lifts over $Gi$ to $\eta_A': A \to GI$. (Note that our problem map $e \gamma$ is one of the $r_i$, since $G(e \gamma) \eta_A = \eta_A$.)
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dl, "\eta'_A"']
\arrow[dr, "\eta_A"]
\arrow[d, "\eta_A"]
&
\\
GI
\arrow[r, "G(i)"']
& GFA
\arrow[r, shift left, "G(r_1)"]
\arrow[r, shift right, "G(r_2)"']
& GFA
\end{tikzcd}
\]
Finally, I claim that this is initial. Indeed, it certainly has maps into every $(GX, f)$, because we can just compose $A$'s TSC-map with $G(i)$.
\[
\begin{tikzcd}[row sep=large, column sep=huge]
& A
\arrow[dl, "\eta'_A"']
\arrow[d, "\eta_A"]
\arrow[dr, "\eta_A"']
\arrow[drr, "f"]
&
&
\\
GI
\arrow[r, "G(i)"']
& GFA
\arrow[r, shift left, "G(r_1)"]
\arrow[r, shift right, "G(r_2)"']
& GFA
\arrow[r, "G(h)"']
& GX
\end{tikzcd}
\]
The map is unique: if we had $G(x), G(y): GI \to GX$ such that $G(x)\eta_A' = G(y) \eta_A' = f$, say, then make their equaliser $G(e)$ as above.
$\eta_A'$ lifts over $G(e): GE \to GI$, say to $\eta_A'': A \to GE$.
\[
\begin{tikzcd}[row sep=large, column sep=large]
& A
\arrow[dl, "\eta''_A"']
\arrow[dr, "f"]
\arrow[d, "\eta'_A"]
&
\\
GE
\arrow[r, "G(e)"']
& GI
\arrow[r, shift left, "G(x)"]
\arrow[r, shift right, "G(y)"']
& GX
\end{tikzcd}
\]
By the TSC, there is a map $G(\gamma) : GFA \to GE$ with $G(\gamma) \eta_A = \eta_A''$.
So $G(i) G(e) G(\gamma) \eta_A = \eta_A$.
\[
\begin{tikzcd}[row sep=large, column sep=huge]
& & & A
\arrow[dlll, "\eta'_A"']
\arrow[dll, "\eta_A"]
\arrow[dl, "\eta''_A"]
\arrow[d, "\eta'_A"]
\arrow[dr, "f"]
&
\\
GI
\arrow[r, "G(i)"']
& GFA
\arrow[r, "G(\gamma)"']
& GE
\arrow[r, "G(e)"']
& GI
\arrow[r, shift left, "G(x)"]
\arrow[r, shift right, "G(y)"']
& GX
\end{tikzcd}
\]
Now $G(i)$ is an industrial equaliser of all maps of this form, and both $i e \gamma$ and $1_{FA}$ are of this form, so $G(i) G(e) G(\gamma) G(i) = G(1_{FA}) G(i) = G(i)$; $G(i)$ is monic and so $G(e) G(\gamma) G(i) = 1_{GI}$.
Therefore $G(e)$ is split epic; it's also monic because it's an equaliser, so it's iso.
That is, $G(x), G(y)$ are equal after all, because an equaliser is iso iff the arrows it's equalising are the same.
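(One direction of that last fact, explicitly: if the equaliser $e$ of $x$ and $y$ is invertible, then $$x = (xe)e^{-1} = (ye)e^{-1} = y,$$ using $xe = ye$; conversely, if $x = y$ then the identity is an equaliser of them, and equalisers are unique up to isomorphism.)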
\subsection{Summary}
We've seen that $G: \mathcal{D} \to \mathcal{C}$ (where $\mathcal{D}$ is complete and locally small) has a left adjoint iff it preserves small limits and has the TSC.
Notice how often in the proof we had lines like ``this thing lifts over this thing'': effectively telling us that the limits we had constructed were genuinely members of $(A \downarrow G)$, so that we could use the TSC.
There is actually a theorem about this, and its proof follows basically the same pattern as those lines do.
It tells us that limits in the comma category $(A \downarrow G)$ are computed just as they are in $\mathcal{D}$.
However, I wanted to stay as concrete as possible here.
\section{More generality}
This condition has told us that ``if you look like you've got a unit of an adjunction, then you really do have a unit''.
That's still quite restrictive, and it turns out to be possible to relax the TSC to the SSC.
Recall that the difference between the TSC and the SSC is just that the SSC asserts the existence of several objects which work, rather than just one.
\
\begin{thm}[GAFT from TAFT] If $G: \mathcal{D} \to \mathcal{C}$ has the SSC and preserves products, and $\mathcal{D}$ has products and is locally small, then $G$ has the TSC.
\end{thm}
\
The proof is by the only way we have of combining objects: that is, by taking products.
If $G$ has the SSC, we construct for each $A$ the object $P = \prod_{i \in I} B_i$.
Since $G$ preserves products, $GP$ is a product of the $G B_i$, with projections $\pi_{G B_i} = G(\pi_i)$.
Define $\eta_A: A \to GP$ componentwise by $\pi_{G B_i} \eta_A = \eta_A^i$.
Now any $f:A \to GB$ factors as $A \xrightarrow[\eta_A^i]{} G B_i \xrightarrow[Gh]{} GB$, because we had a solution set.
Therefore it factors as $A \xrightarrow[\pi_{G B_i} \eta_A]{} G B_i \xrightarrow[Gh]{} GB$; this is already a factorisation through $P$ in the first arrow!
(Of course, the remaining components of $A \to GP$ are defined by anything we like; the solution-set gives us arrows $A \to G B_j$ for $j \not = i$.)
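Explicitly, writing $\pi_i: P \to B_i$ for the projections, the factorisation is $$f = G(h) \circ \eta_A^i = G(h) \circ G(\pi_i) \circ \eta_A = G(h \pi_i) \circ \eta_A,$$ which is a factorisation through $\eta_A : A \to GP$ of exactly the shape the TSC demands.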
So we are done: given the SSC, we have produced an object $P$ and arrow $\eta_A: A \to GP$ which together witness that $G$ satisfies the TSC.
Combining Theorem 4 with Theorem 3 gives exactly the General Adjoint Functor Theorem.
\end{document}

EmbedMMIntoTuringMachine.tex Normal file

@@ -0,0 +1,488 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{xparse}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
% Following code for \machine is from http://tex.stackexchange.com/a/305621/103801
\ExplSyntaxOn
\int_new:N\g_patrick_int%
\NewDocumentCommand{\machine}{m}{%
\clist_set:Nn \l_tmpa_clist {#1}
\int_gzero:N \g_patrick_int
\begin{array}{|c|}
\hline
\prg_replicate:nn { \clist_count:N \l_tmpa_clist } {%
\int_gincr:N \g_patrick_int
\clist_item:Nn \l_tmpa_clist { \g_patrick_int } \\
\hline
}
\end{array}
}
\ExplSyntaxOff
% end of \machine code
\newmdtheoremenv{defn}{Definition}
\theoremstyle{remark}
\newtheorem*{remark}{Remark}
\newtheorem*{example}{Example}
\title{Embedding a modular machine into a group}
\author{Patrick Stevens}
\date{21st April 2016}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/ModularMachines/EmbedMMIntoTuringMachine.pdf} \end{center}
\normalsize
\section{Introduction}
This document was born from a set of lectures I attended, given by Dr Maurice Chiodo
in his half of the Part III Maths course \emph{Infinite Groups and Decision Problems}, which he lectured jointly with Dr Jack Button in 2016.
The treatment of modular machines was a bit heavy in text, and they are actually fairly intuitive objects, so I decided to expand on them here.
A warning for those who are following the official course notes: I accidentally interchanged $L$ and $R$ in my definition of a modular machine below, relative to the official notes.
It doesn't change any of the ideas, and is just a different labelling convention, but by the time it was pointed out to me, most of the document was written, and I can't be bothered to fix it now.
Use at your own risk.
Mistakes in this article are due entirely to me; thanks to Joshua Hunt and Daniel Zheng for catching some before this went to press.
\section{What is a modular machine?}
A modular machine is a Turing-equivalent form of computation.
They operate on two tapes, which are each bounded at one end.
Each cell of a tape may be filled with an integer from $0$ to $m-1$ inclusive; this is where the name ``modular'' comes from.
$$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, a_n} \ \machine{\ , b_1, b_2, \vdots, b_{k-1}, b_k}$$
The machine is considered to have a ``head'' which is looking at the bottom two cells (which are on the edge of the tape); in this instance, the head is looking at $a_n$ and $b_k$.
The machine also has a list of instructions, of the form $$(\alpha, \beta, (x, y), L)$$ or $$(\alpha, \beta, (x,y), R)$$
These instructions are constantly active, and the machine just does whichever it can at each stage.
To ensure that the machine only has one thing it can do at any time,
we insist that each pair $(\alpha, \beta)$ appears in the first two places of at most one instruction.
For example, if $(1,2,(3,3),L)$ is an instruction, then $(1,2,(5,2),R)$ cannot also be an instruction.
Since $\alpha, \beta$ will appear on the tape during execution, we require that $0 \leq \alpha, \beta < m$;
since $x, y$ will be written onto the tape during execution, we require that $0 \leq x, y < m$.
When the head sees entries $(\alpha = a_n, \beta = b_k)$ and it has the instruction $$(a_n, b_k, (x, y), L)$$
it executes the following procedure:
\begin{enumerate}
\item Shift the left-hand tape up one: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, a_n, \ } \ \machine{\ , \ , b_1, b_2, \vdots, b_{k-1}, b_k}$$
\item Write $x, y$ into the left-hand tape's bottom cells: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, x, y} \ \machine{\ , \ , b_1, b_2, \vdots, b_{k-1}, b_k}$$
\item Shift the right-hand tape down one: $$\machine{a_1, a_2, a_3, \vdots, a_{n-1}, x, y} \ \machine{\ ,\ , \ , b_1, b_2, \vdots, b_{k-1}}$$
\end{enumerate}
If, instead, the instruction was $$(a_n, b_k, (x, y), R)$$ then the same procedure would be carried out, but with right and left interchanged.
This would result in the final state $$\machine{\ , a_1, a_2, a_3, \vdots, a_{n-1}} \ \machine{b_1, b_2, \vdots, b_{k-1}, x, y}$$
\begin{remark}
The very convenient thing about modular machines is that they can be easily coded into numbers.
All the state can be tracked with just two numbers: $$(A, B) := \left(\sum_{i=1}^n a_i m^{n-i}, \sum_{i=1}^k b_i m^{k-i}\right)$$
and, for instance, the operation $(\alpha, \beta, (x,y), L)$ produces $$([A-(A \mod{m})]m + xm+y, [B-(B \mod{m})]/m)$$
We could even collapse $(x, y)$ into a single integer $xm+y$ which is less than $m^2$.
\end{remark}
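\begin{example}
For concreteness (the numbers are chosen purely for illustration), take $m = 10$, so that cells hold decimal digits, and suppose the tapes read $(a_1, a_2, a_3) = (1,2,3)$ and $(b_1, b_2) = (4,5)$, so the head sees $(3,5)$ and $(A, B) = (123, 45)$.
The instruction $(3, 5, (7,8), L)$ then produces $$([123 - 3] \cdot 10 + 7 \cdot 10 + 8, [45 - 5]/10) = (1278, 4),$$ which is exactly the coding of the new tapes $(1, 2, 7, 8)$ and $(4)$.
\end{example}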
\section{Turing equivalence}
We will take our Turing machines to be in ``quintuple form'': they consist of a list of instructions of the form $$(q, a, a', q', L/R)$$
where:
\begin{itemize}
\item $q$ is the current state
\item $a$ is the symbol under the head
\item $a'$ is the symbol to write to the tape
\item $q'$ is the state to move to
\item $L/R$ is the direction the head moves after this instruction executes.
\end{itemize}
Note that the machine always writes and always moves on every execution step.
We may implement a Turing machine as a modular machine as follows.
\
\begin{defn}[Instantaneous description]
An \emph{instantaneous description} of a Turing machine is a string of the form
$$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$
where $s_i$ are the symbols written on the tape, $q$ is the state the machine is currently in, and $a$ is the symbol under the head.
It captures completely the state of execution at a given instant in time.
\end{defn}
\
We will implement the instantaneous description $s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$ as both of two possible modular machine states:
$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, q} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, a}$$
or
$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, a} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, q}$$
It will soon become clear why we want to be able to use both these states.
While the modular machine can only ever occupy one of the states, the Turing machine it's emulating will be in the same state whichever
of the two the modular machine happens to be in.
Define the modulus $m$ to be the number of Turing-machine states, plus the number of Turing-machine symbols, plus $1$; this is just to
make sure we have plenty of symbols to work with, and can store all the information we need in any given cell of the tape.
How can we express a Turing-machine instruction?
Remember, they are one of the two forms
\begin{itemize}
\item $(q, a, a', q', L)$, which would convert $$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$ to $$s_1 s_2 \dots s_{k-1} q' s_k a' s_{k+2} \dots s_r$$
\item $(q, a, a', q', R)$, which would convert $$s_1 s_2 \dots s_k q a s_{k+2} \dots s_r$$ to $$s_1 s_2 \dots s_k a' q s_{k+2} \dots s_r$$
\end{itemize}
Therefore, taking our correspondence between Turing-machine instantaneous descriptions and internal states of a modular machine,
we need the instruction $(q, a, a', q', L)$ to take
$$\machine{s_1, s_2, \dots, s_{k-1}, s_k, q} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, a} \mapsto
\machine{\ ,s_1, s_2, \dots, s_{k-1}, q'} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', s_k}$$
or to $$\machine{\ ,s_1, s_2, \dots, s_{k-1}, s_k} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', q'}$$
Now it is clear why we needed the two possible representations of a single Turing-machine instantaneous description:
only the second of the above transitions is easy to implement in the modular machine,
but it has swapped $q$ from the left-hand register to the right-hand register.
This is perfectly kosher, but only because we were careful to state that the current-state and current-symbol
letters were interchangeable in the modular machine.
It can be performed using the modular machine instruction $(q, a, (a', q'), R)$.
Similarly, since we might have started this whole affair with the other representation of the TM instantaneous description,
we need to do the same with $$\machine{s_1, s_2, \dots, s_{k-1}, s_k, a} \ \machine{\ , s_r, s_{r-1}, \dots, s_{k+2}, q}$$
which, by modular machine instruction $(a, q, (a', q'), R)$ is taken to
$$\machine{\ , s_1, s_2, \dots, s_{k-1}, s_k} \ \machine{s_r, s_{r-1}, \dots, s_{k+2}, a', q'}$$
Notice, as an aside, that the Turing-machine head was moving left, and the modular-machine ``head'' symbol $q'$ has ended up on the right-hand tape
whichever of the two representations of the instantaneous description we used.
We can do the same for the Turing machine instructions which involve moving rightwards.
\section{Summary}
If we take our Turing machine states $(q, a, a', q', L/R)$ and, for each one, create a pair of modular machine instructions
$(a, q, (a', q'), L/R)$ and $(q, a, (a', q'), L/R)$, we end up with a modular machine that precisely emulates the Turing machine.
We could represent the pair $(a', q')$ as a single integer which is between $0$ and $m^2$: namely, by taking $a'm+q'$.
This makes the strings a bit shorter, but not as comprehensible.
\
\begin{defn}[Halting set]
We define the ``halting set'' of a modular machine to be the collection of states of the tape from which, when the machine runs,
we eventually end up with both tapes zeroed out.
For this to be a sensible definition, we want the machine not to have an instruction corresponding to head-states $(0,0)$, so that the machine really does stop eventually if it started in a halting state.
\end{defn}
\section{Embedding a modular machine into a group}
What we seek now is a way to embed a MM into a group.
A MM has two pieces of state: the left-hand tape and the right-hand tape.
These can be easily coded as integers.
\subsection{How could we apply a machine instruction?}
Let's imagine we have a way of representing the state $(a,b)$ as a group word in some group: $t(a, b)$.
What we now want is a way of applying a transformation to obtain the different word $t(a', b')$ which corresponds to
executing the MM instruction $(\alpha, \beta, (x, y), L/R)$.
A very good way of applying reversible transformations is to conjugate, so let's add lots of letters to the group:
one $r_i$ for each instruction $(\alpha, \beta, (x,y), L)$, and one $s_i$ for each $(\alpha, \beta, (x,y), R)$.
Conjugating by the letter $r_i$ will apply the $i$th $L$-instruction.
The required effect is $$r_i t(\alpha + m u, \beta + m v) r_i^{-1} = t(xm+y+m^2u, v)$$
\subsection{How do we store the states?}
The next idea is that since our states are merely integers, we might store them as exponents of a group generator:
$x^a y^b$ where $a$ is the left-hand tape and $b$ the right-hand.
In this scheme, it'll be easier if we allow $x$ and $y$ to commute, since all we care about is their exponents.
Now, it will be convenient to introduce a third letter, $t$, which will let us store separately the ``head'' and the ``body'' of the tapes.
Our storing scheme will be $$y^{m v} x^{mu} (x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-m v}$$
which we will denote $t(\alpha+m u, \beta+m v)$,
corresponding to the tape which has $\alpha, \beta$ as the two heads, and then $u, v$ as the data on the rest of the tape.
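Since $x$ and $y$ commute, this word is just $$t(a, b) = x^{a} y^{b} t y^{-b} x^{-a}$$ with $a = \alpha + mu$ and $b = \beta + mv$; writing it with the multiples of $m$ split off merely displays the heads separately from the rest of the tapes.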
So we define $$K := \langle x, y, t \mid xy=yx \rangle$$
\begin{remark}
Notice that the only words in $K := \langle x, y, t \mid xy=yx \rangle$ which can be expressed in this form are the words $y^a x^b t x^{-b} y^{-a}$.
Therefore it makes sense to define a subgroup $$T := \langle t(r, s) : r, s \in \mathbb{Z} \rangle$$
which consists of ``all possible machine states''.
\end{remark}
\subsection{How do the machine instructions work?}
How does $r_i$ act, then?
$(\alpha, \beta, (p,q), L)$ needs to take $$t(\alpha + m u, \beta + m v) = y^{mv}x^{mu} (x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-m v}$$
to $$t(pm+q+m^2u, v'm + \nu) = y^{v'm} x^{(um+p)m}(x^{q} y{^\nu} t y^{-\nu} x^{-q}) x^{-(um+p) m} y^{-v' m}$$
where we are writing $v = v'm + \nu$.
More concretely,
$$y^{v'm^2+\nu m} x^{m u}(x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}) x^{-m u} y^{-v'm^2-\nu m}$$
maps to $$y^{v' m} x^{um^2+pm}(x^{q} y{^\nu} t y^{-\nu} x^{-q}) x^{-um^2-p m} y^{-v' m}$$
Notice that this is exactly performed by the map $x^m \mapsto x^{m^2}, y^m \mapsto y, t(\alpha, \beta) \mapsto t(q+p m, 0)$.
How can we make that well-defined (since we clearly can't send $y^m$ to $y$ without messing up the map of $t(\alpha, \beta)$)?
The trick is to forget that we're working with $x,y,t$, and start working with $t(\alpha, \beta)$, $x^m$, $y^n$ as atomic blocks.
Define a new subgroup $$K_{\alpha, \beta}^{M, N} := \langle \overline{t(\alpha, \beta)}, \overline{x}^M, \overline{y}^N \rangle \leq K$$
Then the map $$\phi_i: t(\alpha,\beta) \mapsto t(q+pm, 0), x^m \mapsto x^{m^2}, y^m \mapsto y$$
would do what we want.
What is the domain and range of that map?
If we view it as being $K_{\alpha, \beta}^{m,m} \to K_{q+pm, 0}^{m^2, 1}$, then it's actually an isomorphism:
it's just matching up the generators of the respective groups.
OK, we have a map which we want to apply whenever we see $r_i w r_i^{-1}$.
The way we can do that is to create an HNN extension:
take $K *_{\phi_i}$ with stable letter $r_i$.
We can do the whole lot again with $\psi_i$ corresponding to $s_i w s_i^{-1}$, which performs an $R$ instruction (where $\phi$ performed an $L$ instruction).
\subsection{How do we turn this all into a group?}
We've got all these groups floating around; to specialise to a group in which we can only be dealing with machine states,
consider $$T'_{\mathcal{M}} := \langle \overline{t(\alpha, \beta)}: (\alpha, \beta) \in H_0(\mathcal{M}); \overline{r_i}: i \in I; \overline{s_j}: j \in J \rangle \leq K *_{\phi_i; \psi_j}$$
where $H_0$ refers to the halting set of the modular machine $\mathcal{M}$.
If we take an element $\overline{t(\alpha, \beta)}$ (a word containing no $r_i$ or $s_j$), then conjugating by $r_i$'s and $s_j$'s reduces it to the word $t$ in this group if and only if running the modular machine from $(\alpha, \beta)$ lands us in the halting set:
the HNN extension has quotiented out by the relation ``conjugating by $r_i$ applies effect $\phi_i$, which moves the modular machine according to the $i$th $L$-instruction''.
The word $t$ is precisely symbolising the empty tape.
Of course, we could have lots of unused instructions or parts of instructions floating around: the group word $r_i t$ still has empty tape, so is still a halting state.
\subsection{Subgroup membership to word problem}
We have constructed a subgroup $\langle t \rangle' := \langle t, r_i, s_j \rangle$ such that $(\alpha, \beta)$ is a halting state of the modular machine if and only if $t(\alpha, \beta)$ lies in that subgroup.
Using an HNN extension, we can make a group where we just need to check equality of words:
create the group $$G_{\mathcal{M}} := \langle K*_{\phi_i, \psi_j}; k \mid khk^{-1} = h \ \forall h \in \langle t \rangle' \rangle$$
Then conjugating an element by $k$ does nothing precisely when that element was in $\langle t \rangle'$: that is, precisely when the modular machine halts from that starting state.
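In equation form, restating this via Britton's lemma for the final HNN extension: $$k \, t(\alpha, \beta) \, k^{-1} \, t(\alpha, \beta)^{-1} = e \text{ in } G_{\mathcal{M}} \quad \iff \quad (\alpha, \beta) \in H_0(\mathcal{M}),$$ so the halting problem of $\mathcal{M}$ reduces to the word problem of $G_{\mathcal{M}}$.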
\subsection{What have we missed out?}
There are many checks to be done along the way.
Our HNN extensions need to be well-defined.
The $\phi_i, \psi_j$ need to be iso.
$G_{\mathcal{M}}$ is in fact finitely presented, but we need to show that.
However, once those checks are done, we have produced an explicit recipe for constructing a finitely presented group which can simulate a given arbitrary modular machine.
\section{Higman's construction}
It turns out that, somewhat shockingly, there is a beautiful way to use the above to embed a recursively-presented group $C = \langle X \mid R \rangle$ into a finitely-presented group.
Think of the earlier construction as telling us how to execute a Turing machine inside a group.
Then, morally, we do the following.
\begin{enumerate}
\item Take the machine that halts precisely on members of $R$;
\item Embed it into a group $G$;
\item Glue $G$ onto $C = \langle X \mid R \rangle$ in such a way that we can use the machine-part, $G$, to decide which reductions to make (rather than querying the infinite set $R$).
\end{enumerate}
Of course, the construction is rather complicated, but the upshot is that the finite presentation which defines $G$ can be used to capture all the information that lies in the infinite relator-set $R$.
\subsection{General approach}
We will make a (large!) group whose elements include what I will call ``collections'', which have the following structure:
\begin{enumerate}
\item machine state (for those following the course notes, this is $K_{\mathcal{M}}$ in Step 13 of Construction 11.2)
\item word under consideration (this is $\langle b_1, \dots, b_n \mid \cdot \rangle$)
\item group element that word corresponds to (this is $\overline{C}$)
\item marker (this is $d$)
\end{enumerate}
The group $\langle X \mid R \rangle$ embeds in the third component (``group element that word corresponds to'').
The ``machine state'' section will be implemented as $K_{\phi_i; \psi_j}$ from earlier (which, recall, was finitely presented).
The ``word under consideration'' will be how we extract information from the ``machine state''.
The ``marker'' serves no purpose for interpretation, but it turns out to be an important fiddly detail in the construction.
\subsection{Construction}
We are going to use a whole lot of HNN extensions to manipulate collections.
\subsubsection{Which modular machine to use?}
$$\machine{c_{i_n}, \dots, c_{i_2}, c_{i_1}} \ \machine{0,\dots,0,0}$$
Take a modular machine such that, if we start the tape with the above state, we halt with an empty tape if and only if $c_{i_1} \dots c_{i_n}$ is in $R$.
(Recall that our recursive presentation is $C = \langle X \mid R \rangle$, and this is the group we want to embed.)
It is possible to do this (for those following the course notes, this is steps 1 through 5): symmetrise the generators if necessary
so that each generator $c \in X$ has an inverse $c^{-1}$ and a relator $c c^{-1} = e$.
Then make a Turing machine that enumerates the trivial words in this new presentation (which is, in spirit, the same as the old presentation;
it certainly defines the same group).
Convert that Turing machine into a modular machine, where each $c$ or $c^{-1}$ of the group corresponds to a cell-state $a_c$ or $a_{c^{-1}}$.
Once we've got that modular machine, we can embed it into a group using what's happened already, although we're actually only going to need $$K_{\mathcal{M}} := K *_{\phi_i, \psi_j}$$
which, recall, is the group which holds an MM state as the word $x^{\alpha} y^{\beta} t y^{-\beta} x^{-\alpha}$ which can also have the MM instructions directly applied to it, by conjugating with the stable letters $r_i, s_j$ respectively to apply the $i$th Left-instruction or $j$th Right-instruction.
\subsubsection{Creating a collection}
We're only interested in certain modular machine states: namely, those corresponding to $t(\alpha, 0)$ for certain $\alpha$.
(Recall that $t(\alpha, \beta)$ is our notation for how the group $K_{\mathcal{M}}$ stores the current state $(\alpha, \beta)$ of the MM.)
That is, we're only interested in how the modular machine behaves when we start it off with input $\alpha = \sum_{i=0}^r c_{k_i} m^i$, say:
equivalently, when we ask it the question ``Is $c_{k_0} \dots c_{k_r}$ in the relating set of $C$?''.
So, while other MM-states appear encoded in our $K_{\mathcal{M}}$---for example, the state corresponding to ``starting with $5$ on the left-hand tape and $2$ on the right-hand, perform the fourth left-instruction in the list of possible instructions''---our remaining manipulations to the group will all refer to $t(\alpha, 0)$ directly.
That is, our remaining manipulations will ignore non-interesting MM states.
Given an MM state $t(\alpha, 0)$, we unpack it into a collection by conjugating with a new stable letter, $p$, and taking an HNN extension.
(In the course notes, this is steps 17 and 18.)
The extension will take $t(\alpha, 0)$ and unpack it into the word $$[t(\alpha, 0), w_{\alpha}(b)]$$
where $w_{\alpha}(b)$ simply means ``take the word which is currently loaded into the MM's memory, as its left-hand tape where the right-hand tape is $0$, and write it down with $b$'s in the abstract''.
\begin{example}
If $\alpha = c_3 + c_7 m + c_1 m^2$, we would have $w_{\alpha}(b) = b_3 b_7 b_1$, a word in the abstract group $\langle b_1, \dots, b_n \mid \cdot \rangle$.
I'm playing fast and loose with the ordering here; I may mean $b_1 b_7 b_3$, but I can't be bothered to check which is right.
\end{example}
To recap, the unpacked word $[t(\alpha, 0), w_{\alpha}(b)]$ has two components so far, then: the modular machine state $t(\alpha, 0)$, and an abstract statement $w_{\alpha}(b)$ of the word we're asking about.
I'll add a third component: the ``marker'' letter $d$, so our unpacked word actually has three components and looks like $$[t(\alpha, 0), w_{\alpha}(b), d]$$
We obtained the unpacked word by conjugating $t(\alpha, 0)$ by a new stable letter $p$, and taking an HNN extension which adds the relators $$p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$$
for each $\alpha$.
For reasons which will be important later, I added the $d$ at this point: an end-marker, sitting after the $w_{\alpha}(b)$.
I can't motivate its presence in this section, but eventually we will start appending things to $w_{\alpha}(b)$, and it will become vital to know where the word ends.
We use the marker $d$ for that.
\begin{remark}
This is an awful lot of relators.
Infinitely many, in fact!
Don't panic; we will eventually show that we can actually replace most of them with a finite number of relators.
We haven't actually tied the behaviour of the machine to the manipulation of collections yet; only expanded a machine into a collection.
\end{remark}
Later on, we will add one more component (our last one) to the collection.
It will be a bona fide word on $C$'s generators, and it will be the word which $\alpha$ represents.
We can do this by insisting that $b_i c_j = c_j b_i$ in general, and by doing another HNN extension,
which will have the effect of taking $b_j$ to $b_j c_j$.
That is, $b_3 b_2 \mapsto b_3 c_3 b_2 c_2 = b_3 b_2 c_3 c_2$;
then we view the $b$ chunk and the $c$ chunk as being distinct, forming the second and third components of our four-component collection respectively.
Formally, in a little while we will add a stable letter $V$ and add the relators that $V b_j V^{-1} = b_j c_j$
(and that $V a V^{-1} = a$ for most of the other possible $a$ -- for example, for $a = t$ or $a = r_i$).
In the course notes, $\psi_+$ adds this component to the collection.
So our final collection will be $$[t(\alpha, 0), w_{\alpha}(b), w_{\alpha}(c), d]$$
However, we don't bother adding this yet.
In essence, the original three components of the collection are where the magic happens;
but because $w_{\alpha}(b)$ is a ``purely syntactic'' readout of the machine $t(\alpha, 0)$,
we won't actually have an embedded copy of $C$ in the group unless we add one.
So as the last step of our entire construction, we will put in this fourth component that ``evaluates the word $w_{\alpha}(b)$ as an element of $C$''.
\subsubsection{Tying the $w_{\alpha}(b)$-component to the machine component}
The real magic happens at this point.
Everything we've done so far has added only finitely many relators, \emph{except} $$p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$$
Recall that this was unpacking a machine (with a word loaded onto its tape) into a pair of (the machine, the word).
How can we specify this with only finite amounts of information?
Well, the word is taken over a finite alphabet $b_1, \dots, b_n$,
so we'd be done if, starting from an empty-tape machine, we had a way of loading a word onto the tape of the ``machine'' component of the collection one letter at a time, and at the same time appending the word to the ``word'' component of the collection one letter at a time.
This will involve reaching down into the implementation of the machine, which (recall) is currently being held as one component of a collection;
the collection itself is just a word over a particular alphabet which we haven't yet specified (we can extract it at the end).
Define a bunch of HNN extensions, one for each $b_i$, taking $t(\alpha, 0)$ to $t(\alpha m + c_i, 0)$
and $w_{\alpha}(b)$ to $w_{\alpha}(b) \cdot b_i$.
Here is where we need $d$: we need to know where the end of $w_{\alpha}(b)$ is, so that we can append something to it.
Formally, define stable letters $U_i$ such that:
\begin{itemize}
\item $U_i b_r U_i^{-1} = b_r$ (dealing with the $w_{\alpha}(b)$-chunk)
\item $U_i d U_i^{-1} = b_i d$ (appending $b_i$ to the abstract word)
\item $U_i x U_i^{-1} = x^m$ (shifting the left-hand tape of the machine up by one, to make room for the new symbol)
\item $U_i t U_i^{-1} = x^i t x^{-i}$ (filling that new empty slot with the required state)
\end{itemize}
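As a sanity check that these relators do what was promised: conjugating a machine component $t(\alpha, 0) = x^{\alpha} t x^{-\alpha}$ by $U_i$ gives $$U_i x^{\alpha} t x^{-\alpha} U_i^{-1} = x^{\alpha m} (x^{i} t x^{-i}) x^{-\alpha m} = t(\alpha m + i, 0),$$ which (identifying the letter with its cell-state number, as above) is precisely ``load one more letter onto the tape''.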
The miracle of what we have just done is that we can implement all the infinitely-many $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$ in terms of these finitely-many new relators!
To obtain the effect of $p t(\alpha, 0) p^{-1}$ where $\alpha$ represents the word $c_3 c_6$, just conjugate the collection $ptp^{-1} = [t(0,0), \text{empty word}, d]$ in turn by $U_3$ and $U_6$.
\subsubsection{Adding an embedded copy of $C$}
So far, we have been dealing syntactically with symbols $b_i$ which represent the generators of $C$.
But we haven't actually got an embedded copy of $C$ yet, or at least we haven't obviously got one.
What we need is a way of evaluating $w_{\alpha}(b)$ to obtain $w_{\alpha}(c) \in C$.
That's easy, though: do one HNN extension that will have the effect of taking $b_i$ to $b_i c_i$, and insist that the $b_i, c_j$ all commute.
Formally, add a single stable letter $V$ such that:
\begin{itemize}
\item $V b_j V^{-1} = b_j c_j$
\item $V d V^{-1} = d$ (so the marker is unchanged)
\item $V t V^{-1} = t$, $V r_i V^{-1} = r_i$, $V s_j V^{-1} = s_j$ (so the $K_{\mathcal{M}}$ component is unchanged)
\item $V p V^{-1} = p$ (so that $V$'s unpacking won't do anything until we have explicitly performed $p$'s unpacking)
\end{itemize}
and add (finitely many) relators $b_i c_j = c_j b_i$.
(This HNN extension is $\psi^+$ in the course notes.)
\begin{remark}
One might wonder why we don't need $V$ to commute with the $U_j$ (which, recall, load letters onto the tape of the machine).
The answer is that we only want to do the unpacking right at the end of the procedure:
at the ``load letters onto tape'' stage, we don't do any unpacking.
Therefore, in our final uber-group which encodes all of Higman's construction (whose elements are words which are collections),
there may be strange fragments of instructions floating around, like in $V t(\alpha, 0) w_{\alpha}(b) d$ where $V$ is ``half of an instruction''.
We just let these hang around, and ignore them: Britton's lemma on HNN extensions tells us that none of these are equal to words in which
the instructions have all been fully executed (that is, in which no $V$'s appear).
While all this junk is still sitting around in the group, it doesn't interfere with the interesting part of the group.
The situation is analogous to the embedding of a modular machine into a group, where we ignore any chunks of half-instruction like $r_i x^a y^b t y^{-b} x^{-a}$ (where $r_i$ is an un-completed instruction) because they don't interfere with what we're really doing.
\end{remark}
Consider that a word $w_{\alpha}$ evaluates in $C$ to something trivial if and only if the modular machine halts in the zero state from $t(\alpha, 0)$.
(That's how we defined the modular machine.)
The modular machine halts in the zero state from $t(\alpha, 0)$ if and only if conjugating by $r_i, s_j$ in some order causes $t(\alpha,0)$ to be taken to the single letter $t$.
That is, if and only if the machine, starting from state $t(\alpha, 0)$, eventually reaches the state that is the single letter $t$.
But because $V$ commutes with the $r_i$ and $s_j$ (that is, the ``instructions'' in $K_{\mathcal{M}}$ telling it to execute the appropriate machine instructions), we have $$r^i s^j V t(\alpha, 0) V^{-1} s^{-j} r^{-i} = V r^i s^j t(\alpha, 0) s^{-j} r^{-i} V^{-1} = V t(0, 0) V^{-1} = t$$ if the machine halts on the zero state through applying instructions $r_i, s_j$.
That is, $$V t(\alpha, 0) V^{-1} = r^{-i} s^{-j} t r^i s^j$$ which, since the machine is deterministic (and we're undoing the instructions that took us from $t(\alpha, 0)$ to $t(0,0)$), is $t(\alpha, 0)$.
Otherwise, if the machine doesn't halt through $r_i, s_j$, there are some interfering $x$'s left at the end (we end up with $V t(a, b) V^{-1} = V x^i y^j t y^{-j} x^{-i} V^{-1}$) so $V$ can't cancel.
The upshot is that conjugating the three-element collection $[K_{\mathcal{M}}, w_{\alpha}(b), d]$ by $V$ unpacks into a collection $$[K_{\mathcal{M}}, w_{\alpha}(b), w_{\alpha}(c), d]$$ without any $V$'s if and only if $w_{\alpha}(c)$ is trivial as a member of $C$.
\begin{example}
Suppose $$C = \mathbb{Z}_2 = \langle c_1 \mid c_1^2 = e \rangle$$
Expand the presentation to $$\langle c_1, c_2 \mid c_1 c_2 = e, c_1^2 = e, c_1^2 c_2^2 = e, \dots \rangle$$
where the ellipsis indicates every trivial word.
Let us examine $c_1^2$, which is trivial; it is coded into the machine-component of the group as $t(1+m, 0)$.
There is a sequence of $r_i, s_j$ such that conjugating $t(1+m, 0)$ by $r_i s_j$ results in $t(0,0)=t$.
To make this more readable, let's say $r_i s_j$ are precisely the two instructions we need to use to do this.
Now, $V t(1+m, 0) V^{-1}$ is equal to $$V r_i s_j t(0,0) s_j^{-1} r_i^{-1} V^{-1}$$
But $V$ commutes with all those terms, so it is just $t(1+m, 0)$.
Hence $V t(1+m, 0) b_1^2 d V^{-1}$ is $$t(1+m, 0) b_1^2 c_1^2 d$$
because of the effect of $V$ on the $b_i$ terms.
On the other hand, $V t(1+m, 0) b_1^2 d V^{-1}$ is $$V p t(1+m, 0) p^{-1} V^{-1}$$
and $V$ commutes with $p$, so it comes to $$p t(1+m, 0) p^{-1} = t(1+m, 0) b_1^2 d$$
Therefore $$t(1+m, 0) b_1^2 c_1^2 d = t(1+m, 0) b_1^2 d$$
so $c_1^2$ is trivial.
\end{example}
\subsection{List of relators}
As a recap, here is a list of all the relators (HNN or otherwise) of the uber-group we have made.
\begin{itemize}
\item $r_i t(\alpha, \beta) r_i^{-1} = t(\text{new machine state})$ for each $L$-machine-instruction
\item $s_j t(\alpha, \beta) s_j^{-1} = t(\text{new machine state})$ for each $R$-machine instruction
\item $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$, the instructions for unpacking a machine into a machine with its word (all these can be made by other instructions)
\item $p t p^{-1} = t d$ (a single special case of the above)
\item $U_i b_r U_i^{-1} = b_r$ (when loading $i$ onto the tape, deals with the $w_{\alpha}(b)$-chunk)
\item $U_i d U_i^{-1} = b_i d$ (when loading $i$ onto the tape, appends $b_i$ to the abstract word)
\item $U_i x U_i^{-1} = x^m$ (when loading $i$ onto the tape, shift the left-hand tape of the machine up by one, to make room for the new symbol)
\item $U_i t U_i^{-1} = x^i t x^{-i}$ (filling that new empty slot with the required state)
\item $V b_j V^{-1} = b_j c_j$ (unpacking a syntactic word into its represented element)
\item $V d V^{-1} = d$
\item $V t V^{-1} = t$, $V r_i V^{-1} = r_i$, $V s_j V^{-1} = s_j$ (so the $K_{\mathcal{M}}$ component is unchanged)
\item $V p V^{-1} = p$
\item $b_i c_j = c_j b_i$
\end{itemize}
There are finitely many of all of these, except the $p t(\alpha, 0) p^{-1} = t(\alpha, 0) w_{\alpha}(b) d$ which can be made by other instructions.
\end{document}

FriedbergMuchnik.tex Normal file

@@ -0,0 +1,171 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\newmdtheoremenv{defn}{Definition}
\newmdtheoremenv{thm}{Theorem}
\newmdtheoremenv{motiv}{Motivation}
\title{Friedberg-Muchnik theorem}
\author{Patrick Stevens, with tip of the hat to Dr Thomas Forster}
\date{5th February 2016}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/FriedbergMuchnik/FriedbergMuchnik.pdf} \end{center}
\normalsize
\section{Introduction}
We consider Turing machines which query an oracle $O$: that is, they come equipped with an extra instruction ``call the oracle with this input'', where a call to the oracle is simply a test for membership in $O$.
We may supply different oracles to the same Turing machine, and potentially get different results: for example, the Turing machine which has as its only instruction ``output the result of calling the oracle on my input'' precisely mimics the oracle.
Recall that a set $A$ is \emph{semi-decidable} if there is a Turing machine $T$ such that for all $n \in \mathbb{N}$, $T(n)$ halts iff $n \in A$.
\
\begin{thm}[Friedberg-Muchnik theorem] There are semidecidable sets $A$ and $B$ such that for all Turing machines $T$ which may query an oracle, there is some input on which the following fail to be equivalent:
\begin{enumerate}
\item $T$-querying-$B$ doesn't halt with output $0$
\item $T$-querying-$B$ has input in $A$
\end{enumerate}
and some input on which the following fail to be equivalent:
\begin{enumerate}
\item $T$-querying-$A$ doesn't halt with output $0$
\item $T$-querying-$A$ has input in $B$
\end{enumerate}
\end{thm}
\
That is, we can find two semidecidable sets $A$ and $B$ such that neither allows a Turing machine to decide the other, where by ``$T$ decides $A$'' we mean ``$T$ halts and outputs $0$ iff its input is not in $A$, and it halts and outputs $1$ iff its input is in $A$''.
(Equivalently, $T$ is a machine which computes the characteristic function of $A$.)
\section{Proof}
We can enumerate all the Turing machines which call an oracle; write $[ n ]^X$ for the $n$th Turing machine in the enumeration, calling oracle $X$.
What would it mean for the Friedberg-Muchnik theorem to hold?
We would be able to find $e_n$ (resp. $f_n$) that witness in turn that the $n$th Turing machine doesn't manage to decide $A$ in the presence of $B$ (resp. $B$ in the presence of $A$).
That is, it would be enough to show that:
\
\begin{thm} There are semidecidable sets $A, B$ such that for all $n \in \mathbb{N}$:
\begin{itemize}
\item there is $e_n \in \mathbb{N}$ such that \begin{itemize} \item $e_n \in A$ but $[n]^B(e_n)$ halts with output $0$, or \item $e_n \not \in A$ but $[n]^B(e_n)$ fails to halt, or halts at something other than $0$ \end{itemize}
\item there is $f_n \in \mathbb{N}$ such that \begin{itemize} \item $f_n \in B$ but $[n]^A(f_n)$ halts with output $0$, or \item $f_n \not \in B$ but $[n]^A(f_n)$ fails to halt, or halts at something other than $0$ \end{itemize}
\end{itemize}
\end{thm}
\
The way we are going to do this is as follows.
We'll construct our $A$ and $B$ iteratively, starting from the empty set and only ever adding things in to our current attempts.
For each $n \in \mathbb{N}$, we can make an infinite list of ``possible'' witnesses: numbers which might eventually be our choice for $e_n$.
We don't care what these guesses are at the moment, but we just insist that they be disjoint and sorted into increasing order.
Write $$G_i = \{g^{(i)}_1, g^{(i)}_2, \dots \}$$ for the set of possibilities for $e_i$, and $$H_i = \{ h^{(i)}_1, h^{(i)}_2, \dots \}$$ for the set of possibilities for $f_i$.
(I emphasise again that we don't assume any properties of these numbers, other than that $g^{(i)}_m$ and $h^{(i)}_m$ are increasing with $m$, and that no $g^{(i)}_m, g^{(j)}_n, h^{(k)}_p, h^{(l)}_q$ are equal.)
Now, at time-step $0$, we have no information about what's going to be in $A$ and $B$, so let $$A_0 = B_0 = \emptyset$$
Every $G_i$ and $H_i$ is looking for a witness among its members.
We assign a priority order to them: $$G_1 > H_1 > G_2 > H_2 > \dots$$
The idea is that the high-priority sets quickly decide on their witness, and the lower-priority sets get to choose their witness subject to not being allowed to mess up any higher-priority set's decision.
At time-step $t$, we've already built $A_{t-1}, B_{t-1}$ as our best guesses at $A$ and $B$.
We seek an $e_i$ for any $G_i$ which hasn't got one (for $1 \leq i \leq t$), and then we can work on finding $f_i$ for the $H_i$ next.
Run the machines $[i]^{B_{t-1}}$ for $i=1, 2, \dots, t$, for $t$ steps each, each on input $g^{(i)}_1$.
This will approximate $[i]^B$, because $B_{t-1} \subseteq B = \bigcup_{s=1}^{\infty} B_s$, but it is by no means exactly what we need yet.
\begin{itemize}
\item If our machine $[i]^{B_{t-1}}$ ever attempts to query its oracle on a value greater than $\text{max}(B_{t-1})$, we declare that the machine crashes for this round, and sits out.
Indeed, $B_{t-1}$ is incomplete at this point as a reflection of $B$ - we only have information about it up to $\text{max}(B_{t-1})$ - so it would be useless to try and infer information about $B$ from parts of $B_{t-1}$ which are even bigger than that maximum.
\item If $[i]^{B_{t-1}}$ halts at something other than $0$, then it's no use to us: its input can't possibly be a witness we can add to $A$, because such witnesses $e_i$ must satisfy ``$[i]^B(e_i)$ halts at $0$''.
So $G_i$ will sit out of this round.
\item If $[i]^{B_{t-1}}$ fails to halt in the allotted $t$ steps, it is likewise not something we can add to $A$, because we can't (yet) even prove that the machine \emph{halts} on this input, let alone that it halts on $0$.
So $G_i$ will again sit out of this round.
\item But if $[i]^{B_{t-1}}$ halts and outputs $0$ in the allotted $t$ steps, we're in business: for some collection of $i$, we have found some things ($g^{(i)}_1$) that might serve as witnesses.
Throw each of these into $A_{t-1}$ to make $A_t$.
\end{itemize}
OK. Now $G_i$ is happy, but remember we might have had a side-effect here, because if (for the sake of argument) $H_1$ had already decided on its witness during time-step $t-1$, it made that decision with reference to $A_{t-1}$ and not with reference to $A_t$.
The fact that $g^{(i)}_1$ is now in our $A$-guess may alter the computation that $[1]^{A_{t-1}}$ performed to decide on its witness.
(This is because $[1]^{A_{t-1}}(f_1)$ is not in general equal to $[1]^{A_t}(f_1)$.)
How can we ensure that in fact $H_1$'s witness isn't broken?
Well, $[1]$ is a finite machine which we have run for a finite time, so it can only have asked the oracle for values up to some finite number $\beta_1$ before it halted.
So if we can make sure we only ever added $g^{(i)}_n$ to our $A$-guess if it was above $\beta_1$, then we haven't actually changed $A$ from the point of view of $[1]$:
the computation $[1]^{A_{t-1}}$ performs is exactly the same as the computation $[1]^{A_t}$ performs, because the oracles are the same on all points $[1]$ might query.
Therefore, whenever a $G_i$ or $H_i$ decides on its witness, we need to delete all numbers below $\beta_i$ (the largest value its computation queried) from all lower-priority $G_j$ and $H_j$.
(This is easy to do because of our stipulation that the $G_j$ and $H_j$ be listed with elements in ascending order.)
That way, no lower-priority set will ever even consider an element that breaks a higher-priority set's decision.
Once we've found the $G$-witnesses at time-step $t$, we can find the $H$-witnesses in exactly the same way; and finally, we move on to time-step $t+1$.
\subsection{Problem} This procedure works pretty well, but there's a problem with it. You might like to meditate on this for a few minutes, because it's revealed in the next paragraph.
The problem is simply that while no lower-priority entry can break a higher-priority one, the reverse might happen!
It might be that $H_1, H_2, H_3$ take ten steps of execution before halting, while $G_4$ halts after just one step and so decides on its witness immediately (that is, at time $t=4$, as opposed to $H_1$'s $t=10$).
Subsequently, $H_1$ will decide on its witness, and the act of throwing its witness into $B$ might break $G_4$'s choice (whose computation queried the oracle $B$).
Remember that $G_4$ only eliminated breaking-values from \emph{lower}-priority sets, not the higher-priority $H_1$.
(Allowing it to eliminate breaking-values from higher-priority sets could cause the entire protocol to enter an infinite loop, with $H_1$ and $G_4$ each invalidating the results of the other on successive time-steps.)
However, this isn't actually too much of a problem.
Since $G_1$ can only ever decide on its witness once (being the highest-priority), that means $H_1$ will only ever need to decide twice; $G_2$ only ever needs to decide at most four times (it could be first to pick its witness, then $H_1$ overrules it, then it picks again, then $G_1$ overrules it and $H_1$, then it picks again, then $H_1$ overrules it, and finally it picks again).
In general, the $i$th element of the priority order can only be overruled $2^{i-1} - 1$ times.
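To see this bound, let $d_i$ be the number of decisions the $i$th set in the priority order ever makes; every decision after its first is triggered by some higher-priority decision overruling it, so $$d_i \leq 1 + \sum_{j < i} d_j,$$ and by induction $d_i \leq 2^{i-1}$: that is, the $i$th set is overruled at most $2^{i-1} - 1$ times.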
So if $G_i$ is overruled, we can just keep churning through the procedure, chucking more and more elements into $B$; $G_i$ can only be overruled finitely many times, and it has infinitely many elements in its list to play with, so eventually it will work its way into a position when it can never be overruled.
\subsection{Final problem}
OK, this works fine if every $G_i$ eventually finds a witness.
But there's another case: $G_1$ may never find a witness.
For example, $[1]^{B_t}(g^{(1)}_1)$ may never halt, or it may halt but output the value $1$ instead of $0$ (so the protocol sees it as ``uninteresting'' and just repeatedly tells $G_1$ to sit out of the round).
But remember that we're trying to construct a witness that a certain equivalence fails, and so far we've been constructing witnesses that it fails in one of its directions.
(Remember: the equivalence we want to fail is that $T^B$ halts with $0$ iff it has input not in $A$.)
We could still win by finding $e_1$ such that $[1]^B(e_1)$ fails to halt at $0$ despite not being in $A$.
And look! We've precisely got one of those elements, and it's $g^{(1)}_1$.
\section{Summary}
The output of this procedure is a pair of sets $$\displaystyle A = \cup_{t=1}^{\infty} A_t, B = \cup_{t=1}^{\infty} B_t$$
which are semi-decidable (because we built them as a union of finite sets $A_t, B_t$).
For each Turing machine $[n]^X$, we have a witness $e_n$ such that:
\begin{itemize}
\item either $[n]^B(e_n)$ halts with output $0$ and $e_n \in A$, or
\item $[n]^B(e_n)$ fails to halt, or halts with output not equal to $0$, and $e_n \not \in A$.
\end{itemize}
(This can be proved by induction: if $e_n$ is a witness at time $t$ and it is never overruled, then it remains a witness when we pass to $B$, because by construction its computation doesn't change on passing to $B$.)
That is, no Turing machine $[i]^B$ decides $A$.
Likewise, no Turing machine $[i]^A$ decides $B$.
\end{document}

MonadicityTheorems.tex Normal file

@@ -0,0 +1,624 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{tikz-cd}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\newmdtheoremenv{defn}{Definition}
\newmdtheoremenv{thm}{Theorem}
\newmdtheoremenv{motiv}{Motivation}
\title{Motivation for the Monadicity Theorems}
\author{Patrick Stevens}
\date{12th December 2015}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/MonadicityTheorems/MonadicityTheorems.pdf} \end{center}
\normalsize
\emph{You should draw diagrams yourself throughout this document. It will be unreadable as mere symbols.}
\section{Definitions}
\begin{defn}
Given a monad $\mathbb{T}$ on $\mathcal{C}$, the Eilenberg-Moore category is $\mathcal{C}^{\mathbb{T}}$, the category of all algebras $(A, \alpha)$ with $A \in \mathcal{C}$ and $\alpha: TA \to A$, whose arrows $(A, \alpha) \to (B, \beta)$ are ``homomorphisms'': arrows $f: A \to B$ in $\mathcal{C}$ such that $f \alpha$ and $\beta (Tf)$ are equal as maps $TA \to B$.
\end{defn}
\
\begin{defn} Given $\mathbb{T}$ a monad induced by $(F: \mathcal{C} \to \mathcal{D}) \dashv (G: \mathcal{D} \to \mathcal{C})$, where the unit and counit of the adjunction are $\eta, \epsilon$ respectively, the comparison functor $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ is given by $K(B) = (GB, G \epsilon_B)$ and $K(f: A \to B) = Gf$.
\end{defn}
\
\begin{defn}An adjunction $(F: \mathcal{C} \to \mathcal{D}) \dashv (G: \mathcal{D} \to \mathcal{C})$ is monadic if the comparison functor $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ is part of an equivalence, where $\mathbb{T}$ is the monad induced by the adjunction.
\end{defn}
\
That is, a monadic adjunction is one of the form ``$\mathcal{C}$ is being taken to a category of algebras'', like the free-group/forgetful functor adjunction.
\
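For instance, if $F \dashv G$ is the free-group/forgetful adjunction between $\mathbf{Sets}$ and groups, then $K$ sends a group $B$ to the algebra $(GB, G\epsilon_B)$, where $\epsilon_B$ multiplies out formal words; saying this adjunction is monadic is saying that a group is precisely a set equipped with a coherent way of multiplying out formal words.
\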
\begin{defn}
A functor $G: \mathcal{D} \to \mathcal{C}$ is monadic if it has a left adjoint and the adjunction is monadic.
\end{defn}
\
\section{Algebras are coequalisers} \label{coeq}
This is a lemma we're going to need a lot, so without motivating why, I'll just present it.
I'll use the perhaps nonstandard terminology that $\alpha$ \emph{coequalises} two maps if the composites are equal, and it is a \emph{coequaliser} if it is universal among all coequalising maps.
\
\begin{thm}
Let $(A, \alpha)$ be an algebra for monad $T: \mathcal{C} \to \mathcal{C}$.
Then $\alpha: TA \to A$ is a coequaliser.
\end{thm}
\
The way to think about this is from the free-group/forgetful-functor monad on $\mathbf{Sets}$.
In this context, it means: ``every group is a quotient of a free group''.
In the free-group context, $(A, \alpha)$ is a set $A$ together with a map $\alpha: TA \to A$ which ``tells us how to multiply''.
That is, $\alpha$ precisely specifies the multiplication table for the group.
Consider $TTA$, which is the set of words of words of elements of $A$.
On the one hand, to get from a word $((a_1, a_2, \dots), (b_1, b_2, \dots), \dots)$ to an element of $A$, we could multiply out each word and then multiply again;
on the other, we could flatten the word-of-words and then multiply.
(Remember, $\alpha: TA \to A$ is the multiplication, and $\mu_A: TTA \to TA$ is the flattening operation.
Note that $\alpha: (TA, \mu_A) \to (A, \alpha)$ is an algebra homomorphism, because the required equation $\alpha \mu_A = \alpha (T \alpha)$ is just the associativity axiom in the definition of an algebra.)
\[
\begin{tikzcd}
(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"']
& (TA, \mu_A) \arrow[r, "\alpha"]
& (A, \alpha)
\end{tikzcd}
\]
I claim that doing these two gives the same result: that is, $\alpha$ coequalises the maps $T \alpha$ and $\mu_A : (TTA, \mu_{TA}) \to (TA, \mu_A)$.
Indeed, the fact that $\alpha (T \alpha) = \alpha \mu_A : TTA \to A$ is just the associativity axiom in the definition of an algebra.
From the groups point of view, it's true that we can re-group terms and then multiply, and still get the same answer as just multiplying.
I now claim that it's actually a coequaliser: any homomorphism $f: (TA, \mu_A) \to (B, \beta)$ which has $f (T \alpha) = f \mu_A$ must factor through $\alpha: TA \to A$.
From the groups point of view, if we do some operation on the words which doesn't care about the grouping of the terms, it must be defined by its action on the generators (elements of the set $A$).
\[
\begin{tikzcd}
(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"']
& (TA, \mu_A)
\arrow[r, "\alpha"]
\arrow[dr, "f"]
& (A, \alpha)
\arrow[d, dashrightarrow, "g"]
\\
&
&
(B, \beta)
\end{tikzcd}
\]
Indeed, if we apply the forgetful functor, we obtain a split coequaliser (and hence a fortiori a coequaliser), because:
\begin{itemize}
\item it coequalises
\item $\eta_A : A \to TA$ is a one-sided inverse of $\alpha: TA \to A$, i.e. $\alpha \eta_A = 1_A$ (``multiplying out a one-element word'' undoes ``inclusion into the free group'')
\item $\eta_{TA}: TA \to TTA$ is a one-sided inverse of $\mu_A: TTA \to TA$, i.e. $\mu_A \eta_{TA} = 1_{TA}$ (a monad unit law)
\item moreover $(T\alpha) \eta_{TA} = \eta_A \alpha$, by naturality of $\eta$
\end{itemize}
So there can only be one possible factorisation of $f$ through $\alpha$ (because forgetting the algebra structure would keep the factorisation the same, and we've just shown that forgetting the algebra structure we already have a coequaliser).
There actually is a factorisation of $f$ through $\alpha$: say $g$ is the factorisation $A \to B$ once we've forgotten the algebra structure.
Then we just need to show that $g$ is an algebra homomorphism $(A, \alpha) \to (B, \beta)$.
(That is, we need to show that after we've forgotten the algebra structure, we can add in the algebra structure again without breaking $g$.)
But it is: $g \alpha = \beta (Tg)$ because:
\begin{itemize}
\item We already know $g \alpha = f: TA \to B$ (by ``$g$ is a factorisation of $f$ through $\alpha$'')
\item $f \mu_A = \beta (T f): $ because $f: TA \to B$ is an algebra hom
\end{itemize}
From the second and first together, we have $g \alpha \mu_A = \beta (T g) (T \alpha)$;
since $\alpha$ is an algebra hom, we have $\alpha \mu_A = \alpha (T \alpha)$.
Therefore $g \alpha (T\alpha) = \beta (Tg)(T \alpha)$.
So it's enough for $T(\alpha)$ to be epic.
Fortunately $T(\alpha)$ is even split epic, since $(T\alpha)(T\eta_A) = T(\alpha \eta_A) = 1_{TA}$; so it is indeed epic, so we do have $g \alpha = \beta (Tg)$, so $g$ is an algebra hom.
\subsection{Summary}
That is, every algebra $(A, \alpha)$ is a coequaliser of two free algebras, $(TTA, \mu_{TA}) \to (TA, \mu_A)$ by $T \alpha$ and $\mu_A$.
\section{Overview}
Obviously we can't get rid of the need for $G$ to have an adjoint, because ``monadic'' is solely a property of an adjunction.
So we want to find an equivalent way to characterise the fact that the adjunction is monadic.
The ``equivalence of categories'' condition tells us that we somehow need to characterise when $K$ has a left adjoint $L$.
(Recall that every equivalence can be refined to an adjoint equivalence.)
Once we've done that, we'll need natural isomorphisms $\gamma: 1_{\mathcal{C}^{\mathbb{T}}} \to KL$ and $\delta: LK \to 1_{\mathcal{D}}$.
The way we'll do that is to show that $A$ and $LKA$ are both coequalisers of the same thing, and $(A, \alpha)$ and $KL(A, \alpha)$ are also both coequalisers of the same thing.
Since two objects which satisfy the UMP of the coequaliser must be isomorphic by a \emph{unique} isomorphism, this will prove the existence (``isomorphic'') and naturality (``unique'') of the bijection.
(Recall that a natural transformation is iso if and only if it is pointwise iso.)
\section{Adjoint for \texorpdfstring{$K$}{K}}
We're looking for $L: \mathcal{C}^{\mathbb{T}} \to \mathcal{D}$, such that $LKB = L(GB, G \epsilon_B)$ is isomorphic to $B$ for each $B \in \mathcal{D}$.
\subsection{Definition of \texorpdfstring{$L$}{L}}
Can we show, then, that $L(GB, G \epsilon_B) \cong B$?
This is a bit of a rabbit out of a hat, but remember that we've shown every $B$ is a coequaliser (as is $(B, \beta)$ for any hom $\beta$).
Therefore, we could try and show that the following construction generalises.
(Note from the far future: this diagram is wrong. $B \in \mathcal{D}$, but $GFB \in \mathcal{C}$.)
\[
\begin{tikzcd}
GFGFB \arrow[r, shift left, "GF(G \epsilon_B)"] \arrow[r, shift right, "\mu_B"']
& GFB \arrow[r, "G \epsilon_B"]
& B \cong^{?} L(GB, G \epsilon_B)
\end{tikzcd}
\]
(For the moment, there's not really any reason to suspect that this is an adjoint to $K$. I'm sorry. It is, though.)
Well, there's a way to generalise it so that the objects look right, anyway: just throw away the initial $G$ from anywhere we can.
Now the arrows are going to be nonsense, of course, and the claimed isomorphism no longer makes sense as stated.
(Replace $\epsilon_B$ by $\beta$, because now it's a general arrow.)
\[
\begin{tikzcd}
FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "?"']
& FB \arrow[r, "?"]
& L(B, \beta)
\end{tikzcd}
\]
What has to happen to the arrows?
We had $G \epsilon_B = \beta$ before, but remember that in the monad $(\mathbb{T}, \eta, \mu)$ induced by the adjunction $F \dashv G$, we defined $\mu_B = G \epsilon_{FB}$.
So actually we can get rid of the $G$ here!
\[
\begin{tikzcd}
FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "\epsilon_{F B}"']
& FB \arrow[r, "?"]
& L(B, \beta)
\end{tikzcd}
\]
This is a general definition of $L(B, \beta)$: as the coequaliser of $F\beta$ and $\epsilon_{FB}$ from $FGFB \to FB$.
Is it well-defined? That is, does the coequaliser exist?
Notice that $F\beta$ and $\epsilon_{FB}$ have the common section $F \eta_B$, so we have a reflexive pair; so it's enough for $\mathcal{D}$ to have coequalisers of reflexive pairs.
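Explicitly, both composites with $F \eta_B$ are the identity: $$\epsilon_{FB} \circ F\eta_B = 1_{FB} \quad \text{(a triangle identity)}, \qquad F\beta \circ F\eta_B = F(\beta \eta_B) = F(1_B) = 1_{FB},$$ the latter because $\beta \eta_B = 1_B$ is the unit axiom for the algebra $(B, \beta)$.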
Alternatively, if we apply $G$ to the diagram, we get:
\[
\begin{tikzcd}
TTB \arrow[r, shift left, "T \beta"] \arrow[r, shift right, "G \epsilon_{F B}"']
& TB
\end{tikzcd}
\]
But we've already seen that this is part of a split coequaliser diagram (the three bullet-points in Section \ref{coeq}), so it does have a coequaliser in $\mathcal{C}$;
therefore it's also, separately, enough for $G$ to create coequalisers of $G$-split pairs.
\subsubsection{Summary} \label{summaryL}
We have seen that if we have either of
\begin{enumerate}
\item $\mathcal{D}$ has coequalisers of reflexive pairs
\item $G$ creates coequalisers of $G$-split pairs
\end{enumerate}
then our putative left-adjoint $L$ is well-defined as a function.
It is defined on objects as a coequaliser:
\[
\begin{tikzcd}
FGFB \arrow[r, shift left, "F \beta"] \arrow[r, shift right, "\epsilon_{F B}"']
& FB \arrow[r, "l"]
& L(B, \beta)
\end{tikzcd}
\]
How do we define it on arrows?
An entirely mechanical check tells us how:
\[
\begin{tikzcd}
FGFB
\arrow[r, shift left, "F \beta"]
\arrow[r, shift right, "\epsilon_{F B}"']
\arrow[d, "FGFf"]
& FB
\arrow[r, "l_B"]
\arrow[d, "Ff"]
& L(B, \beta)
\arrow[d, dashrightarrow]
\\
FGFC
\arrow[r, shift left, "F \gamma"]
\arrow[r, shift right, "\epsilon_{F C}"']
& FC
\arrow[r, "l_C"]
& L(C, \gamma)
\end{tikzcd}
\]
where the upper square and the lower square both commute (functoriality of $F$ and naturality of $\epsilon$, respectively).
We can then chase arrows to show that $l_C \circ Ff$ coequalises $F\beta$ and $\epsilon_{FB}$, and so it must factor uniquely through $l_B$.
\subsection{Functoriality of \texorpdfstring{$L$}{L}}
We've shown that $L$ is defined, but is it a functor?
It needs to:
\begin{enumerate}
\item preserve identities
\item preserve composition
\end{enumerate}
Looking at the diagram at the end of the summary above (section \ref{summaryL}), it's clear that if $f = 1_B$ then the induced $L(f)$ must be $1_{L(B, \beta)}$.
For composition, if we just shove another copy of that diagram below itself, and join up all the arrows, it's another mechanical diagram-chase to show that the composition is preserved (using the uniqueness of the dashed arrows).
\subsection{Adjointness of \texorpdfstring{$K$}{K} and \texorpdfstring{$L$}{L}}
Finally, we get to the key bit!
We'll use my favourite definition of an adjoint: as an initial object of $((A, \alpha) \downarrow K)$ for each $(A, \alpha) \in \mathcal{C}^{\mathbb{T}}$.
The diagram we need to verify is:
\[
\begin{tikzcd}
FGFA
\arrow[r, shift left, "F \alpha"]
\arrow[r, shift right, "\epsilon_{F A}"']
& F A
\arrow[r]
& L(A, \alpha)
\arrow[r, dashrightarrow, "g"]
& B
\\
& & KL(A, \alpha)
\arrow[r, "K g"]
& K B
\\
& & (A, \alpha)
\arrow[u, "i_{(A, \alpha)}"]
\arrow[ur, "f"']
&
\end{tikzcd}
\]
Since $KB = (GB, G \epsilon_B)$, the diagram becomes:
\[
\begin{tikzcd}
FGFA
\arrow[r, shift left, "F \alpha"]
\arrow[r, shift right, "\epsilon_{F A}"']
& F A
\arrow[r]
& L(A, \alpha)
\arrow[r, dashrightarrow, "g"]
& B
\\
& & KL(A, \alpha)
\arrow[r, "Kg"]
& (GB, G \epsilon_B)
\\
& & (A, \alpha)
\arrow[u, "i_{(A, \alpha)}"]
\arrow[ur, "f"']
&
\end{tikzcd}
\]
\subsubsection{Uniqueness of $g$}
Notice that the UMP of the coequaliser guarantees uniqueness of the arrow $g$, if it exists, because $FA \to B$ coequalises $F\alpha$ and $\epsilon_{FA}$ so it factors \emph{uniquely} through the coequaliser $L(A, \alpha)$.
So we just need to define $i_{(A, \alpha)}$ and to show that $g$ exists for every $f$.
\subsubsection{Existence of $g$}
The only thing we can do with $f$ is to view it as a map $A \to GB$ and then apply $F$ to it, to raise it up to $\mathcal{D}$.
And once we do that, it immediately becomes clear that we want to show that $\epsilon_B \circ Ff: FA \to B$ coequalises $F\alpha$ and $\epsilon_{FA}$.
When we've got that, $\epsilon_B \circ Ff$ will factor through $L(A, \alpha)$ to give our map $g$.
Since $f$ is an algebra hom, we have $f \alpha = G(\epsilon_B) \circ (Tf) = G(\epsilon_B \circ Ff)$.
Applying $F$, we obtain $F(f \alpha) = FG(\epsilon_B \circ Ff)$; if we hit this on the left with $\epsilon_B$ then we can use naturality of $\epsilon$ on the right-hand side, so we have $$\epsilon_B \circ Ff \circ F\alpha = \epsilon_B \circ F(f \alpha) = \epsilon_B \circ FG(\epsilon_B \circ Ff) = \epsilon_B \circ Ff \circ \epsilon_{FA}$$
so $\epsilon_B \circ Ff$ does coequalise $F\alpha$ and $\epsilon_{FA}$, and hence factors through $l$.
That is, for every arrow $f: (A, \alpha) \to K(B)$ there is exactly one arrow $g: L(A, \alpha) \to B$ such that $\epsilon_B \circ Ff = g \circ l: FA \to B$.
\subsubsection{Definition of $i_A$}
Conversely, any arrow $g: L(A, \alpha) \to B$ has a unique arrow $h = g \circ l: FA \to B$ satisfying $h(F \alpha) = h (\epsilon_{FA})$, and by the $F \dashv G$ adjunction, this corresponds to a unique arrow $\bar{h} = (Gh) \eta_A : A \to GB$.
This tells us to define $i_{(A, \alpha)} : (A, \alpha) \to KL(A, \alpha)$ by $i_{(A, \alpha)} = (Gl) \eta_A$, where recall that $\eta$ is the unit of the $F \dashv G$ adjunction.
$i_{(A, \alpha)}$ really is a homomorphism: have $KL(A, \alpha) = (GL(A, \alpha), G \epsilon_{L(A, \alpha)})$, so need $$G \epsilon_{L(A, \alpha)} \circ T(i_{(A, \alpha)}) = i_{(A, \alpha)} \alpha$$
The right-hand side is $(Gl) \eta_A \alpha$; by naturality of $\eta$ this is $(Gl)(T\alpha)\eta_{TA}$, which (since $l$ coequalises $F\alpha$ and $\epsilon_{FA}$, and $\mu_A = G\epsilon_{FA}$) equals $(Gl)\mu_A\eta_{TA}$, and by a monad unit law that is just $Gl$.
The left-hand side is $$G(\epsilon_{L(A, \alpha)} \circ FGl \circ F\eta_A)$$
which by naturality of $\epsilon$ is $$G(l \circ \epsilon_{FA} \circ F \eta_A)$$
which by a triangle identity of the adjunction is just $Gl$.
So we have the homomorphism $i_{(A, \alpha)}$.
\subsubsection{Verify the triangle}
We have one thing left to do: to show that $Kg \circ i_{(A, \alpha)} = f$ for all $f: (A, \alpha) \to K(B)$.
We have $$Kg \circ i_{(A, \alpha)} = Gg \circ Gl \circ \eta_A = G(\epsilon_B \circ Ff) \circ \eta_A$$
so by naturality of $\eta$ this is $G(\epsilon_B) \eta_{GB} f$, which by a triangle identity is just $f$.
\subsection{Unmotivated bits}
Once you have the idea of ``generalise the fact that $GB$ is a coequaliser'', the rest is basically forced, although it's taken me several hours to work out the steps.
(``Forced'' is not the same as ``easy''!)
That is the only unmotivated step in the construction of this adjoint.
\section{Monadicity Theorems}
We've actually done most of the hard work now.
We've shown that every algebra occurs as a certain coequaliser, and that under one of two conditions
\begin{enumerate}
\item $\mathcal{D}$ has coequalisers of reflexive pairs
\item $G$ creates coequalisers of $G$-split pairs
\end{enumerate}
the comparison functor has a left adjoint which is defined as another coequaliser.
I'll restate them here for convenience.
\
\begin{thm}[Algebra as coequaliser]
\[
\begin{tikzcd}
(TTA, \mu_{TA}) \arrow[r, shift left, "T\alpha"] \arrow[r, shift right, "\mu_A"']
& (TA, \mu_A) \arrow[r, "\alpha"]
& (A, \alpha)
\end{tikzcd}
\]
\end{thm}
\
\begin{defn}[Adjoint $L$ of $K$]
\[
\begin{tikzcd}
FGFA
\arrow[r, shift left, "F \alpha"]
\arrow[r, shift right, "\epsilon_{F A}"']
& F A
\arrow[r, "l"]
& L(A, \alpha)
\end{tikzcd}
\]
The unit of the adjunction is $i_{(A, \alpha)} = (Gl) \eta_A : (A, \alpha) \to KL(A, \alpha)$.
\end{defn}
\
We also showed that if we forget the algebra structure from the coequaliser, then we get another coequaliser, this time in $\mathcal{C}$ rather than $\mathcal{C}^{\mathbb{T}}$; this was because the algebra-coequaliser diagram is in fact a split coequaliser, so is preserved by all functors.
Remembering that $\mu_A = G \epsilon_{FA}$, we see that actually these two diagrams are extremely similar.
$(A, \alpha)$ is a coequaliser of the diagram you get when you apply $G$ to the diagram for $L(A, \alpha)$.
We want to find two natural isomorphisms $\gamma: 1_{\mathcal{C}^{\mathbb{T}}} \to KL$ and $\delta: LK \to 1_{\mathcal{D}}$; this will demonstrate the equivalence of the two categories.
\subsection{Finding \texorpdfstring{$\delta: LK \to 1_{\mathcal{D}}$}{delta}} \label{findingdelta}
Recall that for $B \in \mathcal{D}$, have $LK(B) = L(GB, G \epsilon_B)$.
That is, $LK(B)$ is the coequaliser in the following diagram:
\[
\begin{tikzcd}
FGFGB
\arrow[r, shift left, "FG \epsilon_B"]
\arrow[r, shift right, "\epsilon_{FGB}"']
& FGB
\arrow[r, "l"]
& L(GB, G \epsilon_B)
\end{tikzcd}
\]
Now observe that $\epsilon_B: FGB \to B$ coequalises that diagram too (that is just what it means for $\epsilon$ to be natural), so it must factor through the coequaliser.
\[
\begin{tikzcd}
FGFGB
\arrow[r, shift left, "FG \epsilon_B"]
\arrow[r, shift right, "\epsilon_{FGB}"']
& FGB
\arrow[r, "l"]
\arrow[dr, "\epsilon_B"']
& L(GB, G \epsilon_B) = LKB
\arrow[d, dashrightarrow, "\nu_B"]
\\
& & B
\end{tikzcd}
\]
We want $\nu_B$ to be an isomorphism.
We had two conditions for $L$ being well-defined.
\subsubsection{$G$ creates coequalisers of $G$-split pairs}
In this case, in fact $\epsilon_B$ is a coequaliser, not just coequalising, because when we apply $G$ to the diagram we get a coequaliser of a split pair.
So immediately $\nu_B$ is an iso.
\subsubsection{$\mathcal{D}$ has coequalisers of reflexive pairs}
In this case, applying $G$ to the diagram, and assuming that $G$ preserves coequalisers of reflexive pairs, we will get a diagram as follows, where the top line is a coequaliser:
\[
\begin{tikzcd}
GFGFGB
\arrow[r, shift left, "GFG \epsilon_B"]
\arrow[r, shift right, "G \epsilon_{FGB}"']
& GFGB
\arrow[r, "Gl"]
\arrow[dr, "G\epsilon_B"']
& GL(GB, G \epsilon_B) = GLKB
\arrow[d, dashrightarrow, "G \nu_B"]
\\
& & GB
\end{tikzcd}
\]
But notice that the following is a split coequaliser diagram, and hence is a coequaliser:
\[
\begin{tikzcd}
GFGFGB
\arrow[r, shift left=0.7ex, "GFG \epsilon_B"]
\arrow[r, shift right=1.4ex, "G \epsilon_{FGB}"]
& GFGB
\arrow[l, shift left=2.1ex, "\eta_{GFGB}"]
\arrow[r, shift right, "G \epsilon_B"']
& GB
\arrow[l, shift right, "\eta_{GB}"']
\end{tikzcd}
\]
Indeed:
\begin{itemize}
\item The rightwards maps are coequalising (because if we remove a $G$ from the diagram, we get something which was coequalising)
\item $(G \epsilon_B) \eta_{GB} = 1_{GB}$ (adjunction identity)
\item $\eta_{GB} (G \epsilon_B) = (GFG \epsilon_B)(\eta_{GFGB})$ (naturality of $\eta$)
\item $(G \epsilon_{FGB})(\eta_{GFGB}) = 1_{GFGB}$ (adjunction identity)
\end{itemize}
Therefore $G \nu_B$ is an isomorphism.
If we make the further assumption that $G$ reflects isos, then $\nu_B$ must be an isomorphism.
What else did we assume in this section?
\begin{enumerate}
\item $G$ preserves coequalisers of reflexive pairs
\item $G$ reflects isomorphisms
\end{enumerate}
\subsubsection{Naturality of the isomorphism}
I claim that actually $\nu$ is the counit of the adjunction.
This would make it automatically natural.
Recall from the start of this section (\ref{findingdelta}) that $\nu_B$ is just the factorisation of $\epsilon_B$ across the coequaliser $l: FGB \to L(GB, G \epsilon_B)$.
Therefore it corresponds to $\epsilon_B$ in the natural bijection between ``maps $FGB \to B$ which coequalise $FG \epsilon_B$ and $\epsilon_{FGB}$'' and ``maps $L(GB, G \epsilon_B) \to B$''.
But $\epsilon_B$ corresponds naturally to the identity $1_{GB}$ via the adjunction $F \dashv G$.
So using $U$ for the forgetful functor, $U(1_{KB}) = 1_{GB} \leftrightarrow \epsilon_B \leftrightarrow \nu_B$, all naturally, and so $\nu_B$ must be the lift of $1_{KB}$ over the adjunction $L \dashv K$.
Therefore $\nu$ is actually the counit.
\subsection{Finding \texorpdfstring{$\gamma: 1_{\mathcal{C}^{\mathbb{T}}} \to KL$}{gamma}}
Recall that the unit of the $L \dashv K$ adjunction was $i_{(A, \alpha)} = (Gl) \eta_A : (A, \alpha) \to KL(A, \alpha)$.
We'll show that this unit is a natural isomorphism.
Recall that $\alpha: (TA, \mu_A) \to (A, \alpha)$ is a coequaliser; that is precisely $\alpha: KFA \to (A, \alpha)$.
Applying $K$ to the diagram of Definition 5, we must still get something which coequalises, and so it factors through the coequaliser:
\[
\begin{tikzcd}
KFGFA
\arrow[r, shift left, "KF \alpha"]
\arrow[r, shift right, "K \epsilon_{F \alpha}"']
& KFA
\arrow[r, "\alpha"]
\arrow[dr, "K l"']
& (A, \alpha)
\arrow[d, dashrightarrow]
\\
& & KL(A, \alpha)
\end{tikzcd}
\]
I claim that $i_{(A, \alpha)}$ is this dashed arrow.
Indeed, $i_{(A, \alpha)} \circ \alpha = (Gl) \eta_A \circ \alpha = Gl = Kl$, so by uniqueness of the lift, it must be $i_{(A, \alpha)}$.
It is therefore immediately a natural transformation, being the unit of an adjunction.
It is an iso: if we forget the algebra structure, we end up with the following diagram:
\[
\begin{tikzcd}
GFGFA
\arrow[r, shift left, "GF \alpha"]
\arrow[r, shift right, "G \epsilon_{F \alpha}"']
& GFA
\arrow[r, "\alpha"]
\arrow[dr, "G l"']
& A
\arrow[d, dashrightarrow, "i_A"]
\\
& & GL(A, \alpha)
\end{tikzcd}
\]
But $\alpha$ is still a coequaliser here (we proved earlier that it remained a coequaliser after forgetting the algebra structure), and $Gl$ is also a coequaliser because of our two assumptions: either $G$ preserves coequalisers of reflexive pairs (and hence the coequaliser structure of Definition 5's diagram), or else it creates coequalisers of $G$-split pairs and so we're immediately done.
Therefore $i_A$ must be an isomorphism in $\mathcal{C}$, because $\alpha$ and $Gl$ are both coequalisers.
It is also an isomorphism in $\mathcal{C}^{\mathbb{T}}$, because the forgetful functor reflects isomorphisms: suppose $f: (A, \alpha) \to (B, \beta)$ has an inverse $g: B \to A$ once the algebra structure is forgotten.
Then $g$ is an algebra homomorphism, because we just need $g \beta = \alpha T(g)$; but that is equivalent to $\beta T(f) = f \alpha$ by pre-composing $f$ and post-composing $T(f)$.
Because $f$ is already an algebra homomorphism, this condition is true.
Therefore $i_A$ is an iso in $\mathcal{C}^{\mathbb{T}}$, and so we are finished.
\section{Summary}
We have shown that $K: \mathcal{D} \to \mathcal{C}^{\mathbb{T}}$ has a left adjoint $L$ under one of two conditions;
that the adjunction's unit is an isomorphism under a further assumption;
and that the adjunction's counit is an isomorphism under a further assumption.
Together, this tells us that $K$ is part of an equivalence: that is, $G$ is monadic.
\
\begin{thm}[Monadicity Theorems, one direction]
Suppose $G: \mathcal{D} \to \mathcal{C}$ is a functor which has a left adjoint. If
\begin{enumerate}
\item $G$ creates coequalisers of $G$-split pairs, or
\item $G$ reflects isomorphisms, $\mathcal{D}$ has coequalisers of reflexive pairs, and $G$ preserves them
\end{enumerate}
then $G$ is monadic.
\end{thm}
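\
For a concrete instance: writing $U$ for the forgetful functor from groups to $\mathbf{Sets}$, it is a standard fact that $U$ creates coequalisers of $U$-split pairs (and also reflects isomorphisms and preserves coequalisers of reflexive pairs), so either half of the theorem applies, recovering the motivating fact that the free-group/forgetful adjunction is monadic.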
\end{document}

85
MultiplicativeDetProof.tex Normal file
View File

@@ -0,0 +1,85 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{amsmath,amsthm}
\usepackage{epstopdf}
\usepackage{xcolor}
\usepackage{hyperref}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\title{Proof that the determinant is multiplicative}
\author{Patrick Stevens}
\date{17th April 2014}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newenvironment{definition}[1][Definition]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/MultiplicativeDetProof/MultiplicativeDetProof.pdf} \end{center}
\normalsize
\
This is a very concrete proof of the multiplicativity of the determinant.
It contains no cleverness at all, and is simply manipulation of expressions.
\
\begin{definition}
The \emph{determinant} of a matrix $A$ is given by $$\sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{i, \sigma(i)}$$
where $S_n$ is the symmetric group on $n$ elements, and $\epsilon(\sigma)$ is the signature of the permutation $\sigma$.
\end{definition}
\begin{lemma}
Let $\rho \in S_n$, and let $A$ be a matrix. Then
$$\sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{\rho(i), \sigma(i)} = \epsilon(\rho) \det(A)$$
\end{lemma}
\begin{proof}
\begin{align*}
\epsilon(\rho) \det(A) &= \epsilon(\rho) \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{i, \sigma(i)}
\\&= \sum_{\sigma \in S_n} \epsilon(\sigma \rho) \prod_{i=1}^n A_{i, \sigma(i)}
\\&= \sum_{\sigma \in S_n} \epsilon(\sigma \rho) \prod_{i=1}^n A_{\rho(i), \sigma(\rho(i))}
\\&= \sum_{\tau \in S_n} \epsilon(\tau) \prod_{i=1}^n A_{\rho(i), \tau(i)}
\\&= \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n A_{\rho(i), \sigma(i)}
\end{align*}
\end{proof}
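In words: the lemma says that permuting the rows of $A$ according to $\rho$ multiplies the determinant by the signature of $\rho$.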
\begin{theorem}$$\det(AB) = \det(A) \det(B)$$
\end{theorem}
\begin{proof}
We use summation convention throughout.
\begin{align*}
\det(AB) &= \sum_{\sigma \in S_n} \epsilon(\sigma) \prod_{i=1}^n (AB)_{i, \sigma(i)}
\\ &= \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, k_1} B_{k_1, \sigma(1)} A_{2, k_2} B_{k_2, \sigma(2)} \dots A_{n, k_n} B_{k_n, \sigma(n)}
\\ &= \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, k_1} A_{2, k_2} \dots A_{n, k_n} B_{k_1, \sigma(1)} \dots B_{k_n, \sigma(n)}
\end{align*}
But the $k_1, \dots, k_n$ only ever contribute when they are a permutation of $1, \dots, n$: if two of them coincide (wlog $k_1 = k_2$), the terms cancel in pairs, because for each $\sigma^+$ there exists $\sigma^-$ such that $\sigma^+(1) = \sigma^-(2), \sigma^-(1) = \sigma^+(2), \sigma^-(k) = \sigma^+(k)$ for other $k$. Then we have
$$A_{1, k_1} B_{k_1, \sigma^+(1)} A_{2, k_1} B_{k_1, \sigma^+(2)} \cdot (\text{other terms}) = A_{1, k_1} B_{k_1, \sigma^-(1)} A_{2, k_1} B_{k_1, \sigma^-(2)} \cdot (\text{other terms})$$
and because $\epsilon(\sigma^-) = -\epsilon(\sigma^+)$, these two terms cancel.
Hence the sum over the $k_i$ is in fact a sum over all permutations $\rho$, where $\rho(i) = k_i$ for all $i$.
Then $$\det(AB) = \sum_{\rho \in S_n} \sum_{\sigma \in S_n} \epsilon(\sigma) A_{1, \rho(1)} A_{2, \rho(2)} \dots A_{n, \rho(n)} B_{\rho(1), \sigma(1)} \dots B_{\rho(n), \sigma(n)}$$
Applying the lemma gives
\begin{align*}
\det(AB) &= \det(B) \sum_{\rho \in S_n} \epsilon(\rho) A_{1, \rho(1)} A_{2, \rho(2)} \dots A_{n, \rho(n)}
\\&= \det(B) \det(A)
\end{align*}
\end{proof}
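As a quick numerical sanity check, take
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix};$$
then $\det(AB) = 6 - 4 = 2$ and $\det(A)\det(B) = (-2) \cdot (-1) = 2$, as expected.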
\end{document}

File diff suppressed because it is too large

154
ParametricBoundedLoeb2016.tex Normal file
View File

@@ -0,0 +1,154 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\newmdtheoremenv{thm}{Theorem}[section]
\newcommand{\prov}{\square}
\newcommand{\encode}[1]{\ulcorner #1 \urcorner}
\newcommand{\lob}{L\"ob's Theorem}
\title{Parametric bounded version of L\"ob's theorem}
\author{Patrick Stevens}
\date{24th July 2016}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/ParametricBoundedLoeb2016/ParametricBoundedLoeb2016.pdf} \end{center}
\normalsize
\section{Introduction}
I was recently made aware of a preprint\cite{critch} of a paper which proves a bounded version of \lob.
\
\begin{thm}[Parametric Bounded L\"ob]
If $\prov A$ is the operator ``there exists a proof of $A$ in Peano arithmetic''
and $\prov_k A$ is the operator ``there exists a proof of $A$ in $k$ or fewer lines in Peano arithmetic'',
then for every formula $p$ of one free variable in the language of PA, and every computable $f: \mathbb{N} \to \mathbb{N}$
which grows sufficiently fast, it is true that
$$(\exists \hat{k})[\color{red}(\vdash [\forall k][\color{green}\prov_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k > \hat{k}][p(k)]) \color{black}]$$
\end{thm}
\
(Colour is used only to emphasise logical chunks of the formula.)
The paper gives plenty of motivation about why this result should be interesting and useful:
section 6 of the paper, for instance, is an application to the one-shot Prisoner's Dilemma
played between agents who have access to each other's source code.
However, I believe that while the theorem may be true and the proof may be correct,
its application may not be as straightforward as the paper suggests.
\section{Background}
\begin{thm}[\lob]
Suppose $\prov \encode{A}$ denotes ``the formula $A$ with G\"odel number $\encode{A}$ is provable''.
If $$\mathrm{PA} \vdash (\prov \encode{P} \to P)$$
then
$$\mathrm{PA} \vdash P$$
\end{thm}
\
\lob{} is at heart a statement about the incompatibility of the interpretation of the box as ``provable''
with the intuitively plausible deduction rule that $\prov \encode{P} \to P$.
(``If we have a proof of $P$, then we can deduce $P$!'')
The Critch paper has an example in Section 1.4 where $P$ is the Riemann hypothesis.
\section{Problem with the paper}
Suppose $\mathcal{M}$ is a model of Peano arithmetic, in which our agent is working.
It is a fact of first-order logic (through the L\"owenheim-Skolem theorem) that there is no first-order way of distinguishing any particular model of PA.
Therefore the model of PA could be non-standard; this is not something a first-order reasoning agent could determine.
If the agent is working with a non-standard model of PA, then all the theorems of the Critch paper may well go through.
However, they become substantially less useful, as follows.
Let us write $M$ for the underlying class (or set) of the model $\mathcal{M}$ of PA. Then the statement $$(\exists \hat{k})[\color{red}(\vdash [\forall k][\color{green}\prov_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k > \hat{k}][p(k)]) \color{black}]$$
when relativised to the model $\mathcal{M}$ becomes
$$(\exists \hat{k} \in M)[\color{red}(\vdash [\forall k \in M][\color{green}\prov^{\mathcal{M}}_{f(k)} p(k) \to p(k) \color{red}])\color{black} \Rightarrow \color{blue}(\vdash [\forall k \in M^{>\hat{k}}][p(k)]) \color{black}]$$
where $\prov^{\mathcal{M}}_{f(k)}$ is now shorthand for ``there is a proof-object $P$ in $M$ such that $P$ encodes an $M$-proof of $p(k)$ which is fewer than $f(k)$ lines long''.
Notice that the quantifiers have been restricted to $M$; in particular, $\hat{k}$ might be a non-standard natural number.
Likewise, the ``there is a proof'' predicate is now ``there is an object which $M$ unpacks into a proof''; but such objects may be non-standard naturals themselves, and unpack into non-standard proofs (which $\mathcal{M}$ still believes are proofs, because it doesn't know the difference between ``standard'' and ``non-standard'').
\subsection{Aside: non-standard proof objects}
What is a non-standard proof object?
Let's imagine we have some specific statements $a_i$ for each natural $i$ such that $a_i \to a_{i+1}$ for each $i$, and such that $a_0$ is an axiom of PA.
I'm using $a_i$ only for shorthand; the reader should imagine I had some specific statements and specific proofs of $a_i \to a_{i+1}$.
Consider the following proof of $a_2$:
\begin{enumerate}
\item $a_0$ (axiom)
\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
\item $a_2$ (by writing out the proof of $a_1 \to a_2$ above this line)
\end{enumerate}
If we take a simple G\"odel numbering scheme, namely ``take the number to be an ASCII string in base $256$'',
it's easy to see that this proof has a G\"odel number.
After all, we're imagining that I have specific proofs of $a_i \to a_{i+1}$, so I could just write them in.
Then you're reading this document which was originally encoded as ASCII, so the G\"odel numbering scheme must have worked.
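(For instance, reading the leftmost character as the most significant base-$256$ digit, the two-character string ``PA'' has ASCII codes $80$ and $65$, and so is coded by $80 \cdot 256 + 65 = 20545$.)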
Similarly, there is a G\"odel number corresponding to the following:
\begin{enumerate}
\item $a_0$ (axiom)
\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
\item \dots
\item $a_k$ (by writing out the proof of $a_{k-1} \to a_k$ above this line)
\end{enumerate}
Now, suppose we're working in a non-standard model, and fix non-standard $K$.
Then there is a (probably non-standard) natural $L$ corresponding to the following proof:
\begin{enumerate}
\item $a_0$ (axiom)
\item $a_1$ (by writing out the proof of $a_0 \to a_1$ above this line)
\item \dots
\item $a_K$ (by writing out the proof of $a_{K-1} \to a_K$ above this line)
\end{enumerate}
Now, this is not a ``proof'' in our intuitive sense of the word, because from our perspective it's infinitely long.
However, the model still thinks this is a proof, and that it's coded by the (non-standard) natural $L$.
\subsection{Implication for PBL}
So the model $\mathcal{M}$ believes there is a natural $\hat{k}$ such that \dots
But if that natural is non-standard (and remember that this is not something the model can determine without breaking into second-order logic!)
then PBL doesn't really help us.
It simply tells us that all sufficiently-large non-standard naturals have a certain property; but that doesn't necessarily mean any standard naturals have that property.
And the application to the Prisoner's Dilemma in Critch's paper requires a standard finite $\hat{k}$.
If we, constructing the agent Fairbot, could somehow guarantee that it would be working within the standard model of PA, then all would be well.
However, we can't do that within first-order logic.
It could be the case that when constructing Fairbot, the only sufficiently-large naturals turn out to be non-standard.
When we eventually come to run $\mathrm{Fairbot}_k(\mathrm{Fairbot}_k)$, it could therefore be that it will take nonstandardly-many proof steps to discover the ``(cooperate, cooperate)'' outcome.
In practice, therefore, the agents would not find that outcome: we can only run them for standardly-many steps, and all non-standard naturals look infinite to us.
\section{Acknowledgements}
My thanks are due to Mi\"etek Bak (who persuaded me that there might be a problem with the article)
and to John Aspden (who very capably forced Mi\"etek to clarify his objection until I finally understood it).
As ever, any mistakes in this article are due only to me.
\begin{thebibliography}{9}
\bibitem{critch}
Andrew Critch,
\emph{Parametric Bounded L\"ob's Theorem and Robust Cooperation of Bounded Agents},
\url{http://arxiv.org/abs/1602.04184v4}
\end{thebibliography}
\end{document}

View File

@@ -7,5 +7,5 @@ It is intended to be read in as a subtree to its [build pipeline](https://github
## The licence
The `RepresentableFunctors.tex` file is CC BY-SA 4.0.
The `pdf-targets.txt` file is licensed [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
All `.tex` files are CC BY-SA 4.0.

160
Tennenbaum.tex Normal file
View File

@@ -0,0 +1,160 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{mdframed}
\usepackage{hyperref}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\newmdtheoremenv{thm}{Theorem}[section]
\theoremstyle{remark}
\newtheorem*{note}{Notation}
\newtheorem*{rmk}{Remark}
\newtheorem*{example}{Example}
\title{Tennenbaum's Theorem}
\author{Patrick Stevens}
\date{27th April 2016}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/Tennenbaum/Tennenbaum.pdf} \end{center}
\normalsize
\section{Introduction}
\begin{thm}[Tennenbaum's Theorem]
Let $\mathfrak{M}$ be a countable non-standard model of Peano arithmetic, whose carrier set is $\mathbb{N}$.
Then it is not the case that $+$ and $\times$ have decidable graphs in the model.
\end{thm}
\
\begin{note}
We will use the notation $\{ e\}$ to represent the $e$th Turing machine.
$e$ is considered only to be a standard integer here.
For example, we might view the G\"odel numbering scheme as being ``convert from ASCII and then interpret as a Python program''.
\end{note}
\begin{rmk}
How might our standard Turing machine refer to a nonstandard integer?
The ground set of our nonstandard model is $\mathbb{N}$: every nonstandard integer has a standard one which represents it in $\mathbb{N}$.
Perhaps $4 \in \mathbb{N}$ is the object that the nonstandard model $\mathfrak{M}$ thinks is the number $7$, for instance.
So the way a Turing machine would refer to the number $7$-in-the-model is to use $4$ in its source code.
\end{rmk}
What does it mean for $+$ to have a decidable graph?
Simply that there is some (standard) natural $n$ such that,
when we unpack $n$ into instructions for running a Turing machine,
we obtain a machine that takes three naturals (that is, standard naturals) $a, b, c$ and outputs $1$ iff, when we take the referents $a', b', c'$ of $a, b, c$ in the model $\mathfrak{M}$, it is true that $a' +_{\mathfrak{M}} b' = c'$.
\begin{example}
A strictly standard-length program may halt in nonstandard time, when interpreted in a nonstandard model.
Indeed, fix some nonstandard ``infinite'' $n$ (i.e. $n$ is not a standard natural).
Then the following program halts after $n$ steps.
\begin{verbatim}
ans := 0;
for i = 1 to n:
ans := ans + 1;
end
HALT with output ans;
\end{verbatim}
\end{example}
\section{Overview of the proof}
The proof \emph{est omnis divisa in partes tres}.
\begin{enumerate}
\item In any model, there is some pair of semidecidable but recursively inseparable sets.
\item We can use these to create an undecidable set of true standard naturals which can, in some sense, be coded up into a (nonstandard) natural in our model.
\item If $+$ and $\times$ were decidable, then the coding process would produce an object which would let us decide the undecidable set; contradiction.
\end{enumerate}
\section{Existence of recursively inseparable sets}
This is fairly easy. Take $A = \{ e : \{ e \}(e) \downarrow = 0 \}$ and $B = \{ e: \{e \}(e) \downarrow > 0 \}$, where $\downarrow=$ means ``halts and is equal to'', and $\downarrow >$ means ``halts and is greater than''.
Recall that $e$ must be standard.
Now, suppose there were a (standard) integer $n$ such that $\{ n \}$ were the indicator function on set $X$, where $X \cap B = \emptyset$ and $A \subseteq X$.
Then what is $\{n\}(n)$?
If it were $0$, then $n$ is not in $X$, so $n$ is not in $A$ and so $\{n\}(n)$ doesn't halt at $0$.
That's a contradiction.
If it were $1$, then $n$ is in $X$ and hence is not in $B$, so $\{n\}(n)$ doesn't halt at something bigger than $0$; again a contradiction.
So we have produced a pair of sets which are both semidecidable but are recursively inseparable, in the sense that no standard integer $n$ has $\{n\}$ deciding a superset $X$ of $A$ where $X \cap B = \emptyset$.
(This is independent of the model of PA we were considering; it's purely happening over the ground set.)
\section{Coding sets of naturals as naturals}
We can take any set of (possibly nonstandard) naturals and code it as a (possibly nonstandard) natural, as follows.
Given $\{ n_i : i \in I \}$, code it as $\sum_{i \in I} 2^{n_i}$.
If $+$ and $\times$ are decidable, then this is a decidable coding scheme.
(The preceding line is going to be where our contradiction arises, right at the end of the proof!)
Notice that if $I$ is ``standard-infinite'' (that is, it contains nonstandardly-many elements) then the resulting code is nonstandard; the same is true if any $n_i$ is strictly nonstandard.
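For example, the standard finite set $\{0, 2, 3\}$ would be coded as $2^0 + 2^2 + 2^3 = 13$; membership is read off from the binary digits of the code.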
\section{Undecidable set in \texorpdfstring{$\mathfrak{M}$}{M}}
Take our pair of recursively inseparable semidecidable sets: $\mathfrak{A}$ and $\mathfrak{B}$.
(We constructed them explicitly earlier, but now we don't care what they are.)
Recalling a theorem that being semidecidable is equivalent to being a projection of a decidable set,
write $A$ for a decidable set such that $(\exists y)[(n, y) \in A]$ if and only if $n \in \mathfrak{A}$,
and similarly for $B$.
(The quantifiers range over $\mathbb{N}$, because $A$ and $B$ consist only of standard naturals, being subsets of the ground set.)
By their recursive-inseparability, they are in particular disjoint, so we have $$(\forall n)[(\exists x)(\langle n, x \rangle \in A) \to \neg (\exists y)(\langle n, y \rangle \in B)]$$
where the quantifiers all range over $\mathbb{N}$.
Equivalently, $$(\forall n)(\forall x)(\forall y)(\neg \langle n,x \rangle \in A \vee \neg \langle n,y \rangle \in B)$$
If we bound the quantifiers by any standard $m = SS\dots S(0)$ (which we explicitly write out, so it's absolute between all models of PA), we obtain an expression which our nonstandard model believes, because the expression is absolute for PA:
$$(\forall n < m)(\forall x < m)(\forall y < m)(\neg \langle n,x \rangle \in A \vee \neg \langle n,y \rangle \in B)$$
This is true for every standard $m$, and so it must be true for some nonstandard $m$ by overspill, since $\mathfrak{M}$ doesn't know how to distinguish between standard and nonstandard elements.
If the property were only ever true for standard $m$, then $\mathfrak{M}$ could identify nonstandard $m$ by checking whether that property held for $m$.
Let $e$ be strictly nonstandard such that
\begin{equation} \label{eqn:prop}
\mathfrak{M} \vDash (\forall n < e)(\forall x < e)(\forall y < e)(\langle n,x \rangle \not \in A \vee \langle n,y \rangle \not \in B)
\end{equation}
where we note that this time $e$ is not written out explicitly as $SS\dots S(0)$ because it's too big to do that with.
Finally, we define our undecidable set $X \subseteq \mathbb{N}$ of \emph{standard} naturals to be those standard naturals $x$ such that $$\mathfrak{M} \vDash (\exists y < e) (\langle x, y \rangle \in A)$$
This is undecidable in the standard sense: there is no standard $m$ such that $\{m \}$ is the indicator function of $X$.
Indeed, I claim that $X$ separates $\mathfrak{A}$ and $\mathfrak{B}$.
(Recall that all members of $X$, $\mathfrak{A}$ and $\mathfrak{B}$ are standard.)
\begin{itemize}
\item If $a \in \mathfrak{A}$ then there is some standard natural $n$ such that $\langle a, n \rangle \in A$;
and $n$ is certainly less than the nonstandard $e$.
Hence $a \in X$.
\item If $b \in \mathfrak{B}$, then there is standard $n$ such that $\langle b, n \rangle \in B$.
Then $n < e$, so by (\ref{eqn:prop}) we have $\langle b, x \rangle \not \in A$ for all $x < e$.
That is, $b \not \in X$.
\end{itemize}
\section{Coding up \texorpdfstring{$X$}{X}}
Now if we code up $X$, which is undecidable, using our coding scheme $$\{ n_i : i \in I \} \mapsto \sum_{i \in I} 2^{n_i}$$
we obtain some nonstandard natural; say $p = \sum_{x \in X} 2^x$.
Supposing the $+$ and $\times$ relations to be decidable, this coding is decidable.
Remember that $X$ is a set of standard naturals which is undecidable: no standard Turing machine decides $X$.
But here is a procedure to determine whether a standard element $i \in \mathbb{N}$ is in $X$ or not:
\begin{enumerate}
\item Take the $i$th bit of $p$. (This is decidable because $+$ and $\times$ are; a sketch of why follows below.)
\item Return ``not in $X$'' if the $i$th bit is $0$.
\item Otherwise return ``is in $X$''.
\end{enumerate}
This contradicts the undecidability of $X$.
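To spell out step (1): in the model there are unique $q$ and $r$ with $$p = 2^{i+1} \cdot q + r, \quad r < 2^{i+1},$$ and the $i$th bit of $p$ is $1$ exactly when $r \geq 2^i$. A standard Turing machine can find the codes of $2^i$, $q$ and $r$ by exhaustive search through the ground set, checking each candidate using the decidable graphs of $+$ and $\times$; each search halts, because the objects sought genuinely exist in the model and every element of the model is coded by a standard natural.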
\section{Acknowledgements}
The structure of the proof is from Dr Thomas Forster's lecture notes on Computability and Logic from Part III of the Cambridge Maths Tripos, lectured in 2016.
\end{document}

99
TokyoEntrance2016.tex Normal file
View File

@@ -0,0 +1,99 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{hyperref}
\usepackage{lmodern}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\title{Tokyo 2016 Graduate School Entrance Exam}
\author{Patrick Stevens}
\date{16th March, 2017}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/TokyoEntrance2016/TokyoEntrance2016.pdf} \end{center}
\normalsize
\section{Question 2}
\subsection{Prove \texorpdfstring{that $S = 2\pi \int_{-1}^1 F(y, y') \ \mathrm{d}x$}{a certain integral form for S}}
The surface may be parametrised as $$S(x, \theta) = (x, y(x) \cos(\theta), y(x) \sin(\theta))$$
where $\theta \in [0, 2\pi)$ and $x \in [-1,1]$.
Hence $$\dfrac{\partial S}{\partial x} = (1, y'(x) \cos(\theta), y'(x) \sin(\theta))$$ and $$\dfrac{\partial S}{\partial \theta} = (0, -y(x) \sin(\theta), y(x) \cos(\theta))$$
so the surface element $$\mathrm{d}\Sigma = \left| \left(1, y'(x) \cos(\theta), y'(x) \sin(\theta) \right) \times (0, -y(x) \sin(\theta), y(x) \cos(\theta)) \right| \mathrm{d}x \mathrm{d}\theta$$
The cross product works out as $(y y', -y \cos\theta, -y \sin\theta)$, so (since $y$ is positive) its magnitude is $$y \sqrt{1+(y')^2}$$
The integral is therefore $$\int_{0}^{2 \pi} \int_{-1}^1 y \sqrt{1+(y')^2} \mathrm{d}x \mathrm{d}\theta$$
as required.
\subsection{Prove the first integral of the Euler-Lagrange equation}
We know the Euler-Lagrange equation $$\dfrac{\partial F}{\partial y} = \dfrac{\mathrm{d}}{\mathrm{d}x} \dfrac{\partial F}{\partial y'}$$
Now, $$\frac{\mathrm{d}F}{\mathrm{d}{x}} = \dfrac{\partial F}{\partial y} \dfrac{\mathrm{d}y}{\mathrm{d}x} + \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y'}{\mathrm{d}x}$$
so substituting Euler-Lagrange into this:
$$\frac{\mathrm{d}F}{\mathrm{d}{x}} = \dfrac{\mathrm{d}}{\mathrm{d}x} \left(\dfrac{\partial F}{\partial y'}\right) \dfrac{\mathrm{d}y}{\mathrm{d}x} + \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y'}{\mathrm{d}x}$$
Notice the right-hand side is just what we get by applying the product rule: it is $$\dfrac{\mathrm{d}}{\mathrm{d}x} \left( \dfrac{\partial F}{\partial y'} \dfrac{\mathrm{d}y}{\mathrm{d}x} \right)$$
The result follows now by simply integrating both sides with respect to $x$.
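Explicitly, the first integral states that $$F - y' \dfrac{\partial F}{\partial y'} = c$$ for some constant $c$ of integration; this is the form used in the next part.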
\subsection{Solve the differential equation}
Just substitute $F(y,y') = y \sqrt{1+(y')^2}$:
$$y\sqrt{1+(y')^2} - y' \left[ \frac{1}{2} y (1+(y')^2)^{-1/2} \cdot 2 y'\right] = c$$
which can be simplified to $$y (1+(y')^2)^{-1/2} \left[ (1+(y')^2) -(y')^2\right] = c$$
i.e. $$y^2 - c^2 = (c y')^2$$
If $c=0$ then this is trivial: $y=0$. From now on, assume $c \not = 0$; then since $y$ is known to be positive, $c > 0$.
Invert: $$\frac{c^2}{y^2-c^2} = \left(\frac{dx}{dy}\right)^2$$
so $$\dfrac{dx}{dy} = \pm \frac{c}{\sqrt{y^2-c^2}}$$
which is a standard integral: $$x = \pm c \log(y+\sqrt{y^2-c^2}) + K$$
Also $y(-1) = 2 = y(1)$, so $$\{1,-1\} = \{c \log(2+\sqrt{4-c^2}) + K, -c \log(2+\sqrt{4-c^2}) + K\}$$
which means $K = 0$.
Then $$\exp\left(\pm \frac{x}{c}\right) = y+\sqrt{y^2-c^2}$$
Since $y(1) = 2$, we have $$\exp(\pm 1/c) = 2+\sqrt{4-c^2}$$ and since $c>0$ gives $\exp(-1/c) < 1 < 2+\sqrt{4-c^2}$, the sign on the left-hand side must be positive; so $\exp(1/c) = 2+\sqrt{4-c^2}$ is the equation $c$ is required to satisfy.
Rearrange: $$y = \frac{c^2+\exp\left(\frac{2x}{c}\right)}{2 \exp\left(\frac{x}{c}\right)}$$
which completes the question.
\section{Question 3}
\subsection{Part 1}
We must put one ball into each box. Then we are distributing $n-r$ balls freely among $r$ boxes, so the answer is $$\binom{n-1}{r-1}$$
(standard stars-and-bars result).
\subsection{Part 2}
Consider the $n$ black balls laid out in a line; we are interspersing the $m$ white balls among them.
Equivalently, we have $n+1$ boxes (represented by the gaps between black balls) and we are trying to put $m$ balls into them.
By stars-and-bars again, the answer is $\binom{n+m}{n}$.
\subsection{Part 3}
Condition on the colour of the first ball, and write $l$ for the length of the first run.
Then
$$P_{n,m}(r,s) = \frac{n}{n+m} \sum_{l=1}^{n} P_{n-l,m}(r-1, s) + \frac{m}{n+m} \sum_{l=1}^m P_{n, m-l}(r, s-1)$$
Also $P_{n,m}(0,s) = \chi[n=0] \chi[s=1]$ where $\chi$ is the indicator function, and $P_{n,m}(r,0) = \chi[m=0] \chi[r=1]$.
\subsection{Part 4}
\subsection{Part 5}
If $m \leq n$, then the sum is $$\sum_{l=0}^m \binom{n}{l} \binom{m}{m-l}$$ which is the $x^m$ coefficient of the left-hand side and hence of the right-hand side.
If $m > n$, then the sum is $$\sum_{l=0}^n \binom{n}{n-l} \binom{m}{l}$$ which is the $x^n$ coefficient of the left-hand side and hence of the right-hand side.
For the second equation: this follows by setting $n \mapsto n-1$ in the above.
\subsection{Part 6}
\end{document}

256
YonedaWithoutTears.tex Normal file
View File

@@ -0,0 +1,256 @@
\documentclass[11pt]{amsart}
\usepackage{geometry}
\geometry{a4paper}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{mdframed}
\usepackage{hyperref}
% Reproducible builds
\pdfinfoomitdate=1
\pdftrailerid{}
\pdfsuppressptexinfo=-1
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\newmdtheoremenv{note}{Note}
\newmdtheoremenv{thm}{Theorem}
\newmdtheoremenv{motiv}{Motivation}
\newmdtheoremenv{definition}{Definition}
\newcommand{\homfrom}[1]{\mathrm{Hom\left(#1, -\right)}}
\newcommand{\homto}[1]{\mathrm{Hom\left(-,#1\right)}}
\newcommand{\Set}{\mathbf{Set}}
\newcommand{\Nat}{\mathrm{Nat}}
\title{Yoneda Without Tears}
\author{Patrick Stevens}
\begin{document}
\maketitle
\tiny \begin{center} \url{https://www.patrickstevens.co.uk/misc/YonedaWithoutTears/YonedaWithoutTears.pdf} \end{center}
\normalsize
\section{Introduction}
This document will assume that you are familiar with the notion of a category and a functor.
Ideally you will also be familiar with the idea of a natural transformation and a hom-set.
This notation varies widely; we will fix one version of it now.
\
\begin{definition}
Fix a locally small category $\mathcal{C}$, and pick an object $A \in \mathcal{C}$.
The \emph{hom-set} $\homfrom{A}$ is defined to be the set $\{f \in \mathrm{mor} \mathcal{C} : \mathrm{dom} f = A\}$.
There is of course a dual: $\homto{A}$ is defined to be $\{f \in \mathrm{mor} \mathcal{C} : \mathrm{cod} f = A\}$.
\end{definition}
\
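Although $\homfrom{A}$ is defined above as a bare set, it is naturally a functor $\mathcal{C} \to \Set$: send an object $B$ to $\{f : A \to B\}$, and an arrow $g: B \to C$ to the post-composition map $f \mapsto g \circ f$. It is in this functorial sense that $\homfrom{A}$ appears in the statement of the Yoneda lemma below.
\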
\label{defn:nattrans}
\begin{definition}
Let $F, G: \mathcal{C} \to \mathcal{D}$ be functors.
A \emph{natural transformation} from $F$ to $G$ is a selection $\alpha$, parametrised over each $X \in \mathcal{C}$, of an arrow $\alpha_X$ in $\mathcal{D}$ from $FX$ to $GX$.
We require this selection to be ``natural'' in that whenever $f : X \to Y$ is an arrow in $\mathcal{C}$, we have $$FX \xrightarrow{Ff} FY \xrightarrow{\alpha_Y} GY = FX \xrightarrow{\alpha_X} GX \xrightarrow{Gf} GY$$
\end{definition}
\
\begin{definition}
Let $F, G : \mathcal{C} \to \mathcal{D}$ be functors.
Then the set of natural transformations from $F$ to $G$ is denoted $\Nat[F, G]$.
\end{definition}
\
Then you should be able to parse the statement of the Yoneda lemma, though not necessarily understand it:
\
\begin{thm}[The Yoneda lemma]
Let $\mathcal{C}$ be a category, and let $G: \mathcal{C} \to \Set$ be a functor.
Let $A$ be an object of $\mathcal{C}$.
Then $$\Nat \left[ \homfrom{A}, G \right] \cong G A$$
and moreover the bijection is natural in both $G$ and $A$.
\end{thm}
\section{The Relevant Interpretation of ``Category''}
The main insight comes from asking the question, ``how can I better understand what $\homfrom{A}$ means?''.
There are many ways to understand what a category is, but the relevant one here is that a category is a description of a many-sorted algebraic theory, all of whose operations are unary.
An object is a placeholder for a type, and an arrow is a placeholder for a function between types.
A category is only an abstract description of some types and their interactions.
It's a mistake to think of an object of a category viewed this way as ``being'' $\mathbb{N}$.
We only get to think of the type in this way when we instantiate the theory: when we find a model of it somewhere.
Since sets are the easiest places to work, we will consider set-models only.
\subsection{Examples}
\begin{itemize}
\item Any category at all describes a theory which has a model that has only one type which is empty: simply interpret all the type-templates (i.e. objects in the category) as being that set, and all arrows become the identity.
\item Any category at all describes a theory which has a model that has only one type which has one element in: simply interpret all the type-templates (i.e. objects in the category) as being that set, and all arrows again become the identity.
\item Any category with one object (i.e. any monoid when viewed as a category) has a model where there is only one type, and the elements of that type are the elements of the monoid (i.e. the arrows in the category). (Bear this one in mind.)
\item The category with three objects $A, B, C$, whose only non-identity arrows are $f, g : A \to B$, $h : B \to C$, $k : A \to C$, has a model where $A$ and $B$ are both represented by $\mathbb{Z}$, and $C$ is represented by $\mathrm{Bool}$.
Indeed, take $f = |\cdot|$ the absolute value function, $g$ the negation function, and $h$ the ``is even'' function (so $k$ is forced to be the ``is even'' function too, since $k = hf = hg$).
\item That category also has a model (which is not a set-model) where $A$ is the type of groups, $B$ is also the type of groups, and $C$ is the type of sets.
Take $f$ to be the identity, $g$ to be the ``construct a group with the same objects but with multiplication reversed'', and $h$ to be ``take the underlying set of the group''.
\item That category has a much more boring set-model: take $A = \{1,2\}$, $B = \{5, 6\}$, and $C = \{1\}$.
Let $f : n \mapsto n+4$, let $g$ be the other bijection, and let $h$ be $n \mapsto 1$.
\end{itemize}
In general, any category has lots and lots of models.
\subsection{What is a model anyway?}
A set-model of the theory is identifying, for each type-description in the category, a set whose elements are representing elements of that type.
It also identifies, for each description of a unary predicate on the types, a function between the types; and the models of the unary predicates have to compose in a way that is reflected in the type-description.
This is just a functor from $\mathcal{C} \to \Set$!
So the first key insight is that a functor from $\mathcal{C} \to \Set$ is precisely a model of the theory described by $\mathcal{C}$.
\subsection{Homomorphisms between models}
We define a certain restricted form of model homomorphism as follows.
(Note that this is not quite what is usually meant by a model homomorphism, and I have invented the term ``fixed model homomorphism'' to describe it.)
A \emph{fixed model homomorphism} $\alpha$ from one model $F: \mathcal{C} \to \Set$ of the theory $\mathcal{C}$ to another model $G: \mathcal{C} \to \Set$ assigns to each type $FA$ of the model $F$ a function $\alpha_A : FA \to GA$ into the corresponding type of $G$, in such a way that $\alpha$ respects the predicates $Ff: FA \to FB$ of the model:
$$FA \xrightarrow{Ff} FB \xrightarrow{\alpha_B} GB = FA \xrightarrow{\alpha_A} GA \xrightarrow{Gf} GB$$
Notice that this is a model homomorphism which additionally ensures that $FA$ is always mapped to $GA$ (for any $A$), so (for example) it won't collapse all the objects $FA$ into a single object in $G$'s image unless $G$ is the trivial model.
You might recognise the definition of a fixed model homomorphism as being the definition of a natural transformation between $F$ and $G$ when viewed as functors.
So the second key insight is that a natural transformation between functors $F: \mathcal{C} \to \Set$ and $G$ is just a fixed homomorphism between the $\Set$-models $F$ and $G$ of the theory $\mathcal{C}$.
\section{Free models}
Throughout mathematics, there is the notion of a free object: an object which somehow has the least possible structure while still obeying all the rules it has to obey.
Can we find a free model of the theory represented by the category $\mathcal{C}$?
Imagine $\mathcal{C}$ has two objects. Then any free model worth its name must have at least two types - otherwise we've definitely lost information in the model.
(The theory said there were two types, so our model had better have at least two types or else it's not a good model of the theory.)
Likewise, any free model worth its name had better have \emph{at most} two types, since otherwise we've added extra structure that the theory didn't specify.
For concreteness, take our category to be the unique category with two objects and a single arrow from one to the other (and also the identity arrows).
Then if our free model had three types in it, that would be very weird (somehow the model would fundamentally require introducing extra things into the universe if we ever wanted to realise the model).
So our free model had better have exactly one type for every object in the category.
Moreover, for the same reasons, every arrow in the category should have exactly one corresponding unary function in the model.
(Any fewer, and we've lost the information that the theory should have some particular predicate; any more and we've somehow got a theory with a predicate that can't be realised without adding an extra function to the universe.)
An excellent guess for a free model turns out to be the following (called the \emph{term model}): pick some type (i.e. some object in the category).
Let that type have exactly one element, and then chase through all the functions, declaring that everything which is not obviously the same as something we've already made is different.
(By analogy with the construction of the free group on some generators: keep piling together the generators, and declare to be different every word which you haven't obviously already made.)
Declare that there are no other things in the universe: if we haven't constructed some member of a type this way, then that member can't exist.
It's called the term model because we select a type, declare that there is a term of that type, and then see what else is forced to exist.
\subsection{Examples}
\subsubsection{Simplest example}
For example, in the category above with two objects $A$ and $B$, and a single arrow $f$ from $A$ to $B$, we could construct two different models this way.
The first is the $A$-related model: declare that the type $A$ has a single element $a$, and then all the other things in the universe are those constructed from $a$.
(That is, $\mathrm{id}(a) = a$, which we already know about; $f(a) \in B$, which we don't yet know about, so we'll note down that this is a new thing in the universe; and then $\mathrm{id}(f(a))$, which does already exist and is $f(a)$.)
The second is the $B$-related model: declare that the type $B$ has a single element $b$, and then all the other things in the universe are those constructed from $b$.
(That is, $\mathrm{id}(b) = b$, and then there are no more ways to create elements because there are no other arrows from $B$.)
So the two term models we have constructed are:
\begin{itemize}
\item The one based at $A$, which has the type $A$ consisting of a single element, and the type $B$ consisting of a single element.
\item The one based at $B$, which has the type $A$ consisting of no elements at all, and the type $B$ consisting of a single element.
\end{itemize}
\subsubsection{More complex example}
We'll look at an example with three types $A, B, C$, with arrows $f, g: A \to B$ and $h : B \to C$, together with their composites $hf, hg: A \to C$, which are stipulated to be distinct arrows.
One model of this is where $A = \mathbb{N}$, $B = \mathbb{Z}$, $C = \mathrm{Bool}$, and $f$ is the obvious injection $n \mapsto n$, $g$ is the ``negate'' function $n \mapsto -n$, and $h$ is the ``is negative'' function.
Then there will be three term models: one based at $A$, one at $B$, and one at $C$.
\begin{itemize}
\item At $A$: we have one element $a \in A$; then two elements of $B$, namely $f(a)$ and $g(a)$; then two elements of $C$, namely $hf(a)$ and $hg(a)$.
\item At $B$: we have one element $b \in B$; then only one element of $C$, namely $h(b)$.
\item At $C$: we have one element $c \in C$ only.
\end{itemize}
\subsubsection{A cyclic example}
Consider the category with two objects $A, B$ only, and arrows $f: A \to B$ and $g: B \to A$ which compose so that $gf = 1_A$ but $fg \not = 1_B$.
Then there are two term models:
\begin{itemize}
\item At $A$: there is $a \in A$, then $f(a) \in B$, then $gf(a) \in A$ but that is known to be just $a$, so we're done.
\item At $B$: there is $b \in B$, then $g(b) \in A$, then $fg(b) \in B$, then $gfg(b) = g(b) \in A$, and we're done.
\end{itemize}
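The reason the process closes off so quickly is precisely the relation $gf = 1_A$:
\[
g\bigl(fg(b)\bigr) = (gf)\bigl(g(b)\bigr) = g(b), \qquad f\bigl(g(b)\bigr) = fg(b),
\]
so applying $f$ and $g$ to the three elements $b$, $g(b)$ and $fg(b)$ produces nothing we haven't already seen.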
\subsubsection{An infinite example}
Consider the category which is the monoid $\mathbb{N}$: namely, a single object $A$ and an arrow $f_i$ for each $i \in \mathbb{N}$, composing so that $f_i f_j = f_{i+j}$ (so $f_0$ is the identity).
Then there is just one term model, and its elements are $a, f_1(a), f_2(a), \dots$.
\subsection{General definition of the term model}
If you have been thinking categorically, you may have noticed that in every example above, if you simply delete the generating term from each element we listed, you are left with a collection of arrows in the category.
For example, in the natural numbers case, we had the elements $a, f_1(a), f_2(a), \dots$; delete the $a$ and we get $\{\mathrm{id}, f_1, f_2, \dots\}$.
And this, crucially, is simply a list of the arrows out of $A$.
In general, the collection of $\{\text{things which exist in the term model based at an object $A$}\}$ is isomorphic to the set of arrows out of $A$:
more precisely, the things of type $X$ are exactly the arrows $A \to X$, so the term model is the hom-functor $\homfrom{A}$.
So the third key insight is that the term models are precisely the hom-functors of the category.
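Concretely, the term model based at $A$ acts on types and predicates by
\[
\homfrom{A}(X) = \mathcal{C}(A, X), \qquad \homfrom{A}(h : X \to Y) = (h \circ -) : \mathcal{C}(A, X) \to \mathcal{C}(A, Y),
\]
so the element ``$h(a)$'' of type $X$ is literally the arrow $h : A \to X$, and applying a predicate is just post-composition.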
\section{The Yoneda lemma}
Now we can understand the Yoneda lemma's statement in the light of these new concepts:
\
\begin{thm}[The Yoneda lemma, in new terms]
Let $\mathcal{C}$ be an algebraic theory with unary predicates, and let $G: \mathcal{C} \to \Set$ be a model of that theory.
Let $A$ be a type in $\mathcal{C}$.
Then the collection of fixed model homomorphisms from the term model based at $A$ into the model $G$ is isomorphic to the set of things of type $A$ in the model $G$.
Moreover, the isomorphism is natural in both the type we chose, and the model we chose.
\end{thm}
\
After a little thought, the existence of the bijection is just obvious.
Indeed, a homomorphism from the term model based at $A$ into any other model $G$ is exactly defined by where $a$ goes: nothing exists in the term model except things which are derived by applying arrows to $a$,
so if we've decided where $a$ goes then we've decided where everything in the term model goes.
Hence for every fixed model homomorphism from $\homfrom{A}$ to $G$, we can canonically define a member of the concrete type $GA$ which is ``where did $a$ end up''.
Conversely, if we're given an element $x \in GA$ of the instantiation of the type $A$ in model $G$, we can canonically define a fixed model homomorphism from $\homfrom{A}$ by sending our abstract term $a$ to $x$, and letting all the rest of the elements of the term model get pulled along with it.
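In symbols, the two directions of the bijection are
\[
\alpha \mapsto \alpha_A(\mathrm{id}_A) \in GA \qquad \text{and} \qquad x \mapsto \bigl(\alpha_X : f \mapsto (Gf)(x)\bigr),
\]
where $\mathrm{id}_A \in \homfrom{A}(A)$ is the abstract term $a$, and the second formula is exactly ``send $a$ to $x$ and let everything else get pulled along''.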
\subsection{Naturality}
I'm afraid I don't know of a good way to think about naturality other than just to draw out the diagrams and check that they commute; happily, both checks are easy.
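Here are the two squares in question. Write $\mathrm{Nat}(\homfrom{A}, G)$ for the set of fixed model homomorphisms from $\homfrom{A}$ to $G$, and $\Phi$ for the bijection of the theorem. Naturality in the type says that for each arrow $f : A \to B$ the left-hand square commutes (where $f^*$ means ``precompose with $f$''), and naturality in the model says that for each fixed model homomorphism $\theta : G \to G'$ the right-hand square commutes:
\[
\begin{tikzcd}
\mathrm{Nat}(\homfrom{A}, G)
\arrow[r, "\Phi"]
\arrow[d, "- \circ f^*"']
& GA
\arrow[d, "Gf"]
\\
\mathrm{Nat}(\homfrom{B}, G)
\arrow[r, "\Phi"']
& GB
\end{tikzcd}
\qquad
\begin{tikzcd}
\mathrm{Nat}(\homfrom{A}, G)
\arrow[r, "\Phi"]
\arrow[d, "\theta \circ -"']
& GA
\arrow[d, "\theta_A"]
\\
\mathrm{Nat}(\homfrom{A}, G')
\arrow[r, "\Phi"']
& G'A
\end{tikzcd}
\]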
\section{Relation to the notion of the free model}
It turns out that the Yoneda lemma can be used to prove that the term models together form a ``free'' collection in some sense.
The terse way to say this is that the Yoneda embedding $A \mapsto \homfrom{A}$ is a full and faithful functor $\mathcal{C}^{\mathrm{op}} \to [\mathcal{C}, \Set]$.
In more elementary terms: the conversion from objects to models, given by taking an object and producing the term model, loses no information about the category.
If we select two types $A$ and $B$ in the category $\mathcal{C}$, and take a pair of different arrows $f, g : A \to B$, then these two arrows correspond to a pair of fixed model homomorphisms (natural transformations) from the term model $\homfrom{B}$ to the term model $\homfrom{A}$, given by ``send the generating term $b$ to $f(a)$'' and ``send $b$ to $g(a)$'' respectively. (The reversal of direction is the $\mathcal{C}^{\mathrm{op}}$ of the previous paragraph.)
Moreover, the two homomorphisms really are different from each other.
So given a category (a specification of an algebraic theory), we can produce a specific collection of models for that theory and a specific collection of homomorphisms between the models, such that all the information about the theory can be recovered from the models.
There are loads more models out there, and loads more homomorphisms between those extra models, but if we restrict our attention only to the term models then we recover all the information about the original category.
Moreover, remove any of the models or any of the homomorphisms, and we stop being able to pick out the theory we are modelling uniquely.
That is, the collection of term models is ``free'': they haven't lost us any information about the theory (we can use them to recover the original category entirely), and nor do they contain any extra information (\emph{every} fixed model homomorphism between term models is required, or else we have lost some information about the category).
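Incidentally, this full-and-faithfulness is itself a direct consequence of the Yoneda lemma: take the model $G$ to be another term model $\homfrom{B}$, and the lemma gives
\[
\mathrm{Nat}(\homfrom{A}, \homfrom{B}) \cong \homfrom{B}(A) = \mathcal{C}(B, A),
\]
which says exactly that the fixed model homomorphisms $\homfrom{A} \to \homfrom{B}$ correspond one-to-one with the arrows $B \to A$ of the original category.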
\section{Acknowledgements}
This entire document is derived from a MathOverflow answer by Sridhar Ramesh: \url{https://mathoverflow.net/a/15143}.
\end{document}
