
code + contents of my website, and programming life

A Universe of Sorts

Siddharth Bhat

You could have invented Sequents

  • Key idea: define a notation Γ => Δ, which holds iff the conjunction of the sentences in Γ implies the disjunction of the sentences in Δ.
  • Why would anybody do this? isn't this weird?
  • It's because we first note what we need in order to talk about consequence, validity, and unsatisfiability.
  • d1 is a consequence of Γ iff g1 /\ g2 .. /\ gn => d1
  • d1 is valid iff empty => d1, or written differently, 0 => {d1}.
  • Γ is unsatisfiable iff g1 /\ ... /\ gn => False, or written differently, Γ => 0
  • Thus, see that on the RHS, we need a set with 0 or 1 inhabitant. We can think of this as Maybe, smooshed together with \/, since we want the empty set to represent False.
  • Recall that haskell teaches us to replace failure with a list of successes!
  • Thus we should use Γ => Δ where on the RHS, we have a list that is smooshed together by or (\/)!
  • Great, we have successfully invented sequents; a small sketch in code follows.
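
A minimal Haskell sketch (the Sequent type and the valuation are made up for illustration):

-- a sequent Γ => Δ: the conjunction of gamma should imply the disjunction of delta
data Sequent a = Sequent { gam :: [a], delt :: [a] }

-- does the sequent hold under a given valuation of the atomic sentences?
holds :: (a -> Bool) -> Sequent a -> Bool
holds val (Sequent gs ds) = not (all val gs) || any val ds

-- d1 is a consequence of Γ:  Sequent gs [d1]
-- d1 is valid:               Sequent [] [d1]
-- Γ is unsatisfiable:        Sequent gs []     (the empty disjunction is False)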

Fibrational category theory, sec 1.1, sec 1.2

  • Key idea: can define a notion of a bundle p: E → B
  • The idea is that we want to generalize pullbacks into fibres.
  • A functor p: E → B is called a fibration if for each morphism f: b → b' downstairs and each object e' ∈ E with π(e') = b', there is a lift f♮ of the morphism f ending at e', and this lift has a property called cartesianity.
  • Given:
        e'
        |π
        v
b --f-> b'
  • We want:
  
e===f♮=>e'
|       |
|π      |π
v       v
b --f-->b'
  • Furthermore, to ensure that this is really a pullback, we ask for the condition that TODO

Omega sets

  • A set with some sort of "denotation by natural numbers" for each element of the set.
  • More formally, an omega set is a tuple (S, E: S → 2^N) such that E(s) ≠ ∅. The numbers E(s) are to be thought of as the denotation for the element s.
  • A morphism of omega sets (S, E) → (S', E') is a function f: S → S', together with a partial function realiser(f): N → N such that for all s ∈ S, realiser(f)(E(s)) ⊂ E'(f(s)). That is, for every denotation d ∈ E(s), the realiser maps d into the denotation of f(s): d' = realiser(f)(d) lives in E'(f(s)). (Sketched in code below.)
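
A minimal Haskell sketch of omega sets and the tracking condition (OmegaSet, OmegaMor and tracks are made-up names; Int stands in for ℕ and Maybe for partiality):

import qualified Data.Set as Set

-- an omega set: every element of the carrier has a set of codes in ℕ (nonempty by convention)
newtype OmegaSet a = OmegaSet { codes :: a -> Set.Set Int }

-- a candidate morphism: a function on carriers plus a partial function on codes
data OmegaMor a b = OmegaMor { fun :: a -> b, realiser :: Int -> Maybe Int }

-- the tracking condition at s: every code of s is sent by the realiser to a code of (fun s)
tracks :: OmegaMor a b -> OmegaSet a -> OmegaSet b -> a -> Bool
tracks (OmegaMor f r) ea eb s = all ok (Set.toList (codes ea s))
  where ok d = case r d of
                 Just d' -> d' `Set.member` codes eb (f s)
                 Nothing -> False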

PERs

  • This is a partial equivalence relation, so we only need symmetry and transitivity.
  • We consider partial equivalence relations (PERs) over N.
  • Let R be a PER. We think of those elements that are reflexively related (ie, xRx) as "in the domain" of the PER.
  • Thus we define domain(R) = { x | xRx }.
  • In this way, R is a real equivalence relation on domain(R).
  • We write N/R or domain(R)/R for the equivalence classes induced by R on N.
  • The category of PERs has as objects these PERs.
  • Intuitively, these give us subsets of the naturals (the domains), each equipped with an equivalence relation. (Sketched below.)
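
A small Haskell sketch (made-up names, restricted to a finite prefix of ℕ so that the classes can be enumerated):

import Data.List (nub)

-- a PER on ℕ: only symmetry and transitivity are assumed
type PER = Int -> Int -> Bool

-- the domain: the reflexively related elements
inDomain :: PER -> Int -> Bool
inDomain r x = r x x

-- the equivalence classes of R on domain(R) ∩ {0..n}
classes :: Int -> PER -> [[Int]]
classes n r = nub [ [ y | y <- dom, r x y ] | x <- dom ]
  where dom = filter (inDomain r) [0 .. n]

-- example: related iff both are below 10 and congruent mod 3
per3 :: PER
per3 x y = x < 10 && y < 10 && x `mod` 3 == y `mod` 3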

Split and Cloven Fibrations

  • A fibration is cloven if for every arrow downstairs, there is a chosen cartesian arrow upstairs. So we have a function that computes the cartesian arrow upstairs for an arrow downstairs. This is different from the regular definition where we just know that there exists something upstairs.
  • Note that given a fibration, we can always cleave the fibration using choice.
  • Recall the definition of a fibration in terms of cartesian lifts: for a functor p: E → B, for every arrow downstairs u: I → J in B and every object Y ∈ E lying above J (ie, p(Y) = J), there is a cartesian lift of u of the form X → Y for some X lying above I (p(X) = I).
  • Having made such a choice, every map u: I → J in B gives a functor u*: E_J → E_I from the fiber E_J over J to the fiber E_I over I. (Direction changes, pullback)
  • Recall that E_J is a subcategory of E where the objects are p^{-1}(J), and the morphisms are p^{-1}(id_J).
  • Define u*(Y) as the object obtained by lifting the map u: I → J along Y. This is well-defined since we have a cleavage to pick out a unique u*(Y)!
defn:
u*(Y) ----> Y
  |         |
  v         v
  I ---u--> J
  • For morphisms, suppose we are given an arrow f: Y → Y' in E_J. Then we use the cartesianity of the lifted morphism to give us the lift. Meditate on the below diagram:

Simple Type Theory via Fibrations

  • Objects are contexts, ie, sequences of (variable : type)
  • Morphisms between contexts Γ = (v1:s1, v2:s2) and Δ = (w1:t1, w2:t2) are terms M1, M2 such that we have Γ |- M1 : t1 and Γ |- M2 : t2.
  • More cleaned up, for a context Γ, and a context Δ with sequence of types (_:t1, _:t2, ..., _:tn), a morphism is a sequence of terms Γ|- M1: t1, Γ|- M2:t2, ..., Γ|-Mn:tn.
  • For concreteness, let us suppose Δ = (w1:t1, w2: t2)
  • The identity morphism is Δ -(d1, d2)-> Δ, since we have d1 := w1:t1, w2:t2|-w1:t1 and d2 := w1:t1, w2:t2|-w2:t2. Thus, starting from Δ on the context, we can derive terms of types t1, t2, which are given by the derivations d1, d2.
  • Let us meditate on composition Γ -(d1, d2)-> Δ -(d1', d2')-> Θ. First off, let us write this more explicitly as:
Γ

(d1 := Γ|-M1:s1, d2 := Γ|-M2:s2)

Δ := (x1:s1, x2:s2)


(d'1 := Δ|-N1:t1, d'2 := Δ|-N2:t2)

Θ := (_:t1, _:t2)
  • See that we have (x1:s1, x2:s2)|- N1 : t1
  • If we substitute N1[x1 := M1, x2 := M2], then under context Γ, we know that M1:s1, and M2:s2, so they have the correct types to be substituted for x1 and x2. Thus, in context Γ, N1[x1 := M1, x2 := M2] has the same type it used to have (t1).
  • Thus we have that Γ |- N1[x1 := M1, x2 := M2] : t1.
  • This gives us the composite of the morphisms, by telling us how to compose d'1 with d1. (A concrete sketch of composition-as-substitution appears at the end of this section.)
  • Do the same for d2.
  • What the hell is going on anyway?
  • Given any well typed term in a context, Γ|-M:t, we can think of this as a morphism Γ --M--> (Δ:=M:t).
  • This relative point of view (ala grothendieck) lets us extend to larger contexts.
  • The empty context is the terminal object, since there is precisely one morphism, consisting of the empty sequence (). Can be written as Γ-()-> 0.
  • The categorical product of contexts is given by sequencing (see that this needs exchange), and the projections are the "obvious" rules: Γ <--Γ-- (Γ;Δ)---Δ--> Δ.
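
A minimal Haskell sketch of contexts-as-objects and composition-as-substitution, using a made-up binder-free term language with de Bruijn indices so that simultaneous substitution stays short:

-- terms over a context: variables are de Bruijn indices into the context
data Tm = Var Int | Pair Tm Tm | Fst Tm | Snd Tm
  deriving Show

-- a context morphism Γ → Δ: one Γ-term for each entry of Δ
type CtxMor = [Tm]

-- substitute Γ-terms for the variables of a term over Δ
subst :: CtxMor -> Tm -> Tm
subst ms (Var i)    = ms !! i
subst ms (Pair a b) = Pair (subst ms a) (subst ms b)
subst ms (Fst a)    = Fst (subst ms a)
subst ms (Snd a)    = Snd (subst ms a)

-- composition (Γ → Δ) followed by (Δ → Θ): substitute into each Θ-component
compose :: CtxMor -> CtxMor -> CtxMor
compose gToD dToT = map (subst gToD) dToT

-- the identity on a context of length n is its sequence of variables
identityMor :: Int -> CtxMor
identityMor n = map Var [0 .. n - 1]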

Realisability models

For closed terms u and a type 𝜏, we are going to define u ⊩ 𝜏 (read “u is a realiser of 𝜏”). Let's say that we have PCF's types: ℕ and σ → τ. (A sketch in code follows the definition.)

  1. u ⊩ ℕ if u reduces to a natural number literal
  2. f ⊩ σ → τ if for all s⊩σ, f s ⊩ τ.
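
A minimal Haskell sketch of the shape of this relation, over a made-up term language (Term, Ty, eval are all hypothetical). The arrow case quantifies over every realiser of the domain, which is exactly why the relation is undecidable in general; below we merely test it against a finite list of candidate arguments:

data Ty = TNat | TArr Ty Ty

data Term = Lit Int | Var String | Lam String Term | App Term Term
  deriving Show

-- substitute s for the free variable x (good enough for closed top-level terms)
subst :: String -> Term -> Term -> Term
subst x s (Var y)   = if y == x then s else Var y
subst x s (Lam y b) = if y == x then Lam y b else Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)
subst _ _ (Lit n)   = Lit n

-- call-by-name evaluation of closed terms; stuck terms are returned as-is
eval :: Term -> Term
eval (App f a) = case eval f of
  Lam x b -> eval (subst x a b)
  f'      -> App f' a
eval t = t

-- u ⊩ ℕ is decidable: check that u reduces to a literal.
-- f ⊩ σ → τ is checked only against the supplied finite list of arguments.
realisesOn :: [Term] -> Term -> Ty -> Bool
realisesOn _    u TNat       = case eval u of Lit _ -> True; _ -> False
realisesOn args f (TArr s t) =
  and [ realisesOn args (App f a) t | a <- args, realisesOn args a s ]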

Some immediate observations:

  • This definition is by induction on the types, and assumes little from the terms (it's a logical relation).

  • This is “really what we mean” by x : 𝜏 (except non-termination, not modelled here): realisability has explanatory power.

  • This relation is usually undecidable.

  • It is strongly related to parametricity (in parametricity we associate a binary relation to types, in realisability we associate a (unary) predicate). 4/10

  • From this point of view, typing is a (usually decidable) modular approximation of realisability.

  • For instance, consider if x then 1 else (\y -> y). It isn't well-typed.

  • However, if we add the definition, let x = True in if x then 1 else (λy. y), it becomes a realiser of ℕ, because upon reduction it produces a natural number literal, even though it is not "well-typed".

  • Typing and realisability are related by a lemma sometimes referred as adequacy.

Take the rule for λ (simplified):

x:X, y:Y ⊢ z : Z
-----------------------
x:X ⊢ λ(y: Y). z : Y → Z

You interpret it as a statement

∀ v ⊩ X, (∀ w ⊩ Y, z[x\v, y\w] ⊩ Z) ⟹  (\lambda (y: Y). z[x\v]) ⊩ Y → Z
  • Then, you prove this statement. 7/10

Once you've done so for each rule, you can conclude (very easy induction) that if ⊢ u : A, then u ⊩ A. This gives type safety, since realisers are practically defined as verifying type safety.

What you've gained along the way is that this proof is an open induction. 8/10

In standard combinatorial proofs of type safety, the induction hypothesis may depend on every case. Adding a form in your language may require redoing the entire proof. Here the various cases in the adequacy lemma remain true. So you can just prove the new cases. 9/10

There are many variants of realisability. Tuned for logic, with models instead of terms, … My favourite is Krivine's realisability with both terms and stacks, and magic happening when they interact. But this is another story and shall be told another time. 10/10

Naming left closed, right open with start/stop

Call the variables startPos and stopPos. Since it's called stop, it's a little more intuitive that it's exclusive!

Nested vs mutual inductive types:

mutual
inductive m1 where
| mk: m2 -> m1

inductive m2 where
| mk: m1 -> m2
end

inductive n2 (a: Type): Type where
| nil: n2 a
| cons: a -> n2 a -> n2 a

inductive n1: Type where
| mk: n2 n1 -> n1

MiniSketch

https://github.com/sipa/minisketch

Finger trees data structure

https://dl.acm.org/doi/abs/10.1145/3406088.3409026

HAMT data structure

https://en.m.wikipedia.org/wiki/Hash_array_mapped_trie

Embedding HOL in Lean

inductive Sets where
| bool: Sets
| ind: Nat -> Sets
| fn: Sets -> Sets -> Sets

def Sets.denote: Sets -> Type
  | bool => Prop
  | ind _ => Nat
  | fn i o => i.denote -> o.denote

-- case splitting on `Classical.em p` directly is blocked by large elimination,
-- so go through the classical `Decidable` instance instead
open Classical in
noncomputable def ifProp {a: Type} (p: Prop) (t: a) (e: a) : a :=
  if p then t else e

def Model := Σ (s: Sets), s.denote

Module system for separate compilation

  • https://github.com/leanprover/lean4/issues/416
  • https://www.cs.utah.edu/plt/publications/macromod.pdf
  • https://raw.githubusercontent.com/alhassy/next-700-module-systems/master/phd-defence.pdf
  • https://raw.githubusercontent.com/alhassy/next-700-module-systems/master/thesis.pdf

Second order arithmetic

  • First order arithmetic has variables that range over numbers
  • Second order arithmetic has variables that range over sets of numbers
  • Ref: Jeremy Avigad on forcing
  • Axiomatic second-order arithmetic is often termed “analysis” because, by coding real numbers and continuous functions as sets of natural numbers, one can develop a workable theory of real analysis in this axiomatic framework.

Lean4 Dev Meeting

  • Mathport uses mathlib4 to move tactics.
  • Mathlib4 has syntax definitions for every single tactic that exists in mathlib.
  • These only exist as syntax so far. We need to port this.
  • The goal for this week is to have as many as possible knocked off.

Macro

  • A tactic that expands to another tactic.
  • Example: _ tactic. This expands into ({}), which shows you the current state.
  • macro_rules need a piece of Syntax, and it expands into another tactic.
-- | do the iff.rfl as well.
macro_rules | `(tactic| rfl) => `(tactic| exact Iff.rfl)
  • Closed syntax categories: syntax rcasesPatLo := ....
  • Open syntax categories: syntax X.

How to collate info?

  • Use macro to define syntax + macro
  • Use elab to define syntax + elaborator together.
  • Add command to list all places where something was extended.
  • Add information into docstrings.
  • match_target.

Mapsto arrow

  • (x: α) \mapsto e: maybe annoying to parse.
  • \lambda (x: \alpha) \mapsto e: easy to parse, but mathematicians don't know what lambda is.

ext tactic

  • implemented as a macro tactic, which uses an elab tactic.

colGt

  • `syntax "ext"
  • Lean4 is whitespace sensitive, like python. colGt says that we can have the following syntax on a column that is greater than the current line.
ext
  x -- parsed as an argument of `ext`.
  y -- parsed as an argument of `ext`.
z -- not parsed as part of `ext x y`.
  • If in parens, we don't need colGt, because we want to allow something like:
ext (x
 y) -- should parse.

ppSpace

  • Used when pretty printing a tactic.

Scoped syntax

  • scoped syntax "ext_or_skip: .. .
  • This declares a syntax that is valid only for the current section/namespace.
  • Trailing percent ext_proof% is an indicator that it is a term macro / term elaboration.
  • Protected: identifier cannot appear without prefixing namespaces.

Trivia

  • {..} pattern matches on a struct.

Tactic development: Trace

  • Create a new file Mathlib/Tactic/Trace.lean
  • Move the syntax line from ./Mathlib/Mathport/Syntax.lean into Mathib/Tactic/Trace.lean.
  • On adding a file, add it to Mathlib.lean. So we add import Mathlib.Tactic.Trace
  • We also want to re-import the syntax into Mathlib/Mathport/Syntax.lean.
  • We have now moved trace into its own file and hooked into the build system and extension.
  • The first thing to do is to find out what the tactic even does.
  • Go to trace at the mathlib docs.
-- Tactic.lean
import Lean
open Lean Meta Elab

syntax (name := trace) "trace " term : tactic

elab "foo" : tactic => do
  -- `TacticM Unit` expected
  logInfo "hi" 

open Lean Meta Elab Tactic ~=
open Lean
open Lean.Meta
open Lean.Elab
open Lean.Elab.Tactic
  • TacticM is a MonadRef, which is aware of source spans to report the errors. so we can write:
elab "foo" : tactic => do
  logInfo "hi"
  • We can use withRef to control the current source span where errors are reported.
elab tk:"foo" val:term : tactic => do
  withRef tk (logInfo val)
  • We want to evaluate the val:term, because otherwise, it literally prints the syntax tree for things like (2 + (by trivial)).
  • Use elabTerm to elaborate the syntax into a term.
  • set_option trace.Elab.definition true in ... which prints out the declarations that are being sent to the kernel.
  • elab is syntax + elab_rules together, just like macro is syntax + macro_rules together.
  • Create a test file test/trace.lean. Import the tactic, and write some examples.
  • Recompile, and check that the test works.
  • How do we check that our port works?

Reducible

To clarify, @[reducible] marks the definition as reducible for typeclass inference specifically. By default typeclass inference avoids reducing because it would make the search very expensive.

Categorical model of dependent types

  • Motivation for variants of categorical models of dependent types
  • Seminal paper: Locally cartesian closed categories and type theory
  • A closed type is interpreted as an object.
  • A term is interpreted as a morphism.
  • A dependent type upon $X$ is interpreted as an object of the slice category $C/X$.
  • A dependent type of the form x: A |- B(x) is a type corresponds to morphisms f: B -> A, whose fiber over x: A is the type f^{-1}(x) = B(x).
  • The dependent sum $\Sigma_{x : A} B(x)$ is given by an object in $Set/A$: the set $\sqcup_{a \in A} B_a$, with the morphism $\sqcup_{a \in A} B_a \to A$ which sends elements of $B_{a_1}$ to $a_1$, elements of $B_{a_2}$ to $a_2$, and so forth. The fibers of the map give us back the disjoint union decomposition.
  • The dependent product $\Pi_{x: A} B(x)$ is given by the set of sections of the display map $\sqcup_{a \in A} B_a \to A$, ie, functions $s: A \to \sqcup_{a \in A} B_a$ with $s(a) \in B_a$.
  • We can describe both dependent sum and product as arising as adjoints to the functor $Set \to Set/A$ given by $X \mapsto (X \times A \to A)$.
  • Recalling that dependent types are interpreted by display maps, substitution of a term $t$ into a dependent type $B$ is interpreted by pullback of the display map interpreting $B$ along the morphism interpreting $t$.
  • Reference

Key ideas

  • Intro to categorical logic
  • Contexts are objects of the category C
  • Context morphisms are morphisms f: Γ → Δ
  • Types are morphisms σ: X → Γ for arbitrary X
  • Terms are sections of σ: X → Γ, so they are functions s: Γ → X such that σ . s = id(Γ)
  • Substitution is pullback

Why is substitution pullback?

  • Suppose we have a function $f: X \to Y$, and we have a predicate $P \subseteq Y$.
  • The predicate can be seen as a mono $P_Y \xrightarrow{py} Y$, which maps the subset where $P$ is true into $Y$.
  • Now, the predicate $P_X(x) \equiv P_Y(f(x))$, ie, the subset $P_X \equiv \{ x : f(x) \in P_Y \}$, is a subset $P_X \subseteq X$.
  • See that $P_X$ is a pullback of $P$ along $f$:
P_X -univ-> P_Y
|            |
px            py
|            |
v            v
X -----f---> Y
  • This is true because we can think of $Q_X \equiv \{ (x, y) \in X \times P_Y : f(x) = py(y) \}$.
  • If we imagine a bundle, at each point $y \in Y$, there is the presence/absence of a fiber $py^{-1}(y)$ since $py$ is monic.
  • When pulling back the bundle, each point $x \in X$ either inherits this fiber or not depending on whether $f(x)$ has a fiber above it.
  • Thus, the pullback is also monic, as each fiber of $px$ either has a strand or it does not, depending on whether $py$ has a strand or not.
  • This means that $px(x)$ has a unique element precisely when $f(x)$ does.
  • This means that $px$ is monic, and represents the subset that is given by $P_Y(f(x))$.

Isn't substitution composition?

  • If instead we think of a subset as a function $P_Y: Y \to \Omega$ where $\Omega$ is the subobject classifier, we then get that $P_X$ is the composite $P_X \equiv P_Y \circ f$.
  • Similarly, if we have a "regular function" $f: X \to Y$, and we want to substitute $s(a)$ ($s: A \to X$ for substitution) into $f(x)$ to get $f(s(a))$, then this is just computing $f \circ s$.
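
In code this is just precomposition; a trivial Haskell sketch:

-- substituting f into a predicate on Y yields a predicate on X by precomposition
substPred :: (x -> y) -> (y -> Bool) -> (x -> Bool)
substPred f pY = pY . f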

Using this to do simply typed lambda calculus

  • Introduction to categories and categorical logic
  • Judgement of the form A1, A2, A3 |- A becomes a morphism A1xA2xA3 → A.
  • Stuff above the inference line will be arguments, stuff below the line will be the return value.
  • Eg, the identity judgement:
Γ,A |- A

becomes the function snd: ΓxA → A.

Display maps

  • Reference: Substitution on nlab
  • To do dependent types in a category, we can use display maps.
  • The display map of a morphism $p: B \to A$ represents $x:A |- B(x): Type$. The intuition is that $B(x)$ is the fiber of the map $p$ over $x:A$.
  • For any category $C$, a class of morphisms $D$ are called display maps iff all pullbacks of $D$ exist and belong to $D$. Often, $D$ is also closed under composition.
  • Said differently, $D$ is closed under all pullbacks, as well as composition.
  • A category with displays is well rooted if the category has a terminal object $1$, and all maps into $1$ are display maps (ie, they can always be pulled back along any morphism).
  • This then implies that binary products exist: $A \times B$ is the pullback of the display map $B \to 1$ along $A \to 1$, which exists since $B \to 1$ is a display map.

Categories with families

How should relative changes be measured

  • by Leo Tornqvist, Pentti Vartia and Yrjo O. Vartia
  • https://www.etla.fi/wp-content/uploads/dp681.pdf

Logic of bunched implications

http://www.lsv.fr/~demri/OHearnPym99.pdf

Coends

  • Dual of an end
  • A cowedge is defined by injections into the co-end of all diagonal elements.
p(a, a)   p(b, b)
  \          /
  π[a]      π[b]
   \        /
    v      v
     \int^x p(x, x)
  • It's a universal cowedge, so every other cowedge c must factor through it.
p(a, a)   p(b, b)
  \   \  /    /
  π[a]  c   π[b]
  \     |    /
   \    ∃!  /
    \   |  /
    v   v  v
     \int^x p(x, x)
  • Now we have the cowedge condition. For every morphism h: a -> b, and for every cowedge c, the two paths below must agree:
       [p b a]
  p(h,id)/  \p(id,h)
        /    \
  [p a a]    [p b b]
        \    /
         \  /
          c
  • By curry howard, type coend p = exists a. p a a
data Coend p where
  MkCoend :: p a a -> Coend p
type End (p :: * -> * -> *) = forall x. p x x
  • A functor is continuous if it preserves limits.
  • Recall that Hom functor preserves limits.
-- Hom(\int^x p(x, x), r) ~= \int_x (Hom(p x x, r)) 
type Hom a b = a -> b
  • Set(\int^x p(x, x), r) is asking for a function Coend p -> r.
  • But we claim this is the same as having forall a. (p a a -> r).
  • So we can write Set(\int^x p(x, x), r) ~= \int_x Set(p(x, x), r).
-- | \int_x Set(p(x, x), r)
-- | \int_x Hom(p(x, x), r)
-- | \int_x RHS(x,x)
       where RHS a b = Hom(p(a, b), r)
type RHS p r a b = Hom (p a b) r -- rhs of the above expression

The isomorphisms are witnessed below, reminiscent of building a continuation

-- fwd :: (Coend p -> r) -> End (RHS p r)
-- fwd :: (Coend p -> r) -> (forall x. (RHS r) x x)
-- fwd :: (Coend p -> r) -> (forall x. Hom (p x x) r)
-- fwd :: (Coend p -> r) -> (forall x. (p x x) -> r)
fwd :: Profunctor p => (Coend p -> r) -> (forall x. (p x x) -> r)
fwd coendp2r  pxx = coendp2r (MkCoend pxx)
  • The backward iso, reminiscent of just applying a continuation.
-- bwd :: End (RHS p r)             -> (Coend p -> r) 
-- bwd :: (forall x. (RHS r) x x)   -> (Coend p -> r) 
-- bwd :: (forall x. Hom (p x x) r) -> (Coend p -> r) 
-- bwd :: (forall x. (p x x) -> r)  -> (Coend p -> r) 
bwd :: Profunctor p => (forall x. (p x x) -> r) -> Coend p -> r
bwd pxx2r (MkCoend paa) = pxx2r paa
  • Ninja coyoneda lemma: \int^x C(x, a) * f(x) ~= f(a)
  • Witnessed by the following:
-- ninja coyoneda lemma:
-- \int^x C(x, a) * f(x) ~= f(a)
-- the profunctor is \int^x NinjaLHS[f, a](x, y)
--   where
newtype NinjaLHS g b y z = MkNinjaLHS (y -> b, g z)
  • Forward iso:
-- ninjaFwd :: Functor f => Coend (NinjaLHS f a) -> f a
ninjaFwd :: Functor g => Coend (NinjaLHS g r) -> g r
ninjaFwd (MkCoend (MkNinjaLHS (x2r, gx))) = fmap x2r gx
  • Backward iso:
-- ninjaBwd :: Functor f => g r -> (Coend (NinjaLHS g r))
-- ninjaBwd :: Functor f => g r -> (∃ x. (NinjaLHS g r x x))
-- ninjaBwd :: Functor f => g r -> (∃ x. (NinjaLHS (x -> r, g x))
ninjaBwd :: Functor g => g r -> Coend (NinjaLHS g r)
ninjaBwd gr = MkCoend (MkNinjaLHS (x2r, gx)) where
   x2r = id -- choose x = r, then x2r = r2r 
   gx = gr -- choose x = r
  • We can prove the ninja coyoneda via coend calculus plus yoneda embedding, by using the fact that yoneda is full and faithful.

  • So instead of showing LHS ~= RHS in the ninja coyoneda, we will show that Hom(LHS, -) ~= Hom(RHS, -).

  • We compute as:

Set(\int^x C(x, a) * f(x), s) ~? Set(f(a), s)
[continuity:]
\int_x Set(C(x, a) * f(x), s) ~? Set(f(a), s)
[currying:]
\int_x Set(C(x, a), Set(f(x), s)) ~? Set(f(a), s)
[ninja yoneda on Set(f(-), s):]
Set(f(a), s) ~? Set(f(a), s)
Set(f(a), s) ~= Set(f(a), s)

Ninja Coyoneda for containers

  • The type of NinjaLHS, when specialized to NinjaLHS g b r r becomes (r -> b, g r).

  • This is sorta the way you can get a Functor instance on any g, by essentially accumulating the changes into the (r -> b). I learnt this trick from some kmett library, but I'm not sure what the original reference is.

  • Start with NinjaLHS:

-- ninja coyoneda lemma:
-- \int^x C(x, a) * f(x) ~= f(a)
-- the profunctor is \int^x NinjaLHS[f, a](x, y)
--   where
newtype NinjaLHS g b y z = MkNinjaLHS (y -> b, g z)
  • Specialize by taking the diagonal:
-- newtype NinjaLHS' g i o = MkNinjaLHS' (i -> o, g i)
newtype NinjaLHS' g i o = MkNinjaLHS' (NinjaLHS g o i i)
  • Write a smart constructor to lift values into NinjaLHS':
mkNinjaLHS' :: g i -> NinjaLHS' g i i
mkNinjaLHS' gi = MkNinjaLHS' (MkNinjaLHS (id, gi))
  • Implement functor instance for NinjaLHS' g i:
-- convert any storage of shape `g`, input type `i` into a functor
instance Functor (NinjaLHS' g i) where
  -- f:: (o -> o') -> NinjaLHS' g i o -> NinjaLHS' g i o'
  fmap o2o' (MkNinjaLHS' (MkNinjaLHS (i2o, gi))) = 
    MkNinjaLHS' $ MkNinjaLHS (\i -> o2o' (i2o i), gi)

See that to be able to extract out values, we need g to be a functor:

extract :: Functor g => NinjaLHS' g i o -> g o
extract (MkNinjaLHS' (MkNinjaLHS (i2o, gi))) = fmap i2o gi

Natural Transformations as ends

  • Bartosz: Natural transformations as ends
  • Ends generalize the notion of product/limit. It's sort of like an infinite product plus the wedge condition.
  • $\int_X p x x$ is the notation for ends, where $p$ is a profunctor.
  • Remember dimap :: (a' -> a) -> (b -> b') -> p a b -> p a' b'. Think of this as:
-----p a' b'-------
a' -> [a -> b] -> b'
       --pab---
  • The set of natural transformations is an end.
  • Haskell: type Nat f g = forall x. f x -> g x.
  • We can think of this as the "diagonal" of some end p x x for some profunctor p we need to cook up.
  • type p f g a b = f a -> g b. Is p f g a profunctor?
dimap :: (Functor f, Functor g) => (a' -> a) -> (b -> b') -> (f a -> g b) -> (f a' -> g b')
dimap a'2a b2b' fa2gb = \fa' ->
  let fa  = fmap a'2a fa'
      gb  = fa2gb fa
      gb' = fmap b2b' gb
  in gb'
-- equivalently, point-free:
-- dimap a'2a b2b' fa2gb = fmap b2b' . fa2gb . fmap a'2a
  • Clearly, from the above implementation, we have a profunctor.
  • So we have a profunctor P(a, b) = C(Fa, Gb).
  • In haskell, the end is End p = forall a. p a a.
  • In our notation, it's \int_x C(Fx, Gx).
  • Recall the wedge condition. For a profunctor p: Cop x C -> C, and any morphism k: a -> b for a, b ∈ C, the following diagram commutes for the end \int_X p(X,X):
 (p x x, p y y, p z z, ... infinite product)
\int_x p(x,x)
 /[πa]  [πb]\
v            v
p(a,a)       p(b,b)
 \            /
[p(id, k)]  [p(k,id)]
   \        /
    v      v
     p a b
  • If we replace p x x with our concrete p a b = C(f a, g b), we get:
     (forall x. f x -> g x)
      /                 \
    [@ a]              [@ b]
     v                    v
  τa : f a -> g a      τb : f b -> g b
     |                    |
  [dimap id k]         [dimap k id]
     v                    v
  (fmap @g k) . τa  =?=  τb . (fmap @f k)    : f a -> g b
    
  • This says that gk . τa = τb . fk
  • But this is a naturality condition for τ!
  • So every end corresponds to a natural transformation, and τ lives in [C, D](f, g).
  • This shows us that the set of natural transformations can be seen as an end (?)
  • I can write \int_a D(fa, ga) ~= [C, D](f, g)

Invoking Yoneda

  • Now, yoneda tells us that [C, Set](C(a, -), f(-)) ~= f(a).
  • Now I write the above in terms of ends as \int_x Set(C(a, (x)), f(x)) ~= f(a).
  • So we can write this as a "point-full" notation!
  • In haskell, this would be forall x. (a -> x) -> f x ~= f a.

Ends and diagonals

  • Bartosz: Wedges
  • Let's think of Cop x C, and an element on the diagonal (a, a), and a function f: a -> b.
  • Using the morphism (id, f), I can go from (a, a) to (a, b).
  • If we have (b, b), I can once again use f to go to (a, b).
  • So we have maps:
     b,b
    / |
   /  |
  /   |
 /    v
a,a-->a,b
  • This tells us that if we have something defined on the diagonal for a profunctor p a a, we can "extrapolate" to get data everywhere!
  • How do we get the information about the diagonal? Well, I'm going to create a product of all the diagonal elements of the profunctor.
  • so we need a limit L, along with maps L -> p c c for each c. This kind of infinite product is called a wedge (not yet, but soon).
  • The terminal object in the category of wedges is the end.
  • But our cone is "under-determined". We need more data at the bottom of the cone for things to cohere.
  • suppose at the bottom of the cone, we want to go from p a a to p b b. for this, I need morphisms (f: b -> a, g: a -> b) to lift into the profunctor with dimap.
  • We might want to impose this as coherence condition. But the problem is that there are categories where we don't have arrows going both ways (eg. partial orders).
  • So instead, we need a different coherence condition. If we had a morphism from a -> b, then we can get from p a a --(id, f)-->p a b. Or, I can go from p b b --(f, id)-->p a b. The wedge condition says that these commute. So we need p id f . pi_1 = p f id . pi_2

Relationship to haskell

  • How would we define this wedge condition in haskell?
  • Because of parametricity, haskell gives us naturality for free.
  • How do we define an infinite product? By propositions as types, this is the same as providing ∀x..
  • End p = forall a. p a a
  • We can define a cone with apex A of a diagram D: J -> C as a natural transformation cone(A): Const(A) => D. What's the version for a profunctor?
  • Suppose we have a profunctor diagram P: J^op x J -> C. Then we have a constant profunctor Const(c) = \j j' -> c. Then the wedge condition (analogue of the cone condition) is to ask that we need a dinatural transformation cone': Const(A) => P.
  • NOTE: a dinatural transformation is STRICTLY WEAKER than a natural transformation from J^opxJ -> C.
  • Suppose we have a transformation that is natural in both components. That is to say, it is a natural transformation of functors of the type [J^op x J -> C]. This means that we have naturality arrows α(a,b): p(a,b) -> q(a,b). Then the following must commute, for any f: a -> b by naturality of α:
      p(b,a)
    /    |
[p(f,id)]|
  /      |
p(a,a)  [α(b,a)]
 |       |
[α(a,a)] |
 |       |
 |    q(b,a)
 |     /
 |  [q(f,id)]
 |   /
q(a,a)
  • Similarly, other side must commute:
      p b a
    /   |  \
[p f id]|   [p id f]
  /     |     \
p a a  [α b a] p b b
 |      |        |
[α a a] |      [α b b]
 |      |        |
 |    q b a      |
 |     /   \     |
 |  [q f id]\    |
 |   /  [q id f] |
 |  /          \ |
q a a         q b b
  • I can join the two sides back together into a q a b by using [q id f] and [q f id]. The bottom square commutes because we are applying [q f id] and [q id f] in two different orders. By functoriality, this is true because q(f.id, id.f) = q(f,f) = q(id.f, f.id).
      p b a
    /   |  \
[p f id]|   [p id f]
  /     |     \
p a a  [α b a] p b b
 |      |        |
[α a a] |      [α b b]
 |      |        |
 |    q b a      |
 |     /   \     |
 |  [q f id]\    |
 |   /  [q id f] |
 |  /          \ |
q a a         q b b
  \             /
 [q id f]      /
    \        [q f id]
     \      /
     q a b
  • If we erase the central node [q b a] and keep the boundary conditions, we arrive at a diagram:
      p b a
    /      \
[p f id]    [p id f]
  /           \
p a a          p b b
 |               |
[α a a]        [α b b]
 |               |
 |               |
 |               |
 |               |
 |               |
 |               |
q a a         q b b
  \             /
 [q id f]      /
    \        [q f id]
     \      /
     q a b
  • Any transformation α that obeys the above diagram is called a dinatural transformation.
  • From the above, we have proven that any honest natural transformation is a dinatural transformation, since the natural transformation obeys the diagram with the middle node.
  • In this diagram, see that we only ever use α a a and α b b.
  • So for well behavedness, we only need to check a dinatural transformation at the diagonal. (diagonal natural transformation?)
  • so really, all I need are the diagonal maps, which I will call α'(a) = α a a.
  • Now, a wedge is a dinatural transformation from constant functor to this new thingie.

Parabolic dynamics and renormalization

Quantifiers as adjoints

  • Consider S(x, y) ⊂ X × Y, as a relation that tells us when (x, y) is true.
  • We can then interpret ∀x, S(x, y) to be a subset of Y, that has all the elements such that this predicate holds. ie, the set { y : Y | ∀ x, S(x, y) }.
  • Similarly, we can interpret ∃x, S(x, y) to be a subset of Y given by { y : Y | ∃ x, S(x, y) }.
  • We will show that these are adjoints to the projection π: X × Y → Y.
  • Treat P(S) to be the boolean algebra of all subsets of S, and similarly P(Y).
  • Then we can view P(S) and P(Y) to be categories, and we have the functor π: P(S) → P(Y).
  • Recall that in this boolean algebra and arrow a → b denotes a subset relation a ⊆ b.

A first try: direct image, find right adjoint

  • Suppose we want to analyze when π T ⊆ Z, with the hopes of getting some condition when T ⊆ ? Z where ? is some to-be-defined adjoint to π.
  • See that π T ⊆ Z then means ∀ (x, y) ∈ T, y ∈ Z.
     T
   t t t
   t t t
    |
    v
---tttt---- π(T)
-zzzzzzzzz--Z
  • Suppose we build the set Q(Z) ≡ { (x, y) ∈ S : y ∈ Z }. That is to say, Q ≡ π⁻¹(Z). (Q for inverse of P).
  • Then, it's clear that we have π T ⊂ Z implies that T ⊆ Q(Z) [almost by definition].
  • However, see that this Q(Z) construction goes in the wrong direction; we want a functor from P(S) to P(Y), which projects out a variable via ∃ / ∀. We seem to have built a functor in the other direction, from P(Y) to P(S).
  • Thus, what we must actually do is to reverse the arrow π: S ⊆ X × Y → Y, and rather we must analyze π⁻¹ itself, because its adjoints will have the right type.
  • However, now that we've gotten this far, let's also analyze left adjoints to π.

Direct image, left adjoint

  • Suppose that Z ⊆ π T. This means that for every y ∈ Z, there is some x_y such that (x_y, y) ∈ T
     T
   t t t
   t t t
    |
    v
---tttt---- π(T)
----zz--------Z
  • I want to find an operation ? such that ? Z ⊆ T.
  • One intuitive operation that comes to mind to unproject, while still remaining a subset, is to use π⁻¹(Z) ∩ T. This would by construction have that π⁻¹(Z) ∩ T ⊆ T.
  • Is this an adjoint? we'll need to check the equation :).

Inverse image, left adjoint.

  • Suppose we consider π⁻¹ = π* : P(Y) → P(S).
  • Now, imagine we have π*(Z) ⊆ T.
    S
    -
    -   
   tttt
   tztt
   tztt T
   tztt
    ^^
    || π*(Z)
----zz-------Z
  • In this case, we can say that for each z ∈ Z, for all x ∈ X such that (x, z) ∈ S, we had (x, z) ∈ T.
  • Consider the set ∀ T ≡ { y ∈ Y: ∀ x, (x, y) ∈ S => (x, y) ∈ T }.
  • Thus, we can say that π*(Z) ⊂ T iff Z ⊂ ∀ T.
  • Intuitively, T ⊂ π*(π(T)), so it must be "hard" for the inverse image of a set Z (π*(Z)) to be contained in the set T, because inverse images cannot shrink the size.
  • Furthermore, ∀ is the right adjoint to π* precisely because of this bijection: π*(Z) ⊆ T iff Z ⊆ ∀ T. (A concrete sketch on finite sets follows.)
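
A minimal Haskell sketch of the three operations on finite powersets (taking S to be all of X × Y for simplicity; all names are made up):

import qualified Data.Set as Set

-- inverse image along the projection π : X × Y → Y
pullbackPi :: (Ord x, Ord y) => [x] -> Set.Set y -> Set.Set (x, y)
pullbackPi xs z = Set.fromList [ (x, y) | x <- xs, y <- Set.toList z ]

-- left adjoint ∃: direct image along π, { y | some x has (x, y) ∈ T }
existsPi :: Ord y => Set.Set (x, y) -> Set.Set y
existsPi = Set.map snd

-- right adjoint ∀: { y | every x has (x, y) ∈ T }
forallPi :: (Ord x, Ord y) => [x] -> [y] -> Set.Set (x, y) -> Set.Set y
forallPi xs ys t = Set.fromList [ y | y <- ys, all (\x -> (x, y) `Set.member` t) xs ]

-- the adjunctions, as containments to check on examples:
--   existsPi t `Set.isSubsetOf` z       iff   t `Set.isSubsetOf` pullbackPi xs z
--   pullbackPi xs z `Set.isSubsetOf` t  iff   z `Set.isSubsetOf` forallPi xs ys t   (for z drawn from ys)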

Different types of arguments in Lean4:

  • (x: T) regular argument
  • [S: Functor f] typeclass argument / argument resolved by typeclass resolution
  • {x: T}: Maximally implicit argument, to be inferred.
  • ⦃x: T⦄: Non-maximally-inserted implicit argument. It is instantiated if it can be deduced from context, and remains uninstantiated (ie, no metavariable is introduced) otherwise.

In Coq people shy away from this binder. I'm not sure why; I guess there are issues with it at a larger scale. We could get rid of it. For the paper it's utterly irrelevant in my opinion.

Big list of lean tactics

  • split allows one to deal with the cases of a match pattern. This also allows one to case on an if condition.
  • cases H: inductive with | cons1 => sorry | cons2 => sorry is used to perform case analysis on an inductive type.
  • cases H; case cons1 => { ... }; case cons2 => { ... } is the same , but with slightly different syntax.
  • rewrite [rule] (at H)? performs the rewrite with rule. But generally, prefer simp [rule] (at H), because simp first runs the rewrite and then performs reduction. If simp does not manage to perform a rewrite, it does not perform reduction, which can lead to weird cases: start with a hypothesis H : x = true and the goal match x with | true => 1 | false => 2. On running rewrite [H], we get match true with | true => 1 | false => 2, and now running plain simp performs no reduction. On the other hand, if we had run simp [H], it would perform the rewrite and then also perform the reduction, giving 1.

Hyperdoctrine

  • A hyperdoctrine equips a category with some kind of logic L.
  • It's a functor P: T^op -> C for some higher category C, whose objects are categories whose internal logic corresponds to L.
  • In the classical case, L is propositional logic, and C is the 2-category of posets. We send A ∈ T to the poset of subobjects Sub_T(A).
  • We ask that for every morphism f: A -> B, the morphism P(f) has left and right adjoints.
  • These left and right adjoints mimic existential / universal quantifiers.
  • If we have maps between cartesian closed categories, then the functor f* obeys frobenius reciprocity if it preserves exponentials: f*(a^b) ~iso~ f*(a)^f*(b).

Algebra of logic

  • Lindenbaum algebras : Propositional logic :: Hyperdoctrines : Predicate logic
  • Work in a first order language, with a mild type system.
  • Types and terms form a category $B$ (for base).
  • Interpretations are functors which map $B$ to algebras.

Syntax

  • e | X. e is untyped. X is typed.
  • e is a sequence of symbols. X is a set of variables, which intuitively are the free variable.
  • Every variable that has a free occurrence in the untyped part should also occur in the typed part.
  • eg. R x[1] x[2] | { x[1], x[2] }
  • Not every variable in the typing part needs to occur in the untyped part.
  • eg. R x[1] x[2] | { x[1], x[2], x[3] }. (dummy variable x[3]).
  • Variables: x[1], ...
  • constants: c[1], ...
  • Separators: |, {, }, ().
  • c[i] | {} is a unary term.
  • x[i] | {x[i]} is a unary term.
  • If t|X is n-ary and s|Y is m-ary, then ts | X U Y is an (n+m)-ary term.
  • if t|X is n-ary and y is a variable, then t | X U {y} is an n-ary term.

Formulas.

  • if R is a n-ary predicate and t|X is a n-ary term, then Rt|X is a formula.
  • if phi|X is a formula and x ∈ X, then ∀x, phi | X - {x} is a formula.

A category of types and terms.

  • Find a natural way to view terms in our language as arrows in our category.
  • s | Y. Y gives us card(Y) name-shaped gaps; s is a sequence of len(s) name-shaped things.
  • s:codomain, Y: domain.
  • Question: when should composition be defined? the types have to match for t|X . s|Y.
  • So we should have card(X) = len(s).
  • We want the composition to be substitution. t|X . s|Y = t[X/s]|Y.
  • eg. x3|{x3, x4} . x1a3 | {x1, x2} = x1|{x1, x2}. (substitute x3 by x1, and x4 by a3.)
  • eg. x3 x4|{x3, x4} . x1a3 | {x1, x2} = x1 a3|{x1, x2}. (substitute x3 by x1, and x4 by a3.)

Problem: we don't have identity arrows!

  • A left identity such as x1 x2 | {x1, x2} is not a right identity!
  • But in a category, we want two sided identity.
  • The workaround is to work with an equivalence class of terms.
  • Define equivalence as t|X ~= t[X/Y]|Y.
  • Arrows are equivalence classes of terms.

Reference

Fungrim

  • https://fredrikj.net/math/fungrim2022.pdf
  • They want to integrate with mathlib to have formal definitions.

Category where coproducts of computable things is not computable

  • Modular lattices are an algebraic variety.
  • Consider the category of modular lattices.
  • The free modular lattice on 2 generators and on 3 generators has decidable equality, by virtue of being finite.
  • The free modular lattice on 5 generators does not have decidable equality.
  • The coproduct of free modular lattice on 2 and 3 generators is the free modular lattice on 5 generators, because $F(2 \cup 3) = F(2) \sqcup F(3)$ (where $2, 3$ are two and three element sets), because free is left adjoint to forgetful, and the left adjoint $F$ preserve colimits!

Homotopy continuation

Relationship between linearity and contradiction

  • https://xorshammer.com/2021/04/08/but-why-is-proof-by-contradiction-non-constructive/

Monads from Riehl

  • I'm having some trouble enmeshing my haskell intuition for monads with the rigor, so this is an expository note to bridge the gap.

What is a monad

  • A monad is an endofunctor T: C -> C equipped with two natural transformations:

  • (1) return/eta: idC => T [yeeta, since we are yeeting into the monad.]

  • (2) join/mu: T^2 => T, such that two laws are obeyed:

  • First law: mu, T commutation:

T^3(x) --T(@mu@x)--> T^2@x
|                   |
mu@(T@x)          mu@x
|                   |
v                   v
T^2(x)---mu@x-----> T(x)
  • Second law: mu, eta cancellation:
(Tx) --eta@(T@x)--> T^2(x)
|EQ                 |
|                   |
T@(eta@x)         mu@x
|                   |
v                 EQv
T^2(x)---mu@x---> T(x)
  • mu . eta_T = mu . T(eta) = id. (Sketched in Haskell below.)
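
A minimal Haskell sketch of this (return, join) presentation, with the two laws recorded as comments and the list monad as the running example:

class Functor t => JoinMonad t where
  eta :: a -> t a          -- return: yeet a value into the monad
  mu  :: t (t a) -> t a    -- join: flatten one layer

-- laws (not enforced by the compiler):
--   mu . fmap mu  = mu . mu      -- first law: mu/T commutation
--   mu . eta      = id           -- second law, one triangle
--   mu . fmap eta = id           -- second law, other triangle

instance JoinMonad [] where
  eta x = [x]
  mu    = concat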

Monad from adjunction

  • Any adjunction between a free functor F: L -> H and a forgetful/underlying functor U: H -> L, with F |- U, gives a monad. The categories are named L, H for low, high in terms of the amount of structure they have. We go from low structure to high structure by the free functor.
  • The monad on L is given by T := UF.
  • Recall that an adjunction gives us pullback: (F l -> h) -> (l -> U h) and pushfwd: (l -> U h) -> (F l -> h). The first is termed pullback since it takes a function living in the high space and pulls it back to the low space.
  • This lets us start with (F l -> F l), peel a F from the left via pullback to create (l -> U (F l)). That is we have return: l -> T l.
  • In the other direction, we are able to start with (U h -> U h), peel a U from the right via pushforward to create the counit (F U h -> h). Applying this counit in the middle lets us create join as T^2 l = U F U F l = U (F U) F l -> U F l = T l. (A sketch via the free-monoid adjunction follows.)
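
A minimal Haskell sketch using the free-monoid adjunction, so that T = UF is the list monad; pullbackAdj/pushfwdAdj are made-up names for the two directions of the hom-set bijection (restricted to monoid homomorphisms by convention):

-- pullback:  (F l -> h) -> (l -> U h),   here ([l] -> m) -> (l -> m)
pullbackAdj :: Monoid m => ([l] -> m) -> (l -> m)
pullbackAdj g = g . pure

-- pushfwd:   (l -> U h) -> (F l -> h),   here (l -> m) -> ([l] -> m)
pushfwdAdj :: Monoid m => (l -> m) -> ([l] -> m)
pushfwdAdj f = foldMap f

-- return for T = U.F: pull back the identity on F l
returnT :: l -> [l]
returnT = pullbackAdj id

-- join for T: push forward the identity on U h, one level up (ie, concat)
joinT :: [[l]] -> [l]
joinT = pushfwdAdj id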

Algebra for a monad $C^T$.

  • Any monad, given by (T: C -> C, return: 1C => T, join: T^2 => T) has a category of T-algebras associated to it.
  • The objects of T-alg are morphisms f: Tc -> c.
  • The morphisms of T-alg between f: Tc -> c and g: Td -> d are commuting squares, determined by an arr: c -> d
Tc -T arr-> Td
|           |
f           g
|           |
v           v
c   -arr->  d
  • The notation $C^T$ for the category makes some sense, since it consists of objects of the form Tc -> c, which matches somewhat with the function notation. We should have written $C^{TC}$ but maybe that's too unwieldy. (A sketch of algebras follows.)
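
A minimal Haskell sketch: an algebra for a monad t is a structure map t a -> a, and an algebra morphism is a map of carriers making the square above commute (recorded as a comment):

type Algebra t a = t a -> a

-- algebras for the list monad are monoids: the structure map folds the list down
listAlg :: Monoid a => Algebra [] a
listAlg = mconcat

-- an algebra morphism arr from (f :: Algebra t c) to (g :: Algebra t d) must satisfy
--   arr . f = g . fmap arr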

Factoring of forgetful functor of adjunction

  • Any adjunction (F: L -> H, U: H -> L) with associated monad T allows us to factor U: H -> L as:
H -Stx-> L^T -forget-> L
  • So we write elements of H in terms of syntax/"algebra over L". We then forget the algebra structure to keep only the low form.
  • The way to think about this is that any object in the image of U in fact has a (forgotten) algebra structure, which is why we can first go to L^T and then forget the algebraic structure to go back to L. It might be that this transition from H to L^T is very lossy. This means that the algebra is unable to encode what is happening in H very well.

Monadic adjunction

  • Let us consider an adjunction (F: L -> H, U: H -> L) with monad T. Factor U via L^T as:
H -Stx-> L^T -forget-> L
  • The adjunction is said to be monadic if in the factoring of U via L^T, it happens that H ~= L^T. That is, Stx is an equivalence between H and L^T.
  • The way to think about this is that any object in the image of U in fact has a (forgotten) algebra structure, and this algebra structure actually correctly represents everything that was happening in H.
  • Another way to say the adjunction F: L -> H: U is monadic is to say that F is monadic over U. We imagine the higher category H and the free functor F lying over L and U.
  • Warning: This is a STRONGER condition than saying that UF is a monad. UF is ALWAYS a monad for ANY adjunction. This says that H ~= L^T, via the factoring H -Stx-> L^T -forget-> L.
  • We sometimes simply say that U is monadic, to imply that there exists an F such that F |- U is an adjunction and that H ~= L^T.

Category of models for an algebraic theory

  • A functor is finitary if it preserves filtered colimits.
  • In particular, a monad T : L -> L is finitary if it preserves filtered colimits in C.
  • If a right adjoint is finitary, then so is its monad because its left adjoint preserves all colimits. Thus, their composite preserves filtered colimits.
  • A category H is a category of models for an algebraic theory if there is a finitary monadic functor U : H -> Set.

Limits and colimits in categories of algebras

  • We say that H is monadic over L iff there is an adjunction F: L -> H: U such that the monad T: L -> L := UF gives rise to an equivalence of categories H ~= L^T.

Riehl: Limits and colimits in categories of algebras

Here, we learn theorems about limits and colimits in L^T.

Lemma 5.6.1: If U is monadic over F, then U reflects isos

  • That is, if for some f: h -> h', if Uf: Uh -> Uh' is an iso, then so is f.
  • Since the adjunction F |- U is a monadic adjunction (U is monadic over F), we know that H ~= L^T, and U equals the forgetful functor (H = L^T) -> L.
  • Write the arrow f: h -> h' as an arrow in L^T via the commuting square datum determined by an arrow g: l -> l' downstairs:
Tl-Tg->Tl'
|      |  
a      a'
|      |
v      v
l--g-->l'
  • Since we assume that Uf = g is an iso, there exists a g' which is the inverse of g. But this means that the diagram below commutes:
Tl<-Tg'-Tl'
|      |  
a      a'
|      |
v      v
l<-g'--l'
  • For a proof, we see that a' . Tg = g . a. Composing with g' on the left gives g' . a' . Tg = a. Composing with Tg' on the right gives g' . a' = a . Tg'. That's the statement of the above square.
  • This means we have created an inverse in L^T, given by the square built from g' and Tg', so the iso is reflected into L^T.

Corollary 5.6.2: Bijective continuous functions in CHaus are isos

  • When we forget to Set, the isos are exactly the bijections. Thus, in CHaus (compact Hausdorff spaces), which is monadic over Set, the arrows whose underlying functions are isos in Set, ie, the continuous bijections, are also isos.

Corollary 5.6.4: Any bijective homomorphism arising from a monadic adjunction which forgets to Set will be iso

  • Follow the exact same proof.

Thm 5.6.5.i A monadic functor U: H -> L creates any limits that L has.

  • Since the equivalence H ~= L^T creates all limits/colimits, it suffices to show the result for U^T: L^T -> L.
  • Consider a diagram D: J -> C^T with image spanned by (T(D[j]) -f[j]-> D[j]).
  • Consider the forgotten diagram U^TD: J -> C with image spanned by D[j]. Create the limit cone P (for product, since product is limit) with morphisms pi[j]: P -> D[j]. We know this limit exists since we assume that L has this limit that U^T needs to create.
  • We can reinterpret the diagram D: J -> C^T as a natural transformation between two functors Top, Bot: J -> C. These functors are Top(j) := T(D[j]), Bottom(j) := D[j].
  • The natural transformation is given by eta: Top => Bottom, with defn eta(j) := f[j], where f[j] is the structure map of the algebra (T(D[j]) -f[j]-> D[j]) in the image of D.
  • So we see that eta: Top => Bot can also be written as eta: TD => D since Top ~= TD and Bot ~= D.
  • Now consider the composition of natural transformations Const(TP) =Tpi=> TD =eta=> D, all in J -> C. This gives us a cone with summit TP.
  • This cone with summit TP factors through the limit P via a unique morphism lambda: TP -> P. We wish to show that (TP -lambda-> P) is a T-algebra, and is the limit of D.
  • Diagram chase. Ugh.

Corollary 5.6.6: The inclusion of a reflective subcategory creates all limits

  • The inclusion of a reflective subcategory is monadic.
  • This lets us create all limits by the above proposition.

Corollary 5.6.7: Any category monadic over Set is complete

  • Set has all limits.
  • The forgetful functor U: H -> L creates all limits that L=Set has.
  • Thus H has all limits, ie. is complete.

Corollary 5.6.9: Set is cocomplete

  • The contravariant power set functor P: Set^op -> Set is monadic.
  • Set has all limits, and P creates all limits.
  • Thus all limits of Set^op exist, ie, all colimits of Set exist.

Category of models for alg. theory is complete

TODO

Category of algebras has coproducts

  • We show how to construct the free product of monoids via haskell. The same principle generalizes for any algebraic theory:
import Control.Monad(join)

-- |(a*|b*)* ~~simplify~~> (a|b)*
eval :: Monoid a => Monoid b => [Either [a] [b]] -> [Either a b]
eval = map (either (Left . mconcat) (Right . mconcat))

-- | a*|b* -> (a|b)* with no simplification
transpose :: Either [a] [b] -> [Either a b]
transpose = either (map Left) (map Right)

-- | (a*|b*)* -> (a|b)* with no simplification
flatten :: [Either [a] [b]] -> [Either a b]
flatten = join . map transpose


-- force: eval = flatten | via coequalizer

If $T: C \to C$ is finitary and $C$ is complete and cocomplete, then so is $C^T$

  • We have already seen that if $C$ is complete then so is $C^T$
  • We have also seen that $C^T$ contains coproducts
  • So if we show that $C^T$ has coequalizers, then we get cocompleteness, since any colimit can be expressed as a coproduct followed by a coequalizer.
  • To show that all coequalizers exist is to show that there is a left adjoint to the functor const: [C^T] -> [J -> C^T], where J := [a -f,g-> b] is the diagram category for coequalizers.
  • Recall that the adjoint sends a diagram [J -> C^T] to the nadir that is the coequalizer in C^T.
  • See that the constant functor trivially preserves limits.
  • To show that it possesses an adjunction, we apply an adjoint functor theorem (fuck me). In particular, we apply the general adjoint functor theorem, so we must show that the solution set condition is satisfied.
  • Recall that the solution set condition for $F: C \to D$ requires that for each $d \in D$, the comma category $d \downarrow F$ admits a weakly initial set of objects.
  • Unwrapping that definition: For each $d \in D$, there is a solution set. That is, there exists a small set $I$, a family of objects $c_i$, and a family of morphisms $f_i: d \to F(c_i)$ such that any morphism $d \to F(c)$ in $D$ can be factored as $d \xrightarrow{f_i} F(c_i) \xrightarrow{F(g)} F(c)$ for some $i$ and some $g: c_i \to c$.
  • To apply the theorem, we must produce a solution set for every object in [J -> C^T], that is, for each parallel pair of morphisms $f, g: (A, \alpha) \to (B, \beta)$.
  • We will produce a solution set with a single element by creating a fork $(Q, u)$ such that any other fork factors through this fork (perhaps non uniquely!) So we create:

$$ (A, \alpha) \xrightarrow{f, g} (B, \beta) \xrightarrow{q} (Q, u) $$

  • If we know how to create coequalizers in $C^T$, then this would be easy: we literally just create a coequalizer.
  • Instead, we create some "approximation" of the coequalizer with $(Q, u)$.
  • To start with, we define $q_0: B \to Q_0$ to be the coequalizer in $C$ of the pair $(A \xrightarrow{f, g} B)$.
  • If $Tq_0$ would be the coequalizer of $Tf, Tg$ then we are done. But this is unlikely, since a monad need not preserve coequalizers.
  • Instead, we simply calculate the coequalizer of $Tf, Tg$ and call this $q_1: B \to Q_1=TQ_0$.
  • Repeat inductively to form a directed limit (colimit).
  • Monad preserves filtered colimits, since in $UF$, $F$ the left adjoint preserves all colimits, and $U$ the right adjoint preserves filtered colimits since it simply forgets the algebra structure.

Combinatorial Cauchy Schwarz

Version 1

  • Suppose you have r pigeons and n holes, and want to minimize the number of pairs of pigeons in the same hole.
  • This can easily be seen to be equivalent to minimizing the sum of the squares of the number of pigeons in each hole, $\sum_k n_k^2$ (where $n_k$ is the number of pigeons in hole $k$), since the number of same-hole pairs is $\sum_k \binom{n_k}{2} = \frac{1}{2}(\sum_k n_k^2 - r)$.
  • Classical cauchy schwarz: $x_1^2 + x_2^2 + x_3^2 \geq \frac{1}{3}(x_1 + x_2 + x_3)^2$
  • Discrete cauchy schwarz: On placing a natural number of pigeons in each hole, The number of pairs of pigeons in the same hole is minimized iff pigeons are distributed as evenly as possible.
  • Pigeonhole principle: When $r = n + 1$, the best split possible is $(2, 1, 1, \dots)$. (A quick numeric check follows.)
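
A quick Haskell check of the discrete statement (the example distributions are made up):

-- number of same-hole pairs for a given distribution of pigeons into holes
samePairs :: [Int] -> Int
samePairs = sum . map (\k -> k * (k - 1) `div` 2)

-- samePairs [2,2,2] == 3   (even split of 6 pigeons into 3 holes)
-- samePairs [4,1,1] == 6   (uneven split: strictly more same-hole pairs)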

Version 2

  • I recently learned about a nice formulation of this connection from a version of the Cauchy–Schwarz inequality stated in Bateman's and Katz's article.
  • Proposition: Let $X$ and $Y$ both be finite sets and let f:X→Y be a function.
  • $|ker f| \cdot |Y| \geq |X|^2$. (Where ker f is the kernel pair of f, given as the equalizer of the two maps $f \circ \pi_1, f \circ \pi_2: X \times X \to Y$. More explicitly, it is the subset of $X \times X$ given by $ker(f) := \{ (x, x') : f(x) = f(x') \}$.)
  • Equality holds if and only if every fiber has the same number of elements.
  • This is the same as the version 1, when we consider $f$ to be the function $h$ which assigns pigeons to holes. Every fiber having the same number of elements is the same as asking for the pigeons to be evenly distributed.
  • Compare: $|ker(f)| \cdot |Y| \geq |X|^2$ with $(x_1^2 + x_2^2 + x_3^2) \cdot n \geq (x_1 + x_2 + x_3)^2$. Cardinality replaces the action of adding things up, and $|X|^2$ is the right hand side, $|ker(f)|$ is the left hand side, which is the sum of squares.
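
A quick Haskell check of this version (the finite sets and the function are made-up examples):

-- |ker f| when f is restricted to a finite list of inputs
kerSize :: Eq y => [x] -> (x -> y) -> Int
kerSize xs f = length [ () | x <- xs, x' <- xs, f x == f x' ]

-- with xs = [1..6], f = (`mod` 3) and |Y| = 3:  kerSize xs f == 12, and 12 * 3 >= 6^2
-- holds with equality because every fiber has exactly 2 elements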

Bezout's theorem

  • On Bezout's theorem, McCoy
  • Let $k$ be algebraically closed.
  • Let $R \equiv k[x, y, z]$ be ring.
  • We wish to detect the number of intersections between $f, g \in k[x, y, z]$, counted up to multiplicity.
  • For any point $a$, denote $R_a$ to be the localization of $R$ at the multiplicative subset $D_a \equiv \{ f \in R: f(a) \neq 0 \}$ ($D$ for does not vanish).
  • So $R_a \equiv D_a^{-1}(R)$, which concentrates attention around point $a$.

Intersection multiplicity $if \cap g$

  • Define the intersection multiplicity of $f, g$ at $a$ by the notation $if \cap g$.
  • Defined as $if \cap g \equiv dim_k(R_a/(f, g)_a)$.
  • That is, we localize the ring at $a$ and quotient by the ideal generated by $f, g$, and then count the dimension of this space as a $k$ vector space.

$f(a) \neq 0$ or $g(a) \neq 0$ implies $if \cap g \equiv 0$

  • WLOG, suppose $f(a) \neq 0$. Then localization at $a$ makes $f$ into a unit. The ideal $(f, g)_a \equiv R_a$ since the ideal explodes due to the presence of the local unit $f_a$. Thus, $R_a/(f, g)_a \equiv 0$.

$f(a) = 0$ and $g(a) = 0$ implies $if \cap g \neq 0$.

  • If both vanish, then $(f, g)_a$ is a proper ideal of $R_a$.

Examples

  • $x-0$ and $y-0$ at $(0, 0)$ have multiplicity $D_{(0, 0)}^{-1}(k[x, y]/(x, y))$, which is just $k$, which has dimension $1$. So they intersect with multiplicity $1$.
  • $x-1$ and $y-1$ at $(0, 0)$ have multiplicity $D_{(0, 0)}^{-1}(k[x, y]/(x - 1, y - 1))$. The ideal $(x - 1, y - 1)$ blows up because $x - 1 \in D_{(0, 0)}$, and thus the quotient is $0$, making the dimension $0$.
  • $x^2-y$ and $x^3-y$ at $(0, 0)$ gives quotient ring $k[x, y]/(x^2-y, x^3-y)$, which is the same as $k[x, y]/(x^2 - y, x^3 - y, 0)$, which is equal to $k[x,y]/(x^2, x^3, y)$, which is $k[x]/(x^2)$. This is the ring of elements of the form $\{ a + bx : a,b \in k \}$, which has dimension $2$ as a $k$ vector space. So this machinery actually manages to capture the degree 2 intersection between $y=x^2$ and $y=x^3$ at $(0, 0)$.

Intersection cycle ($f \cap g$)

  • Define $f \cap g \equiv \sum_{a \in \texttt{space}} if \cap g \cdot a.$
  • It's a generating function with intersection multiplicity as coefficients hanging on the clothesline of points.

Intersection number $\#(f \cap g)$

  • Given by $\#(f \cap g) \equiv \sum_{a \in \texttt{space}} if \cap g$. This is the count of the number of intersections.

Lemma: $f \cap g = g \cap f$

  • Mental note: replace $f \cap g$ with "ideal $(f, g)$" and stuff makes sense.
  • Follows immediately since $(f, g) = (g, f)$ and the definition of $if \cap g = R_a/(f, g)_a$ which is equal to $R_a/(g, f)_a = ig \cap f$

Lemma: $f \cap (g + fh) = f \cap g$

  • $(f, g + fh) \equiv (f, g)$.

$f \cap gh \equiv f \cap g + f \cap h$

  • Heuristic: if $f(a)$ and $gh(a)$ vanish, then either $f(a), g(a)$ vanish or $f(a), h(a)$ vanish, which can be counted by $f \cap g + f \cap h$

Lemma: if $f, g$ are nonconstant and linear then $\#(f \cap g) = 1$.

  • Recall that we are stating this within the context of $k[x, y, z]$.
  • So $f, g$ are homogeneous linear polynomials $f(x, y, z) = a_1 x + a_2 y + a_3 z$, $g(x, y, z) = b_1 x + b_2 y + b_3 z$.
  • Sketch: if they have a real solution, then they will meet at unique intersection by linear algebra.
  • if they do not have a unique solution, then they are parallel, and will meet at point at infinity which exists because we have access to projective solutions.

Lemma: homogeneous polynomial $g \in k[p, q]$ factorizes as $\alpha_0 p^t \prod_{i=1}^{n-t}(p - \alpha_i q)$: $\alpha_0 \neq 0$ and $t > 0$

  • Key idea: see that if it were $g \in k[p]$, then it would factorize as $p^t \prod_i (p - \alpha_i)$

  • To live in $k[p, q]$, convert from $g(p, q) \in k[p, q]$ to $g(p/q, q/q) \in k[(p/q)]$, which is the same as $g(t, 1) \in k[t]$.

  • Since we are homogeneous, we know that $g(\lambda p, \lambda q) = \lambda^{deg(g)} g(p, q)$. This lets us make the above transform:

  • $g(p/q, q/q) = g(p/q, 1) = (p/q)^k \prod_{i : i +k = n} (p/q - \alpha_i)$.

  • $g(p/q, q/q) = g(p/q, 1) = (p/q)^k \prod_{i : i + k = n} (p - \alpha_i q)/q$.

  • $g(p/q, q/q) = g(p/q, 1) = p^k/q^k \cdot (1/q^{n-k}) \cdot \prod_{i : i + k = n} (p - \alpha_i q)$.

  • $g(p/q, q/q) = g(p/q, 1) = p^k / q^n \prod_{i : i + k = n} (p - \alpha_i q)$.

  • $g(p, q) = q^n \cdot g(p/q, 1) = q^n \cdot p^k / q^n \prod_{i : i + k = n} (p - \alpha_i q)$.

  • $g(p, q) = q^n \cdot g(p/q, 1) = p^k \prod_{i : i + k = n} (p - \alpha_i q)$.

  • This proves the decomposition that $g(p, q) = q^k \prod_i (p - \alpha_i q)$.

Lemma: homogeneous polynomial $g \in k[p, q]$ factorizes as $\alpha_0 q^t \prod_{i=1}^{n-t}(p - \alpha_i q)$ with $t > 0$.

  • This is different from the previous step, since we are pulling out a factor of $q^t$ this time!

  • We cannot argue "by symmetry" since the other terms are $(p - \alpha_i q)$. If it really were symmetry, then we should have $(q - \alpha_i p)$ which we don't.

  • So this new lemma is in fact DIFFERENT from the old lemma!

  • Key idea: this time, do not pull out the roots at zero separately. Over an algebraically closed field, $g(p, 1) \in k[p]$ factorizes as $\alpha_0 \prod_{i=1}^{d}(p - \alpha_i)$, where $d = \deg_p g$ and the $\alpha_i$ are allowed to be zero.

  • Since we are homogeneous of degree $n$, we know that $g(\lambda p, \lambda q) = \lambda^{n} g(p, q)$. This lets us write $g(p, q) = q^n g(p/q, 1)$.

  • $g(p/q, 1) = \alpha_0 \prod_{i=1}^{d} (p/q - \alpha_i) = \alpha_0 \prod_{i=1}^{d} (p - \alpha_i q)/q = \alpha_0 (1/q^{d}) \prod_{i=1}^{d} (p - \alpha_i q)$.

  • $g(p, q) = q^n \cdot g(p/q, 1) = \alpha_0 q^{n-d} \prod_{i=1}^{d} (p - \alpha_i q)$.

  • This proves the decomposition $g(p, q) = \alpha_0 q^t \prod_{i=1}^{n-t} (p - \alpha_i q)$ with $t = n - d$.

Lemma: $f \in k[x, y, z]$ and $g \in k[y, z]$ homogeneous have $\deg(f) \deg(g)$ intersections (counted with multiplicity)

  • This is the base case for an induction on the degree of $x$ in $g$. Here, the degree of $x$ in $g$ is zero.
  • To compute $i[f(x, y, z) \cap g(y, z)]$, we write it as $i[f(x, y, z) \cap z^k \prod_{i : i + k = n} (y - \alpha_i z)]$
  • This becomes $i[f(x, y, z) \cap z^k] + \sum_i i[f(x, y, z) \cap (y - \alpha_i z)]$.
  • Intersecting with $z^k$ gives us $k$ times the intersection of $z$ with $f(x, y, z)$, so we have the eqn $i[f(x, y, z) \cap z^k] = k\, i[f(x, y, z) \cap z]$.
  • The full eqn becomes $k i[f(x, y, z) \cap z] + \sum_i i[f(x, y, z) \cap (y - \alpha_i z)]$.
Solving for $i[f(x, y, z) \cap z]$
  • Let's deal with the first part.
  • See that $i[f(x, y, z) \cap z]$ equals $i[f(x, y, 0) \cap z]$, because we want a common intersection, thus can impose $z = 0$ on $f(x, y, z)$.
  • We now write $f(x, y, 0) = \mu y^t \prod_j (x - \beta_j y)$.
Solving for $i[f(x, y, z) \cap (y - \alpha_i z)]$
  • Here, we must impose the equation $y = \alpha_i z$.
  • Thus we are solving for $f(x, \alpha_i z, z)$. Once again, we have an equation of two variables, $x$ and $z$.
  • Expand $f(x, \alpha_i z, z) = \eta_i z^{l_i} \prod_{j=1}^{m - l_i}(x - \gamma_{ij} z)$
  • This makes the cycles to be $l_i (z \cap (y - \alpha_i z)) + \sum_j (x - \gamma_{ij} z) \cap (y - \alpha_i z)$.
  • The cycle $(z \cap (y - \alpha_i z))$ corresponds to setting $z = 0, y - \alpha_i z = 0$, which sets $y=z=0$. So this is the point $[1:0:0]$.
  • The other cycle is $(x - \gamma_{ij} z) \cap (y - \alpha_i z)$, which is solved by $(\gamma_{ij} z : \alpha_i z : z)$.
  • In total, we see that we have a solution for every cycle.

Inductive step

  • Let $deg(f)$ denote total degree of $f$, $deg_x(f)$ denote $x$ degree.
  • Let $\deg_x(f) \geq \deg_x(g)$.
  • We treat $f, g$ as polynomials in a single variable $x$, ie, elements $(k[y, z])[x]$.
  • We want to divide $f$ by $g$: $f = Qg + R$. But to do this, we need to enlarge the coefficient ring $k[y, z]$ into the coefficient field $k(y, z)$ so that the Euclidean algorithm works.
  • So we perform long division to get polynomials $Q, R \in (k(y, z))[x]$ such that $f = Qg + R$.
  • Since $f, g$ are coprime, we must have $R$ nonzero. Now these $Q, R$ are rational functions since they live in $k(y, z)$.
  • Take common denominator of $Q, R$ and call this $h \in k[y, z]$ (ie, it is the polynomial denominator).
  • Then $hf = (hQ)g + (hR)$ which is $hf = qg + r$ where $q \equiv hQ \in k[y, z]$ and $r \equiv hR \in k[y, z]$. So we have managed to create polynomials $q, r$ such that $hf = qg + r$.
  • Let $c = gcd(g, r)$. $c$ divides $g$ and $r$, thus it divides $qg + r$, ie, it divides $hf$.
  • Dividing through by $c$, we get $h'f = qg' + r'$, where $h = h'c$, $g = g'c$, $r = r'c$.
  • We assume (can be shown) that these are all homogeneous.
  • Furthermore, we started with $gcd(g, f) = 1$. Since $g'$ divides $g$, we have $gcd(g', f) = 1$.
  • $c$ cannot divide $f$, since $c = gcd(g, r)$, and $g, f$ cannot share nontrivial common divisors. Thus, $gcd(c, f) = 1$.
  • We have some more GCDs to check, at the end of which we write the intersection equation:

$$ f \cap g = () $$

Example for invariant theory

  • Consider $p(z, w) = p_1 z^2 + p_2 zw + p_3 w^2$ --- binary forms of degree two.

  • The group $SL(2, Z)$ acts on these by substituting $(z, w) \mapsto M (z, w)$ for $M \in SL(2, Z)$.

  • We can write the effect on the coefficients explicitly: $(p_1', p_2', p_3') = M' (p_1, p_2, p_3)$ for an induced matrix $M'$.

  • So we have a representation of $SL(2, Z)$.

  • An example

  • IAS lecture

Counterexample to fundamental theorem of calculus?

  • The integral of 1/x^2 over [-1, 1] should equal -1/x evaluated between -1 and 1, which gives (-1/1) - (-1/(-1)), that is, -1 - 1 = -2.
  • But this is absurd since $1/x^2$ is always positive in $[-1, 1]$.
  • What's going wrong?

Why a sentinel of -1 is sensible

  • See that when we have an array, we usually index it with an array index of 0 <= i < len.
  • If len = 0, then the only "acceptable" i is -1, since it's the greatest integer that is less than len=0.

Data structure to maintain mex

offline

  • Key idea: maintain a set of numbers that we have not seen, and maintain set of numbers we have seen. Update the set of unseen numbers on queries. The mex is the smallest number of this set.
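
A minimal sketch of the offline idea for the simplest case (all insertions known up front, one mex query at the end); the function name and interface are my own:

#include <set>
#include <vector>
using namespace std;

// The mex of q inserted values is at most q, so it suffices to track
// the unseen values in [0, q].
int offline_mex(const vector<int>& inserted) {
    int q = inserted.size();
    set<int> unseen;
    for (int i = 0; i <= q; ++i) unseen.insert(i);
    for (int x : inserted) if (x <= q) unseen.erase(x);
    return *unseen.begin();
}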

online

  • Key idea: exploit cofinality. Decompose set of numbers we have not seen into two parts: a finitary part that we maintain, and the rest of the infinite part marked by an int that tells us where the infinite part begins.
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
using namespace std;

set<int> unseen;      // unseen values in [0..largest_ever_seen]
map<int, int> freq;   // multiplicity of each inserted value
// unseen as a data structure maintains
// information about [0..largest_ever_seen];
// everything above largest_ever_seen is implicitly unseen.
int largest_ever_seen = 0;

void init() {
    unseen.insert(0);
}

void mex_insert(int k) {
    freq[k]++;
    // extend the explicitly tracked range [0..largest_ever_seen] up to k
    for (int i = largest_ever_seen + 1; i <= k; ++i) {
        unseen.insert(i);
    }
    unseen.erase(k);
    largest_ever_seen = max(largest_ever_seen, k);
}

void mex_delete(int k) {
    assert(freq[k] >= 1);
    freq[k]--;
    if (freq[k] == 0) {
        unseen.insert(k);
    }
}

int mex_mex() {
    // if every value in [0..largest_ever_seen] is present,
    // the mex is the first value of the implicit unseen tail.
    if (unseen.empty()) { return largest_ever_seen + 1; }
    return *unseen.begin();
}

Scatted algebraic number theory ideas: Ramification

  • I've had Pollion on math IRC explain ramification to me.
15:17 <Pollion> Take your favorite dedekind domain.
15:17 <bollu> mmhm
15:17 <Pollion> For instance, consider K a number field
15:17 <Pollion> and O_K the ring of integers.
15:17 <Pollion> Then take a prime p in Z.
15:18 <Pollion> Since Z \subset O_K, p can be considered as an element of O_K, right ?
15:18 <bollu> yes
15:18 <Pollion> Ok. p is prime in Z, meaning that the ideal (p) = pZ is a prime ideal of Z.
15:18 <bollu> yep
15:18 <Pollion> Consider now this ideal, but in O_K
15:18 <bollu> right
15:19 <Pollion> ie the ideal pO_K
15:19 <bollu> yes
15:19 <Pollion> It may not be prime anymore
15:19 <bollu> mmhm
15:19 <Pollion> So it factors as a product of prime ideals *of O_K*
15:20 <Pollion> pO_K = P_1^e_1....P_r^e_r
15:20 <Pollion> where P_i are distinct prime ideals of O_K.
15:20 <bollu> yes
15:20 <Pollion> You say that p ramifies in O_K (or in K) when there is some e_i which is > 1
15:21 <Pollion> Example
15:21 <Pollion> Take Z[i], the ring of Gauss integers.
15:22 <Pollion> It is the ring of integers of the field Q(i).
15:22 <Pollion> Take the prime 2 in Z.
15:23 <bollu> (2) = (1 + i) (1 - i) in Z[i] ?
15:23 <Pollion> Yes.
15:23 <Pollion> But in fact
15:23 <Pollion> The ideal (1-i) = (1+i) (as ideals)
15:23 <Pollion> So (2) = (1+i)^2
15:23 <Pollion> And you can prove that (1+i) is a prime ideal in Z[i]
15:23 <bollu> is it because (1 - i)i = i + 1 = 1 + i?
15:24 <Pollion> Yes
15:24 <bollu> very cool
15:24 <Pollion> Therefore, (2) ramifies in Z[i].
15:24 <bollu> is it prime because the quotient Z[i]/(1 - i) ~= Z is an integral domain? [the quotient tells us to make 1 - i = 0, or to set i = ]
15:24 <Pollion> But you can also prove that primes that ramify are not really common
15:24 <bollu> it = (1 - i)
15:25 <Pollion> In fact, 2 is the *only* prime that ramifies in Z[i]
15:25 <Pollion> More generally, you only have a finite number of primes that ramify
15:25 <bollu> in any O_K?

Coreflection

  • A right adjoint to an inclusion functor is a coreflector.

Torsion Abelian Group -> Abelian Group

  • Consider the inclusion of torsion abelian groups into the category of abelian groups; this is an inclusion functor.
  • This has as right adjoint the functor that sends every abelian group to its torsion subgroup.
  • See that this coreflector somehow extracts a subobject out of the larger object.

Group -> Monoid

  • inclusion: send groups to monoids.
  • coreflection: send monoid to its group of units. (extract subobject).

Contrast: Reflective subcategory

  • To contrast, we say a subcategory is reflective if the inclusion $i$ has a left adjoint $T$.
  • In this case, the subcategory usually has more structure, and the reflector $T$ manages to complete objects of the larger category to shove them into the subcategory.
  • Eg 1: The subcategory of complete metric spaces embeds into the category of metric spaces. The reflector $T$ builds the completion.
  • Eg 2: The subcategory of sheaves embeds into the category of presheaves. The reflector is sheafification.

General Contrast

  • $T$ (the left adjoint to $i$) adds more structure. Eg: completion, sheafification.
  • This is sensible because it's the left adjoint, so is kind of "free".
  • $R$ (the right adjoint to $i$) deletes structure / pulls out substructure. Eg: pulling out torsion subgroup, pulling out group of units.
  • This is sensible because it's the right adjoint, and so is kind of "forgetful", in that it is choosing to forget some global data.

Example from Sheaves

  • This came up in the context of group actions in Sheaves in geometry and logic.
  • Suppose $G$ is a topological group. Consider the category of $G$ sets, call it $BG$.
  • If we replace the topology on $G$ with the discrete topology, we get a group called $G^\delta$. This has a category of $G^\delta$ sets, called $BG^\delta$.

Better man Pages via info

  • I recently learnt about info, and it provides much higher quality documentation than man!
  • info pages about things like sed and awk are actually useful.

The Zen of juggling three balls

  • Hold one ball (A) in the left hand, and two (B, C) in the right hand. This initial configuration is denoted [A;;B,C].
  • Throw B from the right hand to the left hand. This configuration is denoted by [A;B←;C], where B← is in the middle since it is in-flight, and the arrow points in the direction it's travelling.
  • When the ball B is close enough to the left hand that it can be caught, throw ball A. Thus the configuration is now [;(A→)(B←);C].
  • Now catch ball B, which makes the configuration [B;A→;C].
  • With the right hand, throw C (to anticipate catching A). This makes the configuration [B;(A→)(C←);]
  • Now catch the ball A, which makes the configuration [B;C←;A].
  • See that this is a relabelling of the state right after the initial state. Loop back!

The Zen

  • The key idea is to think of it as (1) "throw (B)" (2) "throw (A), catch (B)", (3) "throw (C), catch (A)", and so on.
  • The cadence starts with a "throw", and then settles into "throw, catch", "throw catch", "throw, catch", ...
  • This cadence allows us to actually succeed in the act of juggling. It fuses the hard parts of actually freeing a hand and accurately catching the ball. One can then focus attention on the other side and solve the same problem again.

Example of lattice that is not distributive

  • Take a 2D vector space, and take the lattice of subspaces of the vector space.
  • Take three subspaces: a = span(x), b = span(y), c = span(x + y).
  • Then see that c /\ (a \/ b) = c, while c /\ a = c /\ b = 0, so (c /\ a) \/ (c /\ b) = 0.

Patat

  • Make slides that render in the terminal!
  • https://github.com/bollu/patat

Common Lisp LOOP Macro

Loop with index:

(loop for x in xs for i from 0 do ...)

Nested loop appending

(loop for x in `(1 2 3 4) append
      (loop for y in `(,x ,x) collect (* y y)))

Mitchell-Bénabou language

  • https://ncatlab.org/nlab/show/Mitchell-B%C3%A9nabou+language

Hyperdoctrine

  • A hyperdoctrine equips a category with some kind of logic L.
  • It's a functor P: T^op -> C for some higher category C, whose objects are categories whose internal logic corresponds to L.
  • In the classical case, L is propositional logic, and C is the 2-category of posets. We send A ∈ T to the poset of subobjects Sub_T(A).
  • We ask that for every morphism f: A -> B, the morphism P(f) has left and right adjoints.
  • These left and right adjoints mimic existential / universal quantifiers.
  • If we have maps between cartesian closed categories, then the functor f* obeys Frobenius reciprocity if it preserves exponentials: f*(a^b) ~iso~ f*(a)^f*(b).

Why is product in Rel not cartesian product?

Monoidal category

  • Intuitively, a category can be equipped with $\otimes, I$ that makes it a monoid.

Cartesian Monoidal category

  • A category where the monoidal structure is given by the categorical product (universal property...).

Fox's theorem: Any Symmetric Monoidal Category with Comonoid is Cartesian.

  • Let C be symmetric monoidal under $(I, \otimes)$.

  • A monoid has signature e: () -> C and .: C x C -> C.

  • A comonoidal structure flips this, and gives us copy: C -> C x C, and delete: C -> ().

  • Fox's theorem tells us that if the category is symmetric monoidal, and has morphisms $copy: C \to C \otimes C$, and $delete: C \to I$ which obey some obvious conditions, then the monoidal product is the categorical product.

Rel doesn't have the correct cartesian product

  • This is because the naive product on Rel produces a monoidal structure on Rel.
  • However, this does not validate the delete rule, because we can have a relation that does not relate some element to anything in the image. Thus, A -R-> B -!-> 1 need not be the same as A -!-> 1 if R does not relate some element of A to ANYTHING.
  • Similarly, it does not validate the copy rule, because first relating and then copying is not the same as relating to two different copies, because Rel represents nondeterminism.

Locally Cartesian Categories

  • A category is locally cartesian if each of the slice categories is cartesian.
  • That is, all $n$-ary categorical products (including $0$-ary) exist in the slice category of each object.
  • MLTT corresponds to locally cartesian closed categories

simp in Lean4

  • Lean/Elab/Tactic/Simp.lean:
"simp " (config)? (discharger)? ("only ")? ("[" simpLemma,* "]")? (location)?
@[builtinTactic Lean.Parser.Tactic.simp] def evalSimp : Tactic := fun stx => do
  let { ctx, fvarIdToLemmaId, dischargeWrapper } ← withMainContext <| mkSimpContext stx (eraseLocal := false)
  -- trace[Meta.debug] "Lemmas {← toMessageData ctx.simpLemmas.post}"
  let loc := expandOptLocation stx[5]
  match loc with
  | Location.targets hUserNames simplifyTarget =>
    withMainContext do
      let fvarIds ← hUserNames.mapM fun hUserName => return (← getLocalDeclFromUserName hUserName).fvarId
      go ctx dischargeWrapper fvarIds simplifyTarget fvarIdToLemmaId
  | Location.wildcard =>
    withMainContext do
      go ctx dischargeWrapper (← getNondepPropHyps (← getMainGoal)) (simplifyTarget := true) fvarIdToLemmaId
where
  go (ctx : Simp.Context) (dischargeWrapper : Simp.DischargeWrapper) (fvarIdsToSimp : Array FVarId) (simplifyTarget : Bool) (fvarIdToLemmaId : FVarIdToLemmaId) : TacticM Unit := do
    let mvarId ← getMainGoal
    let result? ← dischargeWrapper.with fun discharge? => return (← simpGoal mvarId ctx (simplifyTarget := simplifyTarget) (discharge? := discharge?) (fvarIdsToSimp := fvarIdsToSimp) (fvarIdToLemmaId := fvarIdToLemmaId)).map (·.2)
    match result? with
    | none => replaceMainGoal []
    | some mvarId => replaceMainGoal [mvarId]

Big list of Lean4 TODOS

  • Hoogle for Lean4.
  • show source in doc-gen4.
  • mutual structure definitions.
  • Make Lean4 goals go to line number when pressing <Enter>
  • Convert lean book into Jupyter notebook?

unsafePerformIO in Lean4:

  • First do the obvious thing, actually do the IO:
unsafe def unsafePerformIO [Inhabited a] (io: IO a): a :=
  match unsafeIO io with
  | Except.ok a    =>  a
  | Except.error e => panic! "expected io computation to never fail"
  • Then wrap a "safe" operation by the unsafe call.
@[implementedBy unsafePerformIO]
def performIO [Inhabited a] (io: IO a): a := Inhabited.default

Big List of Lean4 FAQ

  • FVar: free variables
  • BVar: bound variables
  • MVar: metavariables [variables for unification].
  • Lean.Elab.Tactic.*: tactic front-end code that glues to Lean.Meta.Tactic.*.

Sheaves in geometry and logic 1.2: Pullbacks

  • Pullbacks are fibered products.
  • Pullbacks for presheaves are constructed pointwise.
  • The pullback of $f$ along itself in set is going to be the set of $(x, y)$ such that $f(x) = f(y)$.
  • The pullback of $f: X \to Y$ along itself in an arbitrary category is an object $P$ together with a parallel pair of arrows P -k,k'-> X called the kernel pair.
  • $f$ is monic iff both arrows in the kernel pair are identity X -> X.
  • Thus, any functor preserving pullbacks preserves monics (because it preserves pullback squares, it sends a kernel pair with both arrows the identity to another kernel pair with both arrows the identity; this means that the image of the arrow is again monic).
  • The pullback of a monic along any arrow is monic.
  • The pullback of an epi along any arrow is epi in Set, but not necessarily in an arbitrary category!

Sheaves in geometry and logic 1.3: Characteristic functions of subobjects

      !
   S ----> 1
   |       |
  m|       | true
   v       v
   X ----> 2
    phi(S)
  • true: 1 -> 2 is the unique monic such that true(1) = 1 (where 2 = {0, 1})
  • For all monic m: S -> X , there must be a unique phi(S): X -> 2 such that the diagram is a pullback.
  • Then 1 -true-> 2 is called as the subobject classifier. See that 1 is also determined (it is terminal object). So the only "choice" is in what 2 is and what the morphism 1 -true-> 2 is.
  • The definition says that every monic is the pullback of some universal monic true.

Subobject category

  • Define an equivalence relation between two monics m, m': S, S' -> X where m ~ m' iff there is an iso i: S -> S' such that the triangle commutes:
  S --i--> S'
   \      /
   m\    /m'
     v  v
      X
  • $Sub_C(X)$ is the set of all subobjects of $X$.
  • to make the idea more concrete, let C = Set and let X = {1, 2}. This has subobjects [{}], [{1}], [{2}], [{1, 2}].
  • To be clear, these are given by the map m0: {} -> {1, 2} (trivial), m1: {*} -> {1, 2} where m1(*) = 1, m2: {*} -> {1, 2} where m2(*) = 2, and finally m3: {1, 2} -> {1, 2} given by id.
  • The category $C$ is well powered when $Sub_C(X)$ is a small set for all $X$. That is, the class of subobjects for all $X$ is set-sized.
  • Now given any arrow $f: Y \to X$, the pullback of a monic $m: S -> X$ along $f$ is another monic $m': S' \to Y$. (Recall that the pullback of a monic along any arrow is monic.)
  • This means that we can contemplate a functor $Sub_C: C^{op} \to \texttt{Set}$ which sends an object $C$ to its set of subobjects, and a morphism $f: Y \to X$ to the pullback of the subobjects of $X$ along $f$.
  • If this functor is representable, that is, if $Sub_C(X) \cong Hom_C(X, \Omega)$ naturally in $X$, then the representing object $\Omega$ is the subobject classifier.

$G$ bundles

  • If $E \to X$ is a bundle, it is a $G$-bundle if $E$ has a $G$ action such that $\pi(e) = \pi(e')$ iff there is a unique $g$ such that $ge = e'$. That is, the base space is the quotient of $E$ under the group, and the group is "just enough" to quotient --- we don't have redundancy, so we get a unique $g$.
  • Now define the space $GBund(X)$ to be the set of all $G$ bundles over $X$.
  • See that if we have a morphism $f: Y \to X$, we can pull back a $G$ bundle $E \to X$ to get a new bundle $E' \to Y$.
  • Thus we can contemplate the functor GBund: Space^op -> Set which sends a space to the set of bundles over it.
  • A bundle V -> B is said to be a classifying bundle if any bundle E -> X can be obtained as a pullback of the universal bundle V -> B along a unique morphism X -> B.
  • In the case of the orthogonal group O_k, let V be the Stiefel manifold. Consider the quotient V/O_k, which is the Grassmannian. So the bundle V -> Gr is a G bundle. Now, some algebraic topology tells us that this is in fact the universal bundle for O_k.
  • The key idea is that this universal bundle V -> B represents the functor that sends a space to its set of bundles, This is because any bundle E -> X is uniquely determined by a pullback X -> B! So the base space B determines every bundle. We can recover the bundle V -> B by seeing what we get along the identity B -> B.

Sieves / Subobject classifiers of presheaves

  • Let P: C^op -> Set be a functor.
  • Q: C^op -> Set is a subfunctor of P iff Q(c) ⊂ P(c) for all c ∈ C and that Qf is a restriction of Pf.
  • The inclusion Q -> P is a monic arrow in [C^op, Set]. So each subfunctor is a subobject.
  • Conversely, all subobjects are given by subfunctors. If θ: R -> P is a monic natural transformation (ie, monic arrow) in the functor category [C^op, Set], then each θC: RC -> PC is an injection (remember that RC, PC live in Set, so it's alright to call it an injection)
  • For each C, let QC be the image of θC. So (QC = θC(RC)) ⊂ PC.
  • This Q is manifestly a subfunctor.
  • For an arbitrary presheaf category C^ = [C^op, Set], suppose there is a subobject classifier O.
  • Then this O must at the very least classify the yonedas (ie, it must classify yC = Hom(-, C) ∈ [C^op, Set]).
  • Recall that Sub_C(X) was the functor that sent X ∈ C to the set of subobjects of X, and that the category C had a subobject classifier O iff Sub_C(X) is represented by the subobject classifier O. Thus we must have that Sub_C(X) ~= Hom(X, O).
  • Let y(C) = Hom(-, C). Thus we have the isos Sub_C^(yC) = Hom_C^(yC, O) =[yoneda] O(C).
  • This means that the subobject classifier O: C^op -> Set, if it exists, must be defined on objects as O(C) = Sub_C^(yC). This means we need to build the set of all subfunctors of Hom(-, C).
Sieves
  • For an object c, a sieve on c is a set S of arrows with codomain c such that if f ∈ S, then for every arrow h for which fh is defined, we have fh ∈ S.
  • If we think of arrows f as paths allowed to get through to c, this means that a path to some other b (via an h) followed by an allowed path to c (via f) is allowed. So if b -f-> c is allowed, so is a -h-> b -f-> c.
  • If C is a monoid, then a sieve is just a right ideal
  • For a partial order, a sieve on c is a set of elements that is downward closed/smaller closed. If b <f= c is in the sieve, then so too is any element a such that a <h= b <f= c.
  • So a sieve is a smaller closed subset: if a small object passes the sieve, then so does anything smaller!
  • Let Q ⊂ Hom(-, c) = yc be a subfunctor. Then define the set S_Q = { f | f: a -> c and f ∈ Q(a) }.
  • Another way of writing it may be to say that we take S_Q = { f ∈ Hom(a, c) | f ∈ Q(a) }.
  • This is a sieve because fh is pulling back f: a -> c along h: z -> a, and the action on the hom functor will pull back the set Hom(a, c) to Hom(z, c), which will maintain sieveiness, as if f ∈ Hom(a, c) then fh ∈ Hom(z, c).
  • This means that a sieve on c is the same as a subfunctor of yc = Hom(-, c).
  • This makes us propose a subobject classifier on [C^op, Set] defined as O(c) = the set of sieves on c, ie, the set of subfunctors of Hom(-, c).

Common Lisp Debugging: Clouseau

  • Install the clouseau package to get GUI visualizations of common lisp code.
  • Use (ql:quickload 'clouseau) to use the package, and then use (clouseau:inspect (make-condition 'uiop:subprocess-error :code 42)) to inspect a variable.

Drawabox: Lines

Superimposed lines

  • Step 1: Draw a line with a ruler
  • Step 2: keep the pen at the beginning of the line
  • Step 3: Follow the line confidently, try to end at the endpoint of the line.

Ghosting lines

  • Step 1: Draw two endpoints
  • Step 2: Mimic drawing a line [ghosting].
  • Step 3: confidently draw a line. LIFT THE PEN UP to stop the pen, don't slow down!

Common Lisp Beauty: paths

; Evaluation aborted on #<UNDEFINED-FUNCTION PATHNAME-TU[E {10034448F3}>
CL-USER> (pathname-type "/home/siddu_druid/**/*.mlir")
"mlir"
CL-USER> (pathname-type "/home/siddu_druid/**/foo")
NIL
CL-USER> (pathname "/home/siddu_druid/**/foo")
#P"/home/siddu_druid/**/foo"
CL-USER> (pathname-directory "/home/siddu_druid/**/foo")
(:ABSOLUTE "home" "siddu_druid" :WILD-INFERIORS)
CL-USER> (pathname-directory "/home/siddu_druid/**/foo/**/bar")
(:ABSOLUTE "home" "siddu_druid" :WILD-INFERIORS "foo" :WILD-INFERIORS)
CL-USER> (pathname-tu[ey "/home/siddu_druid/**/foo/**/bar")
; in: PATHNAME-TU[EY "/home/siddu_druid/**/foo/**/bar"
;     (PATHNAME-TU[EY "/home/siddu_druid/**/foo/**/bar")
; 
; caught STYLE-WARNING:
;   undefined function: COMMON-LISP-USER::PATHNAME-TU[EY
; 
; compilation unit finished
;   Undefined function:
;     PATHNAME-TU[EY
;   caught 1 STYLE-WARNING condition
; Debugger entered on #<UNDEFINED-FUNCTION PATHNAME-TU[EY {10038F92C3}>
[1] CL-USER> 
; Evaluation aborted on #<UNDEFINED-FUNCTION PATHNAME-TU[EY {10038F92C3}>
CL-USER> (pathname-type "/home/siddu_druid/**/foo/**/bar")
NIL
CL-USER> (pathname-type "/home/siddu_druid/**/foo/**/bar.ty")
"ty"
CL-USER> (pathname-name "/home/siddu_druid/**/foo/**/bar.ty")
"bar"
CL-USER> (pathname-name "/home/siddu_druid/**/foo/**/*.ty")
:WILD

Logical Predicates (OPLSS '12)

  • $R_\tau(e)$ has three conditions:
  • (1) $e$ has type $\tau$
  • (2) $e$ has the property of interest ($e$ strongly normalizes / has normal form)
  • (3) The set $R_\tau$ is closed under eliminators!
  • My intuition for (3) is that expressions are "freely built" under constructors. On the other hand, it is eliminators that perform computation, so we need $R_\tau$ to be closed under "computation" or "elimination"
  • Video

Logical Relations (Sterling)

  • Key idea is to consider relations $R_\tau$ between closed terms of types $\tau_l$ and $\tau_r$. That is, we have a relation $R_\tau \subseteq { (t_l, t_r): (\cdot \vdash t_l : \tau_l), (\cdot \vdash t_r : \tau_r) }$.
  • We write a relation between closed terms of the two types $\tau_l$ and $\tau_r$ as a subset $R_{\tau} \subseteq (\cdot \vdash \tau_l) \times (\cdot \vdash \tau_r)$.
  • A morphism of relations $f: R_\sigma \to R_\tau$ is given by two functions $f_l: \sigma_l \to \tau_l$ and $f_r: \sigma_r \to \tau_r$ such that $a R_\sigma b \implies f_l(a) R_\tau f_r(b)$.

Logical relations for function spaces

  • Given this, we can build up logical relations for more complex cases like function types and quantified types. For example, given logical relations $R_\sigma$ and $R_\tau$, we build $R_{\sigma \to \tau}$ to be the relation between types $(\cdot \vdash \sigma_l \to \tau_l) \times (\cdot \vdash \sigma_r \to \tau_r)$, and given by the formula:

$$ (f_l: \sigma_l \to \tau_l, f_r: \sigma_r \to \tau_r) : R_{\sigma \to \tau} \equiv \forall (x_l : \sigma_l , x_r : \sigma_r) \in R_\sigma, (f_l(x_l), f_r(x_r)) \in R_\tau $$

  • This satisfies the universal property of functions in the category of logical relations, ie, there is an adjunction between $R_{\rho \times \sigma} \to R_{\tau}$ and $R_{\rho} \to R_{\sigma \to \tau}$.

Logical relations for data types

  • Next, we can interpret a base type like bool by the logical relation that encodes equality on that type. So $R_{\texttt{bool}} \subseteq (\cdot \vdash \texttt{bool}) \times (\cdot \vdash \texttt{bool})$ is given by:

$$ R_{\texttt{bool}} \equiv { (\texttt{true, true}), (\texttt{false, false}) } $$

Logical relations for parametric types

  • For a type of the form $\tau(\alpha)$ that is parametric in $\alpha$, suppose we have a family of relations $R_{\tau(\alpha)} \subseteq (\cdot \vdash \tau_l(\alpha_l)) \times (\cdot \vdash \tau_r(\alpha_r))$ which varies in $R_\alpha$.
  • Then we define the logical relation for the type $R_{\forall \alpha, \tau(\alpha)} \subseteq (\cdot \vdash \forall \alpha \tau_l(\alpha)) \times (\cdot \vdash \forall \alpha \tau_r(\alpha))$ as:

$$ R_{\forall \alpha, \tau (\alpha)} \equiv { (f_l : \forall \alpha, \tau_l(\alpha), f_r: \forall \alpha, \tau_r(\alpha)) \mid \forall R_\alpha, (f_l(\alpha_l), f_r(\alpha_r)) \in R_{\tau(\alpha)} } $$

Proving things using logical relations

  • For $f: \forall \alpha, \alpha \to \texttt{bool}$, we have that $f @\texttt{unit} (()) = f @ \texttt{bool}(\texttt{true})$. That is, the function value at () : unit determines the value of the function also at true: bool (and more generally, everywhere).

  • To prove this, we first invoke that by soundness, we have that $(f, f) \in R_{\forall \alpha. \alpha \to \texttt{bool}}$. On unwrapping this, this means that:

$$ \forall R_\alpha, \forall (x_l, x_r) \in R_\alpha, ((f(x_l), f(x_r)) \in R_{\texttt{bool}}) $$

  • Plugging in $R_{\texttt{bool}}$, this gives us an equality:

$$ \forall R_\alpha, \forall (x_l, x_r) \in R_\alpha, (f(x_l) = f(x_r)) $$

  • We now choose $R_\alpha \subseteq (\cdot \vdash \texttt{unit}) \times (\cdot \vdash \texttt{bool})$, with the singleton element ${ ((), \texttt{true}) }$.

  • Jon Talk

$(x/p)$ is $x^{(p-1)/2}$
  • Since $x$ is coprime to $p$, we have that $1 \equiv x^{p-1}$
  • This can be written as $(x^{(p-1)/2})^2 - 1^2 = 0$. [$(p-1)$ is even when $p>2$].
  • That is, $(1 - x^{(p-1)/2})(1 + x^{(p-1)/2}) = 0$.
  • Since we are in an integral domain (really a field), this means that $x^{(p-1)/2} \equiv \pm 1 (\mod p)$.
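
A quick numerical sanity check of Euler's criterion (my own sketch, not part of the note); modpow is a hypothetical fast modular exponentiation helper:

#include <cassert>
#include <cstdint>

// modular exponentiation by repeated squaring
uint64_t modpow(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    b %= m;
    for (; e; e >>= 1) {
        if (e & 1) { r = (r * b) % m; }
        b = (b * b) % m;
    }
    return r;
}

int main() {
    // For p = 13: x^((p-1)/2) should be 1 exactly when x is a
    // quadratic residue mod p, and -1 (= p - 1) otherwise.
    const uint64_t p = 13;
    bool residue[13] = {};
    for (uint64_t y = 1; y < p; ++y) { residue[(y * y) % p] = true; }
    for (uint64_t x = 1; x < p; ++x) {
        uint64_t e = modpow(x, (p - 1) / 2, p);
        assert(e == (residue[x] ? 1 : p - 1));
    }
}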

Pointless topology: Frames

  • A frame is a lattice with arbitrary joins, finite meets, and the distributive law: $A \cap (\cup_i B_i) = \cup_i (A \cap B_i)$.
  • A map of frames is a lattice map that preserves arbitrary joins and finite meets.
  • The category of locales is the opposite of the category of frames.

Thm: Any locale has a smallest dense sublocale

  • For example, $\mathbb R$ has $\mathbb Q$.

Sober spaces

  • A space is sober iff every irreducible closed subset is the closure of a single point.
  • A sober space is one whose lattice of open subsets determines the space: the points can be recovered from the opens.

Introduction to substructural logics: Ch1

Terminology

Logic as talking about strings

  • The book gives a new (to me) interpretation of rules like $X \vdash A$. It says that this can be read as "the string $X$ is of type $A$", where type is some chomskian/grammarian sense of the word "type".
  • This means that we think of $X ; Y$ as "concatenate $X$ and $Y$".
  • This allows one to think of $X \vdash A \to B$ as the statement "$X$ when concatenated with a string of type $A$ produces a string of type $B$".
  • This is interesting, because we can have judgements like $X \vdash A$ and $X \vdash B$ with no problem, we're asserting that the string $X$ is of type $A$, $B$. Which, sure, I guess we can have words that are both nouns and verbs, for example.
  • Under this guise, the statement $X \vdash A \land B$ just says that "$X$ is both a noun and a verb".
  • Further, if I say $X \vdash A$ and $Y \vdash B$, then one wants to ask "what is type of $X; Y$ ? we want to say "it is the type $A$ next to $B$", which is given by $A \circ B$ (or, $A \otimes B$ in modern notation).
  • This is cool, since it gives a nice way to conceptualize the difference between conjunction and tensoring.

Tensor versus conjunction as vector spaces

  • What I got most out of this was the difference between what they call fusion (what we now call tensoring in linear logic) and conjunction.
  • Key idea: Let's suppose we're living in some huge vector space, and the statement $X \vdash A$ should be read as "the vector $X$ lives in the subspace $A$ of the large vector space.
  • Then, the rule $X \vdash A$, $X \vdash B$ entails $X \vdash A \land B$ means: if $X$ lives in subspace $A$ and $X$ lives in subspace $B$, then $X$ lives in the intersection $A \cap B$.
  • On the other hand, the rule $X \vdash A$, $Y \vdash B$ entails $X ; Y \vdash A \circ B$ means: if $X$ lives in subspace $A$, $Y$ lives in subspace $B$, then the vector $X \otimes Y$ lives in subspace $A \otimes B$.
  • See that in the case of the conjunction, we are talking about the same $X$, just choosing to restrict where it lives ($A \cap B$)
  • See that in the case of tensor product, we have two elements $X$ and $Y$, which live in two different subspaces $A$ and $B$.

Cut and admissibility

  • Cut is the theorem that lets you have lemmas.
  • It says that if $X \vdash A$, and $Y(A) \vdash B$ then $Y(X) \vdash B$.
  • I don't understand what this means in terms of the interpretation of "left hand side as values, right hand side as types", or under "left side is strings, right side is types". The rule $Y(A) \vdash B$ is, at best, some kind of unholy dependently typed nonsense under this interpretation.
  • A theory is cut-admissible if the axioms let you prove cut.
  • In general, a rule $A$ is admissible in a theory if the axioms of the theory allow one to prove (every instance of) $A$.

Integrating against ultrafilters

  • Let $X$ be a set.
  • Recall that a filter on $X$ is a collection of subsets $\Omega$ of $X$ that are closed under supersets and intersections (union comes for free by closure under supersets).
  • Recall that an ultrafilter $\Omega$ on $X$ is a maximal filter. That is, we cannot add any more elements into the filter.
  • Equivalently $\Omega$ is an ultrafilter if, for any $A \subseteq X$, either $A \in \Omega$ or $(X - A) \in \Omega$.
  • Intuitively, an ultrafilter behaves like the collection of all subsets of $X$ that contain a chosen point $x \in X$ (a principal ultrafilter).
  • We can also say that ultrafilters correspond to lattice homomorphisms $2^X \to 2$.
  • A lemma will show that this is equivalent to the following: Whenever $X$ is expressed as the disjoint union of three subsets $S_1, S_2, S_3 \subseteq X$, then one of them will be in $\Omega$ (there exists some $i$ such that $S_i \in \Omega$).

Lemma: Three picking equivalent to ultrafilter

Integration by ultrafilter

  • Let $B$ be a finite set, $X$ a set, $\Omega$ an ultrafilter on $X$.
  • Given $f: X \to B$, we wish to define $\int_X f d\Omega$.
  • See that the fibers of $f$ partition $X$ into disjoint subsets $f^{-1}(b_1), f^{-1}(b_2), \dots, f^{-1}(b_N)$.
  • The ultrafilter $\Omega$ picks out one of these subsets, say $f^{-1}(b_i)$ ($i$ for "integration").
  • Then we define the integral to be $b_i$.

What does this integral mean?

  • We think of $\Omega$ as a probability measure. Subsets in $\Omega$ have measure 1, subsets outside have measure 0.
  • Since we want to think of $\Omega$ as some kind of probability measure, we want that $\int_X 1 \, d\Omega = 1$, as would happen when we integrate a probability measure $\int d\mu = 1$.
  • Next, if two functions $f, g$ are equal almost everywhere (ie, the set of points where they agree is in $\Omega$), then their integral should be the same.

weggli: Neat tool for semantically grepping C++

  • https://github.com/googleprojectzero/weggli

Mostowski Collapse

  • Let $V$ be a set, let $U$ be a universe and let $R$ be a well founded relation on $V$.
  • Recall that a relation is well-founded iff every non-empty subset contains a minimal element. Thus, we can perform transfinite induction on $V$.
  • A function $\pi_R: V \to U$ defined via well founded induction as $\pi_R(x) \equiv { \pi(y): y \in V \land yRx }$ is called as the mostowski function on $R$. (We suppress $\pi_R$ to $\pi$ henceforth).
  • The image $\pi''V \equiv { \pi(x) : x \in V }$ is called as the Mostowski collapse of $R$.
  • Consider the well founded relation $R \subseteq N \times N$ such that $xRy$ iff $y = x + 1$. Then $\pi(0) = \emptyset$ and $\pi(x+1) = { \pi(x) }$, so the collapse consists of the sets $\emptyset, {\emptyset}, {{\emptyset}}, \dots$

Image of collapse is transitive

  • Let $U$ be a universe, let $(V, <)$ be a well founded relation on $V$.
  • Let $\pi: V \to U$ be the mostowski function on $V$.
  • Suppose $a \in b \in \pi[V]$. We must show that $a \in \pi[V]$.
  • Since $b \in \pi[V]$, there is a $v_b \in V$ such that $\pi(v_b) = b$.
  • By the definition of the Mostowski function, $b = \pi(v_b) = { \pi(v) : v \in V \land (v < v_b) }$
  • Since $a \in b$, this implies that there exists a $v_a < v_b$ such that $\pi(v_a) = a$.
  • This implies that $a$ is in the image of $\pi[V]$: $a \in \pi[V]$.
  • Thus, the set $\pi[V]$ is transitive: for any $b \in \pi[V]$ and $a \in b$, we have shown that $a \in \pi[V]$.

Image of collapse is order embedding if $R$ is extensional

  • We already know that $\pi[V]$ is transitive from above.
  • We assume that $R$ is extensional. That is: $(\forall a, aRx \iff aRy) \iff x = y$. [ie, distinct elements have distinct fibers $R^{-1}(-)$].
  • We want to show that $v_1 < v_2 \iff \pi(v_1) \in \pi(v_2)$.
Forward: $v_1 < v_2 \implies \pi(v_1) \in \pi(v_2)$:
  • $v_1 < v_2$, then $\pi(v_2) = { \pi(x): x < v_2 }$. This implies that $\pi(v_1) \in \pi(v_2)$.
Backward: $\pi(v_1) \in \pi(v_2) \implies v_1 < v_2$:
  • Let $\pi(v_1) \in \pi(v_2)$.
  • By the definition of the Mostowski function, we have that $\pi(v_2) = { \pi(v'): v' < v_2 }$
  • Thus, there is some $v'$ such that $\pi(v') = \pi(v_1)$.
  • We wish to show that $v' = v_1$, or that the collapse function is injective.
Collapse is injective:
  • We will suppose that the collapse is not injective and derive a contradiction.

  • Suppose there are two elements $v_1, v_2$ such that $v_1 \neq v_2$ but $\pi(v_1) = \pi(v_2)$.

  • WLOG, suppose $v_1 < v_2$: the relation is well-founded, and thus the set ${v_1, v_2}$ ought to have a minimal element, and $v_1 \neq v_2$.

  • We must have $\pi(v_1) \subsetneq \pi(v_2)$,

  • Reference: book of proofs

Spaces that have same homotopy groups but not the same homotopy type

  • Two spaces have the same homotopy type iff there are functions $f: X \to Y$ and $g: Y \to X$ such that $f \circ g$ and $g \circ f$ are homotopic to the identity.
  • Now consider two spaces: (1) the point, (2) the topologist's sine curve with its two ends attached (the Warsaw circle).
  • See that the second space can have no non-trivial fundamental group, as it's impossible to loop around the sine curve.
  • So the Warsaw circle has all trivial $\pi_j$, just like the point.
  • See that the map $W \to { \star }$ must send every point in the Warsaw circle to the point $\star$.
  • See that the map backward can send $\star$ somewhere, so we are picking a point on $W$.
  • The composite smooshes all of $W$ to a single point. For this to be homotopic to the identity is to say that the space is contractible, which the Warsaw circle is not.

Fundamental group functor does not preserve epis

  • Epis in the category of topological spaces are continuous functions that have dense image.
  • Take a circle $S^1$ and pinch it in the middle to get $S^1 \vee S^1$. This map is an epi: $f: S^1 \to S^1 \vee S^1$.
  • See that this does not induce an epi $\pi_1(S^1) = \mathbb{Z} \to \mathbb{Z} \star \mathbb{Z} = \pi_1(S^1 \vee S^1)$.
  • Maybe even more simply, the quotient map $f: [0, 1] \to S^1$ is an epi, but the induced map $\pi_1([0, 1]) = 0 \to \pi_1(S^1) = \mathbb{Z}$ is not.
  • Thus, fundamental group functor does not preserve epis.

Epi in topological spaces

  • Epis in the category of topological spaces are continuous functions that have dense image.
  • Proof: TODO

Permutation models

  • These are used to create models of ZF + not(Choice).
  • Key idea: if we just have ZF without atoms, then the universe has no non-trivial ∈-preserving permutations.
  • Key idea: if we have atoms, then we can permute the atoms to find non-trivial automorphisms of our model.
  • Key idea: in ZF + atoms, the ordinals come from the ZF fragment, where they live in the kernel [ie the universe formed by repeated application of powerset to the emptyset]. Thus, the "order theory" of ZF + atoms is controlled by the ZF fragment.
  • Crucially, this means that the notion of "well ordered" [ie, in bijection with ordinal] is determined by the ZF fragment.
  • Now suppose (for CONTRADICTION) that A, our set of atoms, can be well ordered. This means that there is a bijection f: ordinal -> A.
  • Since A possesses non-trivial structure preserving automorphisms, so too must the ordinal (transport them along f). But this violates the fact that an ordinal cannot possess a non-trivial automorphism.
  • Thus, we have a contradiction. This means that A cannot be well-ordered, ie, there cannot be a bijection f: ordinal -> A.

Almost universal class

  • A universal class is one that contains all subsets as elements.
  • A class is almost universal if every subset of $L$ is a subset of some element of $L$. But note that $L$ does not need to have all subsets as elements.
  • $L$ is almost universal if for any subset $A \subset L$ (where $A$ is a set), there is some $B \in L$ such that $A \subseteq B$, but $A$ in itself need not be in $L$.

Godel operations

  • A finite collection of operations that is used to create all constructible sets from ordinals.
  • Recall $V$, the von Neumann universe, which we build by iterating powersets starting from $\emptyset$: $V_0 = \emptyset$, $V_{\alpha+1} = \mathcal P(V_\alpha)$, taking unions at limit stages.
  • We construct $L$ sort of like $V$, but at each stage, instead of taking the full powerset, we only take those subsets that are carved out by first order formulas over the previous stage.
  • This makes sure that the resulting sets are independent of the peculiarities of the surrounding model, since we stick to first-order definable subsets.

Orthogonal Factorization Systems

  • For a category $C$, a factorization system consists of sets of morphisms $(E, M)$ such that:
  • $E, M$ contain all isos.
  • $E, M$ are closed under composition.
  • every morphism in $C$ can be factored as $m \circ e$ with $e \in E$ and $m \in M$
  • The factorization is functorial:
  • Reference: Riehl on factorization systems

Orthogonal morphisms

Two morphisms e: a -> b and m: x -> y are orthogonal iff for any (f, g) such that the square commutes:

a --e--> b
|        |
f        g
|        |
v        v
x --m--> y

then there exists a UNIQUE diagonal d: b -> x such that the the triangles commute: (f = d . e) and (m . d = g):

a --e--> b
|       / |
f      / g
|   /!d  |
v /      v
x --m--> y

Locally Presentable Category

  • A category is locally presentable iff it has a set $S$ of objects such that every object is a colimit over these objects. This definition is correct up to size issues.

  • A locally presentable category is a reflective localization $C \to Psh(S)$ of a category of presheaves over $S$. Since $Psh(S)$ is the free cocompletion, and localization imposes relations, this lets us write a category in terms of generators and relations.

  • Formally, $C$ :

    1. is locally small
    1. has all small colimits
    1. <TECHNICAL SIZE CONDITIONS; TALK TO OHAD>

Localization

  • Let $W$

Reflective localization

Accessible Reflective localization

Remez Algorithm

Permission bits reference

  • I always forget the precise encoding of permissions, so I made a cheat sheet to remember what's what. It's read, write, execute, which have values 2^2, 2^1, 2^0.
+-----+---+--------------------------+
| rwx | 7 | Read, write and execute  |
| rw- | 6 | Read, write              |
| r-x | 5 | Read and execute         |
| r-- | 4 | Read                     |
| -wx | 3 | Write and execute        |
| -w- | 2 | Write                    |
| --x | 1 | Execute                  |
| --- | 0 | No permissions           |
+-----+---+--------------------------+
+------------+------+-------+
| Permission | Octal| Field |
+------------+------+-------+
| rwx------  | 0700 | User  |
| ---rwx---  | 0070 | Group |
| ------rwx  | 0007 | Other |
+------------+------+-------+

Papers on Computational Group Theory

  • A practical model for computation with matrix groups.
  • A data structure for a uniform approach to computations with finite groups.
  • A fast implementation of the monster group.

Kan Extensions: Key idea

  • The key insight is to notice that when we map from $C$ to $E$ via $K$, the object $K(x)$ whose comma category $K \downarrow Kx$ we form also has an arrow $Kx \to Kx$, namely the identity arrow. Thus we can think of $K \downarrow Kx$ as looking like (<stuff> -> Kx) -> Kx. So it's really the Kx in the <stuff> -> Kx that controls the situation.

Interleaved dataflow analysis and rewriting

fact: {} -> PROPAGATE
x = 1
fact: {x: 1}
y = 2
fact: {x: 1, y: 2}
~~z = x + y~~
{x: 1, y : 2, z: 3} -> REWRITE + PROPAGATE
z = 3

-- :( rewrite, propagate. --

fact: {} -> PROPAGATE
x = 1
fact: {x: 1}
y = 2
fact: {x: 1, y: 2}
~~z = x + y~~
{x: 1, y : 2} -> REWRITE
z = 3 <- NEW statement from the REWRITE;
fact: {x: 1, y: 2, z: 3}

x = 2 * 10 ; (x = 20; x is EVEN)
y = 2 * z; (y = UNK; y is EVEN)

-> if (y %2 == 0) { T } else { E }
T -> analysis

Central variable as focal

  • The NLTK code which breaks down a word into syllables inspects trigrams.
  • It names the variables of the trigrams prev, focal, and next.
  • I find the name focal very evocative for what we are currently focused on! It is free of the implications of a word like current.

Wilson's theorem

  • We get $p \equiv 1$ (mod $4$) implies $((p-1)/2)!$ is a square root of -1.
  • It turns out that this is because of Wilson's theorem: $(p-1)! \equiv -1 \pmod p$.
  • Pick $p = 13$. Then in the calculation of $(p-1)!$, we can pair off $6$ with $-6=7$, $5$ with $-5=8$, and so on; each pair $k, -k$ contributes $-k^2$.
  • So we get $(-1)^{(p-1)/2} (((p-1)/2)!)^2 = (p-1)! \equiv -1$.
  • When $(p-1)/2$ is even, this means that $((p-1)/2)!$ is a square root of $-1$.
  • The condition $(p-1)/2$ is even is the same as saying that $p-1$ is congruent to $0$ mod $4$, or that $p$ is congruent to $1$ mod $4$.
  • It's really nice to be able to see where this condition comes from!
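
A quick numerical check of the above for p = 13 (my own sketch, not part of the note):

#include <cassert>
#include <cstdint>

int main() {
    // p = 13 is 1 mod 4, so ((p-1)/2)! should square to -1 mod p.
    const uint64_t p = 13;
    uint64_t half_fact = 1;
    for (uint64_t k = 1; k <= (p - 1) / 2; ++k) { half_fact = (half_fact * k) % p; }
    assert((half_fact * half_fact) % p == p - 1);
}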

General enough special cases

  • Also, I feel that thanks to thinking about combinatorial objects for a while, I've gained some kind of "confidence", where I check a special case which I am confident generalizes well. For example, in the editor backspace routine below, I only thought carefully about the case s.loc.col = 1:
void editor_state_backspace_char(EditorState& s) {
    assert(s.loc.line <= s.contents.size());
    if (s.loc.line == s.contents.size()) { return; }
    std::string& curline = s.contents[s.loc.line];
    assert(s.loc.col <= curline.size());
    if (s.loc.col == 0) { return; }
    // think about what happens with [s.loc.col=1]. Rest will work.
    std::string tafter(curline.begin() + s.loc.col, curline.end());
    curline.resize(s.loc.col - 1); // need to remove col[0], so resize to length 0.
    curline += tafter;
    s.loc.col--;
}

XOR and AND relationship

  • a xor b = a + b - 2 (a & b)
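
A spot check of the identity (my own sketch): in a + b, positions where both bits are set contribute twice (the carries), so subtracting 2(a & b) leaves exactly the xor.

#include <cassert>
#include <cstdint>

int main() {
    for (uint32_t a = 0; a < 64; ++a) {
        for (uint32_t b = 0; b < 64; ++b) {
            assert((a ^ b) == a + b - 2 * (a & b));
        }
    }
}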

Geometry of complex integrals

  • integral f(z) dz is work in real part, flux in imaginary part.
  • https://www.youtube.com/watch?v=EyBDtUtyshk

Green's functions

  • Can solve $L y(x) = f(x)$.
  • $f(x)$ is called as the forcing term.
  • $L$ is a linear differential operator. That is, it's a differential operator like $\partial_x$ or $\partial_t \partial_t$. See that $\partial_t \partial_t$ is linear, because $\partial_t \partial_t (\alpha f + \beta g) = \alpha (\partial_t \partial_t f) + \beta (\partial_t \partial_t g)$.
  • Video reference
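
To spell out how the pieces fit together (my own summary, not from the video): the Green's function $G(x, s)$ is defined by $L G(x, s) = \delta(x - s)$, and then linearity of $L$ gives the solution by superposition:

$$ y(x) = \int G(x, s) f(s) \, ds \implies L y(x) = \int L G(x, s) f(s) \, ds = \int \delta(x - s) f(s) \, ds = f(x) $$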

CP trick: writing exact counting as counting less than

  • If we can solve for number of elements <= k, say given by leq(k) where k is an integer, then we can also solve for number of elements = k, given by eq(k) := leq(k) - leq(k - 1).
  • While simple, this is hugely beneficial in many situations because <=k can be implemented as some kind of prefix sum data structure plus binary search, which is much less error prone to hack up than exact equality.
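
A minimal sketch of the trick, assuming the "elements" are just values in a sorted vector so that leq(k) is a binary search:

#include <algorithm>
#include <cassert>
#include <vector>

// number of elements <= k
int leq(const std::vector<int>& sorted, int k) {
    return std::upper_bound(sorted.begin(), sorted.end(), k) - sorted.begin();
}

// number of elements == k, expressed via leq
int eq(const std::vector<int>& sorted, int k) {
    return leq(sorted, k) - leq(sorted, k - 1);
}

int main() {
    std::vector<int> xs = {1, 2, 2, 2, 5, 7, 7};
    assert(leq(xs, 2) == 4);
    assert(eq(xs, 2) == 3);
    assert(eq(xs, 3) == 0);
}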

CP trick: Heavy Light Decomposition euler tour tree

  • To implement HLD, first define the heavy edge of a vertex to be the edge to the child with the largest subtree.
  • To use a segment tree over HLD paths, create a "skewed" DFS where each node visits its heavy child first, and writes vertices into an array by order of discovery time (left paren time); a small sketch follows after this list.
  • When implementing HLD, we can use this segment tree array of the HLD tree as an euler tour of the tree.
  • We maintain intervals so it'll be [left paren time, right paren time]. We find right paren time based on when we exit the DFS. The time we exit the DFS is the rightmost time that lives within this subtree.
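
A minimal sketch of the "skewed" DFS described above (names and interface are my own; this is not a full HLD implementation): visit the heavy child first and record discovery times, so that the subtree of v is the contiguous range [tin[v], tout[v]) of the DFS order array.

#include <vector>
using namespace std;

const int MAXN = 200000;
vector<int> es[MAXN];                        // adjacency list
int sz[MAXN], tin[MAXN], tout[MAXN], timer_;

void calc_sizes(int v, int p) {
    sz[v] = 1;
    for (int w : es[v]) {
        if (w == p) { continue; }
        calc_sizes(w, v);
        sz[v] += sz[w];
    }
}

void dfs_hld(int v, int p) {
    tin[v] = timer_++;
    int heavy = -1;
    for (int w : es[v]) {
        if (w != p && (heavy == -1 || sz[w] > sz[heavy])) { heavy = w; }
    }
    if (heavy != -1) { dfs_hld(heavy, v); }   // heavy child first: heavy paths stay contiguous
    for (int w : es[v]) {
        if (w != p && w != heavy) { dfs_hld(w, v); }
    }
    tout[v] = timer_;                         // subtree of v occupies [tin[v], tout[v])
}

int main() {
    // tiny example: path 0 - 1 - 2 with an extra leaf 3 hanging off 1
    auto add_edge = [](int a, int b) { es[a].push_back(b); es[b].push_back(a); };
    add_edge(0, 1); add_edge(1, 2); add_edge(1, 3);
    calc_sizes(0, -1);
    dfs_hld(0, -1);
}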

Counting with repetitions via pure binomial coefficients

  • If we want to place $n$ things where $a$ of them are of kind $A$, $b$ of kind $B$, and $c$ of kind $C$, the usual formula is $n!/(a!b!c!)$.
  • An alternative way to count this is to think of it as first picking $a$ slots from $n$, and then picking $b$ slots from the leftover $(n - a)$ elements, and finally picking $c$ slots from $(n - a - b)$. This becomes $\binom{n}{a}\binom{n-a}{b}\binom{n - a - b}{c}$.
  • This is equal to $n!/(a!(n-a)!) \cdot (n-a)!/(b!(n - a - b)!) \cdot (n - a - b)!/(c!\,0!)$, which is equal to the usual $n!/(a!b!c!)$ by cancelling and using $c = n - a - b$.
  • Generalization is immediate.
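
A small sanity check of the identity for concrete numbers (my own sketch; fine for small n where the factorials fit in 64 bits):

#include <cassert>
#include <cstdint>

uint64_t fact(uint64_t n) { return n <= 1 ? 1 : n * fact(n - 1); }
uint64_t binom(uint64_t n, uint64_t k) { return fact(n) / (fact(k) * fact(n - k)); }

int main() {
    // n things: a of kind A, b of kind B, c of kind C, with a + b + c = n.
    uint64_t a = 3, b = 4, c = 5, n = a + b + c;
    uint64_t multinomial = fact(n) / (fact(a) * fact(b) * fact(c));
    uint64_t via_binomials = binom(n, a) * binom(n - a, b) * binom(n - a - b, c);
    assert(multinomial == via_binomials);
}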

Fundamental theorem of homological algebra [TODO]

  • Let $M$ be an $R$ module.
  • A resolution of $M$ is an exact chain complex ... -> M2 -> M1 -> M0 -> M -> 0
  • A projective resolution P* of M is a resolution such that all the P_i are projective.

Fundamental theorem

    1. Every R module has projective resolution.
    1. Let P* be a chain complex of proj. R modules. Let Q* be a chain complex with vanishing homology in degree greater than zero. Let [P*, Q*] be the group of chain homotopy classes of chain maps from P* to Q*. We are told that this set is in bijection with maps [H0(P*), H0(Q*)]. That is, the map taking f* to H0[f*] is a bijection.

Corollary: two projective resolutions are chain homotopy equivalent

  • Let P1 -> P0 -> M and ... -> Q1 -> Q0 -> M be two projective resolutions.
  • H0(P*) has an epi mono factorization P0 ->> H0(P*) and H0(P*) ~= M.

Proof of existence of projective resolution

  • Starting with M there always exists a free module P0 that is epi onto M, given by taking the free module of all elements of M. So we get P0 -> M -> 0.
  • Next, we take the kernel, which gives us:
     ker e
       |
       |
       v    e
       P0 -----> M -----> 0
  • The next P1 must be projective, and it must project onto ker e for homology to vanish. So we choose the free module generated by elements of ker e to be P1!
        ker e
       ^     \
      /       \
     /         v    e
   P1          P0 -----> M -----> 0
  • Composing these two maps gives us P1 -> P0 -> M. Iterate until your heart desires.

Chain homotopy classes of chain maps

Projective modules in terms of universal property

(1): Universal property / Defn

  • $P$ is projective iff for every epimorphism $e: E \to B$, and every morphism $f: P \to B$, there exists a lift $\tilde{f}: P \to E$.
     e
   E ->> B
   ^   ^
  f~\  | f
     \ |
       P

Thm: every free module is projective

  • Let $P$ be a free module. Suppose we have an epimorphism $e: M \to N$ and a morphism $f: P \to N$. We must create a lift $\tilde f: P \to M$.
  • Let $P$ have basis ${ p_i }$. A morphism from a free module is determined by the action on the basis. Thus, we simply need to define $\tilde f(p_i)$.
  • For each $f(p_i) \in N$, there is a pre-image $m_i \in M$ such that $e(m_i) = f(p_i)$.
  • Thus, define $\tilde{f}(p_i) = m_i$. This choice is not canonical since there could be many such $m_i$.
  • Regardless, we have succeeded in showing that every free module is projective by lifting $f: P \to N$ to a map $\tilde f: P \to M$.

(1 => 2): Projective as splitting of exact sequences

  • $P$ is projective iff every exact sequence $0 \to N \to M \xrightarrow{\pi} P \to 0$ splits.

  • That is, we have a section $s: P \to M$ such that $\pi \circ s = id_P$.

  • PROOF (1 => 2): Suppose $P$ solves the lifting problem. We wish to show that this implies that exact sequence splits.

  • Take the exact sequence:

            pi
0 -> N -> M -> P -> 0
               ^
               | idP
               P
  • This lifts into a map $P \to M$ such that the composition is the identity:
            pi
0 -> N -> M -> P -> 0
          ^   ^
       idP~\  | idP
            \ |
             P
  • This gives us the section s = idP~ such that pi . s = idP from the commutativity of the above diagram.

(2 => 3): Projective as direct summand of free module

  • $P$ is projective iff it is the direct summand of a free module. So there is another module $N$ such that $P \oplus N \equiv R^n$.
  • We can always pick an epi $\pi: F \to P$, where $F$ is the free module over all elements of $P$.
  • We get our ses $0 \to ker(\pi) \to F \to P \to 0$. We know this splits because as shown above, projective splits exact sequences where $P$ is the surjective image.
  • Since the sequence splits, the middle term $F$ is a direct sum of the other two terms. Thus $F \simeq \ker \pi \oplus P$.

Splitting lemma

  • If an exact sequence splits, then middle term is direct sum of outer terms.

(3 => 1): Direct summand of free module implies lifting

  • Let's start with the diagram:
  e
E ->>B
     ^
    f|
     P
  • We know that $P$ is the direct summand of a free module, so we can write a P(+)Q which is free:
  e
E ->>B
     ^
    f|
     P <<-- P(+)Q
         pi
  • We create a new arrow f~ = f . pi which has type f~: P(+)Q -> B. Since this is a map from a free module into B, it can be lifted to E. The diagram with f~ looks as follows:
  e
E ->>B <--
     ^    \f~
    f|     \
     P <<-- P(+)Q
         pi
  • After lifting f~ to E as g~, we have a map g~: P(+)Q -> E.
--------g~--------
|                |
v e              |
E ->>B <--       g~
     ^    \f~    |
    f|     \     |
     P <<-- P(+)Q
         pi
  • From this, I create the map g: P -> E given by g(p) = g~((p, 0)). Thus, we win!

Non example of projective module

  • Z/pZ is not projective.
  • We have the exact sequence 0 -> Z -(xp)-> Z -> Z/pZ -> 0 of multiplication by p.
  • This sequence does not split, because Z (middle) is not a direct sum of Z (left) and Z/pZ (right), because direct summands are submodules of the larger module. But Z/pZ cannot be a submodule of Z because Z/pZ is torsion while Z is torsion free.

Example of module that is projective but not free

  • Let $R \equiv F_2 \times F_2$ be a ring.
  • The module $P \equiv F_2 \times {0}$ is projective but not free.
  • It's projective because it along with the other module $Q \equiv {0} \times F_2$ is isomorphic to $R$. ($P \oplus Q = R$).
  • It's not free because any $R^n$ will have $4^n$ elements, while $P$ has only two elements.
  • Geometrically, we have two points, one for each $F_2$. The module $P$ is a vector bundle that only takes values over one of the points. Since the bundle has different dimensions over the two points (1 versus 0), it is projective but not free.
  • It is projective since it's like a vector bundle. It's not free because it doesn't have constant dimension.

References

How ideals recover factorization [TODO]

  • Consider $Z[\sqrt{-5}]$. Here, we have the equation $2 \times 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.
  • Why are $2, 3, (1 + \sqrt{-5}), (1 - \sqrt{-5})$ irreducible?
  • We can enumerate numbers up to a given absolute value. Since the absolute value is a norm and is multiplicative, we only need to check for factorizations of a given number $n$ in terms of candidates $p$ with smaller absolute value (ie, $|p| < |n|$).
  • If we list numbers in $Z[\sqrt{-5}]$ up to norm square $6$ (because $6$ is the norm square of $1 - \sqrt{-5}$), we get:

This was generated from the python code:

class algnum:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __add__(self, other):
        return algnum(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b \sqrt(-5)) (a' + b' \sqrt(-5))
        # aa' + ab' sqrt(-5) + ba' sqrt(-5) + bb' (- 5)
        # aa' - 5 bb' + sqrt(-5)(+ab' +ba')
        return algnum(self.a * other.a - 5 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __str__(self):
        if self.b == 0:
            return str(self.a)
        if self.a == 0:
            return f"{self.b}sqrt(-5)"
        return f"[{self.a}, {self.b} sqrt(-5)]"

    def normsq(self):
        # (a + b \sqrt(-5))(a - b \sqrt(-5))
        # = a^2 - (-5) b^2
        # = a^2 + 5 b^2
        return self.a * self.a + 5 * self.b * self.b
    def is_zero(self):
        return self.a == 0 and self.b == 0
    def is_one(self):
        return self.a == 1 and self.b == 0

    def is_minus_one(self):
        return self.a == -1 and self.b == 0



    __repr__ = __str__

nums = [algnum(a, b) for a in range(-10, 10) for b in range(-10, 10)]

def divisor_candidates(p):
    return [n for n in nums if n.normsq() < p.normsq() \
                  and not n.is_zero() \
                  and not n.is_one() \
                  and not n.is_minus_one()]

# recursive.
print("normsq of 2: ", algnum(2, 0).normsq());
print("normsq of 3: ", algnum(3, 0).normsq());
print("normsq of 1 + sqrt(-5):" , algnum(1, 1).normsq());
print("potential divisors of 2: ", divisor_candidates(algnum(2, 0)))
# candidates must be real. Only real candidate is 2.
print("potential divisors of 3: ", divisor_candidates(algnum(3, 0)))
# Candidate must be mixed.
print("potential divisors of (1 + sqrt(-5)): ", divisor_candidates(algnum(1, 1)))
print("potential divisors of (1 - sqrt(-5)): ", divisor_candidates(algnum(1, -1)))

Recovering unique factorization of ideals

  • In the above ring, define $p_1 \equiv (2, 1 + \sqrt{-5})$.

  • Define $p_2 \equiv (2, 1 - \sqrt{-5})$.

  • Define $p_3 \equiv (3, 1 + \sqrt{-5})$.

  • Define $p_4 \equiv (3, 1 - \sqrt{-5})$.

  • We claim that $p_1 p_2 = (2)$, $p_3 p_4 = (3)$, $p_1 p_3 = (1 + \sqrt{-5})$, $p_2 p_4 = (1 - \sqrt{-5})$. For example, $p_1 p_2 = (4, 2(1 + \sqrt{-5}), 2(1 - \sqrt{-5}), 6)$: every generator is divisible by $2$, and $2 = 6 - 4$ lies in the product, so $p_1 p_2 = (2)$.

  • This shows that the ideals that we had above are the products of "prime ideals".

  • We recover prime factorization at the ideal level, which we had lost at the number level.

  • Video lectures: Intro to algebraic number theory via Fermat's last theorem

Centroid of a tree

  • Do not confuse with the Center of a tree, which is a node $v$ that minimizes the maximum distance to any other node, $\max_{w \in V} d(v, w)$. The center can be found by taking the node at the middle of a diameter.
  • The centroid of a tree is a node such that no subtree hanging from it contains more than floor(n/2) of the vertices of the tree.

Algorithm to find centroid of a tree

  • Root tree arbitrarily at $r$
  • Compute subtree sizes with respect to this root $r$.
  • Start from root. If all children of root $r$ have size less than or equal to floor(n/2), we are done. Root is centroid.
  • If not, some child $c$ [for child, contradiction] has size strictly greater than floor(n/2).
  • The total tree has $n$ vertices. $c$ as a subtree has strictly more than floor(n/2) vertices. Thus the rest of the tree (ie, the part under $r$ that excludes $c$) has at most floor(n/2) vertices.
  • Let us in our imagination reroot the tree at this child $c$. The children of $c$ continue to have the same subtree sizes. The old node $r$, as a subtree of the new root $c$, has at most floor(n/2) vertices.
  • Now we recurse, and proceed to analyze c.
  • This analysis shows us that once we descend from r -> c, we do not need to analyze the edge c -> r if we make c the new candidate centroid.
int n;             // number of vertices
int sz[N];         // subtree sizes
vector<int> es[N]; // adjacency list
void go_sizes(int v, int p) {
  sz[v] = 1;
  for (int w : es[v]) {
    if (w == p) { continue; }
    go_sizes(w, v);
    sz[v] += sz[w];
  }
}

int centroid(int v, int p) {
  for (int w : es[v]) {
    if (w != p && sz[w] > n/2)
      return centroid(w, v);
  }
  return v;
}

int main() {
  ...
  go_sizes(1, 1);
  centroid(1, 1);
}
  • Note that one does not need to write the code as follows:
int centroid(int v, int p) {
  for (int w : es[v]) {
    int wsz = 0;
    if (w == p) {
      // size of parent = total - our size
      wsz = n - sz[v];
    } else {
      wsz = sz[w];
    }
    assert(wsz);
    if (wsz > n/2) {
      return centroid(w, v);
    }
  }
  return v;
}
  • This is because we have already established that if we descend from p into v (because sz[v] > n/2), then the component containing p [viewed as a subtree when rooted at v] has at most n/2 vertices. So once v is the new candidate centroid, we never need to re-examine the edge v -> p.

Alternate definition of centroid

  • Let the centroid of a tree $T$ be a vertex $v$, such that when $v$ is removed and the graph splits into components $T_v[1], T_v[2], \dots, T_v[n]$, then the value $\tau(v) = \max(|T_v[1]|, |T_v[2]|, \dots, |T_v[n]|)$ is minimized.
  • That is, it is the vertex that on removal induces subtrees, such that the size of the largest component is smallest amongst all nodes.

Existence of centroid

Equivalence to size definition

Centroid decomposition

  • If we find the centroids of the subtrees that hang from the centroid, then we decompose the graph into a centroid decomposition.
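A minimal recursive sketch of this (in Python for brevity; the function names are mine, not from any library). It reuses the subtree-size and centroid-finding routines from above: find the centroid of the current component, record its parent in the centroid tree, mark it as removed, and recurse into each remaining component.

def centroid_decomposition(adj):
    n = len(adj)
    removed = [False] * n
    sz = [0] * n
    parent_in_ctree = [-1] * n   # parent of each node in the centroid tree

    def calc_sizes(v, p):
        sz[v] = 1
        for w in adj[v]:
            if w != p and not removed[w]:
                calc_sizes(w, v)
                sz[v] += sz[w]

    def find_centroid(v, p, tree_size):
        for w in adj[v]:
            if w != p and not removed[w] and sz[w] > tree_size // 2:
                return find_centroid(w, v, tree_size)
        return v

    def decompose(v, ctree_parent):
        calc_sizes(v, -1)
        c = find_centroid(v, -1, sz[v])
        parent_in_ctree[c] = ctree_parent
        removed[c] = True
        for w in adj[c]:
            if not removed[w]:
                decompose(w, c)

    decompose(0, -1)
    return parent_in_ctree

# path 0-1-2-3-4: the centroid tree is rooted at 2
print(centroid_decomposition([[1], [0, 2], [1, 3], [2, 4], [3]]))  # [1, 2, -1, 2, 3]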

Path query to subtree query

  • Model question: CSES counting paths
  • We have a static tree, and we wish to perform updates on paths, and a final query.
  • We can uniquely represent a path in a tree with an initial and final node. There are $O(n^2)$ paths in a tree, so we need to be "smart" when we try to perform path updates.

Pavel: bridges, articulation points for UNDIRECTED graphs

  • Two vertices are 2-edge-connected if there are two paths between them that do not share ANY edges.
  • Every bridge must occur as a DFS tree edge, because DFS connects all components together.
  • More generally, every spanning tree contains all bridge edges.
  • Now we check if each edge is a bridge or not.
  • To check, we see what happens when we remove the edge $(u, v)$. If the edge is not a bridge, then the subtree of $v$ must connect to the rest of the graph.
  • Because we run DFS, the subtree rooted at $v$ must go upwards, it cannot go cross. On an undirected graph, DFS only gives us tree edges and back edges.
  • This means that if the subtree rooted at $v$ is connected to the rest of the graph, it must have a backedge that is "above" $u$, and points to an ancestor of $u$.
  • Instead of creating a set of back edges for each vertex $v$, we take the highest /topmost back edge, since it's a safe approximation to throw away the other back-edges if all we care about is to check whether there is a backedge that goes higher than $u$.
  • To find the components, push every node into a list. When we find an edge that is a bridge, take the sublist from the vertex $v$ to the end of the list. This is going to be one connected component. We discover islands in "reverse order", where we find the farthest island from the root first and so on.
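A minimal sketch of the bridge-finding part of this argument (Python; adj is an adjacency list, and I assume no parallel edges). tin[v] is the DFS entry time, and low[v] is the topmost entry time reachable from the subtree of v using tree edges plus at most one back edge; the tree edge (v, w) is a bridge exactly when low[w] > tin[v].

def find_bridges(adj):
    n = len(adj)
    tin = [-1] * n      # DFS entry time of each vertex
    low = [0] * n       # topmost entry time reachable from the subtree
    bridges = []
    timer = 0

    def dfs(v, parent):
        nonlocal timer
        tin[v] = low[v] = timer
        timer += 1
        for w in adj[v]:
            if w == parent:
                continue
            if tin[w] != -1:               # back edge going up
                low[v] = min(low[v], tin[w])
            else:                          # tree edge
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] > tin[v]:        # subtree of w cannot reach above v
                    bridges.append((v, w))

    for v in range(n):
        if tin[v] == -1:
            dfs(v, -1)
    return bridges

# a path 0-1-2 glued to a triangle 2-3-4: the path edges (1, 2) and (0, 1) are bridges
print(find_bridges([[1], [0, 2], [1, 3, 4], [2, 4], [2, 3]]))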

Vertex connectivity

  • The problem is that vertex connectivity is not an equivalence relation on vertices!
  • So we define it as an equivalence relation on edges.
  • More subtly, we cannot "directly" condense. We need to build a bipartite graph, with components on one side and articulation points on the other side.

Segtree: range update, point query [TODO]

  • To support this, implement range updates the same way one normally implements range queries: add the update value at each of the O(log n) canonical nodes that cover the range.
  • To implement a point query, start at the leaf and walk upward to the root, accumulating all the update values stored along the path.
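A minimal sketch of this scheme (Python, iterative bottom-up segment tree; n is taken to be a power of two for simplicity, and the class name is mine). A range update marks the O(log n) canonical nodes covering the range; a point query sums the marks on the leaf-to-root path.

class RangeUpdatePointQuery:
    def __init__(self, n):
        self.n = n                   # number of leaves, a power of two here
        self.t = [0] * (2 * n)

    def update(self, l, r, val):
        # add val to every index in the half-open range [l, r)
        l += self.n; r += self.n
        while l < r:
            if l & 1: self.t[l] += val; l += 1
            if r & 1: r -= 1; self.t[r] += val
            l //= 2; r //= 2

    def query(self, i):
        # walk from the leaf for index i up to the root, summing the marks
        i += self.n
        acc = 0
        while i >= 1:
            acc += self.t[i]
            i //= 2
        return acc

s = RangeUpdatePointQuery(8)
s.update(2, 6, 10)                   # add 10 on indices 2, 3, 4, 5
print(s.query(3), s.query(6))        # 10 0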

Monadic functor

  • A functor $U: D \to C$ is monadic iff it has a left adjoint $F: C \to D$ and the adjunction is monadic.
  • An adjunction $C : F \vdash U: D$ is monadic if the induced "comparison functor" from $D$ to the category of algebras (eilenberg-moore category) $C^T$ is an equivalence of categories.
  • That is, the functor $\phi: D \to C^T$ is an equivalence of categories.
  • Some notes: We have $D \to C^T$ and not the other way around since the full order is $C_T \to D \to C^T$: Kleisli, to $D$, to Eilenberg moore. We go from "more semantics" to "less semantics" --- such induced functors cannot "add structure" (by increasing the amount of semantics), but they can "embed" more semantics into less semantics. Thus, there is a comparison functor from $D$ to $C^T$.
  • Eilenberg-Moore is written $C^T$ since the category consists of $T$-algebras, where $T$ is the induced monad $T: C \to D \to C$. It's $C^T$ because a $T$-algebra consists of arrows $\{ Tc \to c : c \in C \}$ with some laws. If one wished to be cute, they could think of this as "$T \to C$".
  • The monad $T$ is $C \to C$ and not $D \to D$ because, well, let's pick a concrete example: Mon. The monad on the set side takes a set $S$ to the set of words on $S$, written $S^\star$. The other alleged "monad" would take a monoid $M$ to the free monoid on the elements of $M$. We've lost structure.

Injective module

  • An injective module is a generalization of the properties of $\mathbb Q$ as an abelian group ($\mathbb Z$ module.)
  • In particular, given any injective group homomorphism $f: X \to Y$ of abelian groups and a morphism $q_X: X \to \mathbb Q$, we can extend $q_X$ to a group homomorphism $q_Y: Y \to \mathbb Q$ with $q_Y \circ f = q_X$.
  • We can think of this injection $f: X \to Y$ as identifying a submodule (subgroup)$X$ of $Y$.
  • Suppose we wish to define the value of $q_Y$ at some $y \in Y$. If $y$ is in the subgroup $X$, then define $q_Y(y) \equiv q_X(y)$.
  • For anything outside the subgroup $X$, we define the value of $q_Y$ to be $0$.
  • Non-example of injective module: See that this does not work if we replace $\mathbb Q$ with $\mathbb Z$.
  • Consider the injection $i: Z \to Z$ given by $i(x) \equiv 3x$, and the identity map $g: Z \to Z$. If $Z$ were injective, we could extend $g$ along $i$ to some $h: Z \to Z$ [$h$ for hypothetical] with $h(3x) = x$. But $h$ is determined by where it sends $1$, and $h(3) = 3h(1)$ can never equal $1$ in $\mathbb Z$. Thus, $\mathbb Z$ is not an injective abelian group, since we were unable to extend a homomorphism along the injection $3 \times -: Z \to Z$.
  • Where does the non-example break on $\mathbb Q$? Take the same injection $i: Z \to Z$, $i(x) = 3x$, and the inclusion $g: Z \to \mathbb Q$. Now the extension exists: $h: Z \to \mathbb Q$ given by $h(y) = y/3$ satisfies $h(3x) = x = g(x)$, precisely because we may divide by $3$ in $\mathbb Q$.

Proof that $Spec(R)$ is a sheaf [TODO]

  • Give topology for $Spec(R)$ by defining the base as $D(f)$ --- sets where $f \in R$ does not vanish.
  • Note that the base is closed under intersection: $D(f) \cap D(g) = D(fg)$.
  • To check sheaf conditions, suffices to check on the base.
  • To the set $D(f)$, we associate the ring $R[f^{-1}]$. That is, we localize $R$ at the multiplicative monoid $S \equiv \{ f^k \}$.
  • We need to show that if $D(f) = \cup D(f_i)$, and given solutions within each $D(f_i)$, we need to create a unique solution in $D(f)$.

Reduction 1: Replace $R$ by $R[f^{-1}]$

  • We localize at $f$. This allows us to assume that $D(f) = Spec(R)$ [ideal blows up as it contains unit], and that $f = 1$ [localization makes $f$ into a unit, can rescale?]
  • So we now have that ${ D(f_i) }$ cover the spectrum $Spec(R)$. This means that for each point $\mathfrak p$, there is some $f_i$ such that $f_i \not \equiv_\mathfrak p 0$. This means that $f_i \not \in \mathfrak p$.
  • Look at the ideal $I \equiv (f_1, f_2, \dots, f_n)$. For every prime (maximal) ideal $\mathfrak p$, there is some $f_i$ such that $f_i \not\in \mathfrak p$. This means that the ideal $I$ is not contained in any maximal ideal, so $I = R$.
  • This immediately means that $1 \in I$: we can write $1 = \sum_i a_i f_i$ for some $a_i \in R$.
  • Recall that in a ring, all sums are finite, so $1$ is a sum involving only a FINITE number of the $f_i$, since only a finite number of terms in the above expression are nonzero. [$Spec(R)$ is quasi-compact!]
  • This is a partition of unity of $Spec(R)$.

Separability

  • Given $r \in R = O(Spec(R))$, if $r$ is zero in all $D(f_i)$, then $r = 0$ in $R$.
  • $r$ being zero in each $D(f_i)$ means that $r = 0$ in $R[f_i^{-1}]$. This means that $f_i^{n_i} r = 0$ for some $n_i$, because something is zero on localization iff it is killed by an element of the multiplicative set that we are localizing at.
  • On the other hand, we also know that $a_1 f_1 + \dots + a_n f_n = 1$ since $D(f_i)$ cover $R$.
  • We can replace $f_i$ by $f_i^{n_i}$, since $D(f_i) = D(f_i^{n_i})$. So if the $D(f_i)$ cover $Spec(R)$, then so too do the $D(f_i^{n_i})$.
  • Hence $1 = \sum_i b_i f_i^{n_i}$ for some $b_i \in R$, and so $r = r \cdot 1 = \sum_i b_i (f_i^{n_i} r) = 0$, which is what we wanted.

Check sheaf conditions

  • Suppose $r_i/f_i^{n_i} \in R[f_i^{-1}]$ is equal to $r_j/f_j^{n_j}$

References

Coordinate compression with set and vector

If we have a std::set<T> that represents our set of uncompressed values, we can quickly compress it with a std::vector<T> and lower_bound without having to create an std::map<T, int> that holds the index!

set<int> ss; // input set to compress
vector<int> index(ss.begin(), ss.end());
int uncompressed = ...; //
int compressed = int(lower_bound(index.begin(), index.end(), uncompressed) - index.begin());
assert(index[compressed] == uncompressed);

Hilbert polynomial and dimension

  • Think of non Cohen Macaulay ring (plane with line perpendicular to it). Here the dimension varies per point.
  • Let $R$ be a graded ring. Let $R^0$ be noetherian, and let $R$ be finitely generated as an algebra over $R^0$. By the Hilbert basis theorem, this implies that $R$ is noetherian.
  • Suppose $M$ is a graded module over $R$, and $M$ is finitely generated as a module over $R$.
  • How fast does $M_n$ grow? We need some notion of size.
  • Define the size of $M_n$ as $\lambda(M_n)$. Suppose $R^0$ is a field. Then $M_n$ is a vector space over $R^0$, and we define $\lambda(M_n)$ to be its dimension.
  • What about taking dimension of tangent space? Doesn't work for cusps! (singular points). Can be used to define singular points.
  • TODO: show that at $y^2 = x^3$, we have dimension two (we expect dimension one)

Cost of looping over all multiples of $i$ for $i$ in $1$ to $N$

  • Intuitively, when I think of "looping over $i$ and all its multiples", I seem to have a gut feeling that its cost is $N$. Of course, it is not. It is $N/i$.
  • Thus, the correct total cost becomes $\sum_{i=1}^N N/i$ (versus the false cost of $\sum_{i=1}^N N = N^2$).
  • The correct total cost is a harmonic series $N\cdot \sum_{i=1}^N1/i \simeq N \log N$.
  • This is useful for number theory problems like 1627D
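A quick empirical check of the harmonic bound in plain Python:

import math

N = 10**5
iterations = 0
for i in range(1, N + 1):
    for _ in range(i, N + 1, i):   # the multiples of i up to N: about N / i of them
        iterations += 1

# iterations = sum_{i=1}^{N} floor(N / i), which grows like N log N, not N^2
print(iterations, int(N * math.log(N)))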

Stuff I learnt in 2021

I spent this year focusing on fundamentals, and attempting to prepare myself for topics I'll need during my PhD. This involved learning things about dependent typing, compiling functional programs, working with the MLIR compiler toolchain, and reading about the GAP system for computational discrete algebra.

Guitar

I've been meaning to pick up an instrument. I'd learnt the piano as a kid, but I'd soured on the experience as it felt like I was learning a lot of music theory and practising to clear the Royal School of Music exams. I'd learnt a little bit of guitar while I was an intern at Tweag.io; my AirBnB host had a guitar which he let me borrow to play with. I was eager to pick it back up.

Unfortunately, I wasn't able to play as consistently as I had hoped I would. I can now play more chords, but actually switching between them continues to be a challenge. I also find pressing down on barre chords surprisingly hard. I've been told something about getting lighter strings, but I'm unsure about that.

I was also excited about learning the guitar well enough to play it while a friend sings along. This seems to require a lot more practice than I currently have, as the bottleneck is whether one can smoothly switch between chords.

Starting my PhD: Research on Lean4

I'm excited by proof assistants, and I'd like to work on them for my PhD. So the first order of business was to get an idea of the internals of Lean4, and to decide what exactly I would be working on. This made me read the papers written by the Lean team over the last couple years about their runtime, as well as made me learn how to implement dependently typed languages.

During this process, I also had calls with some of the faculty at the University of Edinburgh to pick a co-advisor. I enjoyed reading Liam O'Connor's thesis, Type Systems for Systems Types. The thesis had a very raw, heartfelt epilogue:

If you will permit some vain pontification on the last page of my thesis, I would like to reflect on this undertaking, and on the dramatic effect it has had on my thinking. My once-co-supervisor Toby Murray said that all graduate students enter into a valley of despair where they no longer believe in the value of their work. Certainly I am no counter-example. I do not even know if I successfully found my way out of it

This, honestly, made me feel a lot better, since I'd begun to feel this way even before launching into a PhD!

Lean implementation details

I read the papers by the Lean researchers on the special features of the language.

  • Counting immutable beans describes an optimization that they perform in their IR (lambdarc) that optimizes memory usage by exploiting linear types.
  • Sealing pointer equality describes how to use dependent types to hide pointer manipulation in a referentially transparent fashion.

Writing a dependently typed language

I felt I had to know how to write a dependently typed language if I wanted to be successful at working on the Lean theorem prover. So I wrote one, it's at bollu/minitt. The tutorials that helped me the most were:

  • David Christiansen's tutorial on normalization by evaluation, where he builds a full blown, small dependently typed language type checker.
  • Normalization by evaluation by Favonia, which explains why we need this algorithm to implement dependently typed languages, and other cute examples of normal forms in mathematics. For example, to check if two lists are equivalent up to permutation, we can first sort the two lists, and then check for real equality. So we are reducing a problem of "equivalence" to a problem of "reduction to sorted order" followed by "equality". We do something similar to type check a dependently typed language.
  • cubicaltt lectures by Favonia, which get the point of cubical type theory across very well.
  • Bidirectional type checking by Pfenning. These lecture notes explain bidirectional typing well, and provide an intuition for which types should be checked and which should be inferred when performing bidirectional typing.

Paper acceptance

Our research about writing a custom backend for Lean4 was accepted at CGO'22. I was very touched at how nice the programming languages community is. For example, Leonardo de Moura and Sebastian Ullrich, the maintainers of Lean4, provided a lot of constructive feedback. I definitely did not expect this to happen. I feel like I don't understand academia as a community, to be honest, and I'd like to understand how it's organized.

Statistics

As I was working on the paper, I realised that I didn't truly understand why we were taking the median of the runtimes to report performance numbers, or why averaging over ten runs was "sufficient" (sufficient for what?).

This led me on a quest to learn statistics correctly. My big takeaways were:

  • Frequentist type statistics via null hypotheses are hard to interpret and may not be well suited for performance benchmarking.
  • The High Performance Computing community does not use bayesian statistics, so using it would flag one's paper as "weird".
  • The best solution is to probably report all raw data, and summarize it via reasonable summary statistics like median, which is robust to outliers.

I must admit, I find the entire situation very unsatisfactory. I would love it if researchers in High Performance Computing wrote good reference material on how to benchmark well. Regardless, here are some of the neat things I wound up reading in this quest:

Learning statistics with R

Learning statistics with R. This is a neat book which explains statistics and the R programming language. I knew basically nothing of statistics and had never used R, so working through the book was a blast. I was able to blaze through the first half of the book, since it's a lot of introductory programming and introductory math. I had to take a while to digest the ideas of p-values and hypothesis testing. I'm still not 100% confident I really understand what the hell a p value is doing. Regardless, the book was a really nice read, and it made me realize just how well the R language is designed.

Jackknife

The Jackknife paper "Bootstrap methods: another look at the jackknife" introduces the technique of bootstrapping: drawing many samples from a small dataset to eventually infer summary statistics. I was impressed by the paper for three reasons. For one, it was quite easy to read as a non-statistician, and I could follow the gist of what was going on in the proofs. Secondly, I enjoyed how amenable it is to implementation, which makes it widely used in software. Finally, I think it's a great piece of marketing: labelling it a "Jackknife", and describing the bootstrap as a rough-and-ready method that will save you in the jungles of statistical wilderness, makes for a great title.

R language and tidy data

Due to the R language quest, I was exposed to the idea of a data frame in a coherent way. The data frames in R feel designed to me, unlike their Python counterpart in pandas.

I realised that I should probably learn languages that are used by domain experts, and not poor approximations of domain expertise in Python.

Tidyverse

This also got me interested in learning about the tidyverse, a collection of packages which define a notion of "tidy data", a precise philosophy of how data should be formatted when working on data science (roughly speaking, it's a dataset analogue of 3rd normal form from database theory).

In particular, I really enjoyed the tidy data paper which defines tidy data, explains how to tidy untidy data, and advocates for using tidy data as an intermediate representation for data analysis.

Starting a reading group: Fuzzing

I felt like I was missing out on hanging with folks from my research lab, so I decided to start a reading group. We picked the fuzzing book as the book to read, since it seemed an easy and interesting read.

I broadly enjoyed the book. Since it was written in a literate programming style, this meant that we could read the sources of each chapter and get a clear idea of how the associated topic was to be implemented. I enjoy reading code, but I felt that the other lab members thought this was too verbose. It did make judging the length of a particular section hard, since it was unclear how much of the section was pure implementation detail, and how much was conceptual.

Ideas learnt

Overall, I learnt some interesting ideas like delta debugging, concolic fuzzing, and overall, how to design a fuzzing library (for example, this section on grammar fuzzing provides a convenient class hierarchy one could choose to follow).

I also really enjoyed the book's many (ab)uses of python's runtime monkey-patching capabilities for fuzzing. This meant that the book could easily explain concepts that would have been much harder in some other setting, but this also meant that some of the techniques showcased (eg. tracking information flow by using the fact that python is dynamically typed) would be much harder to put into practice in a less flexible language.

Software bugs are real bugs?

The coolest thing I learnt from the book was STADS: software testing as species discovery, which models the problem of "how many bugs exist in the program?" as "how many bugs exist in this forest?". It turns out that ecologists have good models for approximating the total number of species in a habitat from the number of known species in a habitat. The paper then proceeds to argue that this analogy is sensible, and then implements this within AFL: american fuzzy lop. Definitely the most fun idea in the book by far.

Persistent data structures for compilers

My friend and fellow PhD student Mathieu Fehr is developing a new compiler framework based on MLIR called XDSL. This is being developed in Python, as it's meant to be a way to expose the guts of the compilation pipeline to domain experts who need not be too familiar with how compilers work.

Python and immutable data structures

I wished to convince Mathieu to make the data structures immutable by default. Unfortunately, python's support for immutable style programming is pretty poor, and I never could get libraries like pyrsistent to work well.

Immer

On a happier note, this made me search for what the cutting edge was in embedding immutable data structures in a mutable language, which led me to Immer: Persistence for the Masses. It advocates using RRB trees and describes how to design an API that makes them convenient to use within a language like C++. I haven't read the RRB trees paper, but I have been using Immer and I'm liking it so far.

WARD for quick blackboarding

I hang out with my friends to discuss math, and the one thing I was sorely missing was a shared blackboard. I wanted a tool that would let me quickly sketch pictures, with some undo/redo, but most importantly, be fast. I found no such tool on Linux, so I wrote my own: bollu/ward. It was great fun to write a tool to scratch a personal itch. I should do this more often.

Becoming a Demoscener

I've always wanted to become a part of the demoscene, but I felt that I didn't understand the graphics pipeline or the audio synthesis pipeline well enough. I decided to fix these glaring gaps in my knowledge.

Rasterization

I've been implementing bollu/rasterizer, which follows the tinyrenderer series of tutorials to implement a from-scratch, by-hand software rasterizer. I already knew all the math involved, so it was quite rewarding to quickly put together code that applied math I already knew to make pretty pictures.

Audio synthesis

Similarly, on the audio synthesis side, I wrote bollu/soundsynth to learn fundamental synthesis algorithms. I followed demofox's series of audio synthesis tutorials as well as a very pleasant and gently paced textbook [TODO]. I particularly enjoyed the ideas in Karplus-Strong string synthesis. I find FM synthesis very counter-intuitive to reason about. I've been told that audio engineers can perform FM sound synthesis "by ear", and I'd love to have an intuition for frequency space that's so strong that I can intuit how to FM synthesize a sound. Regardless, the idea is very neat for sure.

Plucker coordinates

I also have long wanted to understand Plucker coordinates, since I'd read that they are useful for graphics programming. I eventually plonked down, studied them, and wrote an expository note about them in a way that makes sense to me. I now feel I have a better handle on projective space, Grassmannians, and schemes!

Category theory

A friend started a category theory reading group, so we've spent the year working through Emily Riehl's "Category theory in Context". I'd seen categorical ideas before, like colimits to define a germ, "right adjoints preserve limits", showing that the sheafification functor exists by invoking an adjoint functor theorem, and so on. But I'd never systematically studied any of this, and if I'm being honest, I hadn't even understood the statement of the Yoneda lemma properly.

Thoughts on the textbook

Working through the book from the ground up was super useful, since I was forced to solve exercises and think about limits, adjoints, and so forth. I've uploaded my solutions up to Chapter 4.

I felt the textbook gets a little rough around the edges at the chapter on adjunctions. The section on the 'Calculus of Adjunctions' made so little sense to me that I rewrote it with proofs that I could actually grok/believe.

Curios

Regardless, it's been a fun read so far. I was also pointed to some other interesting content along the way, like Lawvere theories and the cohomology associated to a monad.

Computational mathematics

A Postdoc at our lab, Andres Goens comes from a pure math background. While we were discussing potential research ideas (since I'm still trying to formulate my plan for PhD), he mentioned that we could provide a formal semantics for the GAP programming language in Lean. This project is definitely up my alley, since it involves computational math (yay), Lean (yay), and formal verification (yay).

Learning GAP

I decided I needed to know some fundamental algorithms of computational group theory, so I skimmed the book Permutation Group Algorithms by Seress, which explains the fundamental algorithms behind manipulating finite groups computationally, such as the Todd-Coxeter coset enumeration algorithm and the Schreier-Sims group decomposition algorithm. I loved the ideas involved, and implemented these at bollu/CASette.

I'd also half-read the textbook 'Cox, Little, O'Shea: Computational Algebraic Geometry', which I picked up again since I felt like I ought to revisit it after I had seen more algebraic geometry, and also because I wanted to be better informed about computational mathematics. This time around, I felt many of the theorems (such as the Hilbert basis theorem) 'in my bones'. Alas, I couldn't proceed past the second chapter since other life things took priority. Perhaps I'll actually finish this book next year :).

Cardistry

For something completely different, I got interested in Cardistry and shuffling thanks to Youtube. I started learning interesting shuffles like the riffle shuffle, and soon got interested in the mathematics involved. I wound up reading some of the book Group Representations in Probability and Statistics by Persi Diaconis, a magician turned mathematician who publishes quite a bit on permutation groups, shuffling, and the like.

Symmetric group

I really enjoyed learning the detailed theory of the representation theory of the symmetric group, which I had read patchily before while studying Fourier analysis on the symmetric group. A lot of the theory still feels like magic to me; in particular, Specht modules are so 'magic' that I would find it hard to reconstruct them from memory.

Competitive programming

I need more practice at competitive programming. In fact, I'm downright atrocious, as I'm rated "pupil" on codeforces. If I had to diagnose why, it's a combination of several factors:

  • I get discouraged if I can't solve a problem I think I "ought to be able to solve".
  • I consider myself good at math and programming, and thus being bad at problem solving makes me feel bad about myself.
  • I tend to overthink problems, and I enjoy using heavy algorithmic machinery, when in reality, all that's called for is a sequence of several observations.
  • Codeforces' scoring system needs one to be fast at solving problems and implementing them precisely. I don't enjoy the time pressure. I'd like a scoring system based on harder problems, but less emphasis on time-to-solve.

To get better, I've been studying more algorithms (because it's fun). I took the Coursera course on string algorithms and read the textbook Algorithms on Strings. I loved the ideas behind building a prefix automaton in linear time. The algorithm is very elegant, and involves a fundamental decomposition of regular grammars via the Myhill-Nerode theorem. You can find my string algorithm implementations here.

Hardness of codeforces problems

Another thing I kept getting tripped up by was the fact that problems that were rated "easy" on codeforces tended to have intuitive solutions, but with non-trivial watertight proofs. An example of this was the question 545C on codeforces, where the tutorial gives a sketch of an exchange argument. Unfortunately, filling in all the gaps in the exchange argument is quite complicated. I finally did arrive at a much longer proof. This made me realize that competitive programming sometimes calls for "leaps" that are in fact quite hard to justify. This kept happening as I solved problems. To rectify the state of affairs, I began documenting formal proofs of these problems. Here's a link to my competitive programming notes, which attempt to formally state and prove the correctness of these questions.

Discrete differential geometry

I love the research of Keenan Crane, who works on bridging old school differential geometry with computational techniques. All of his papers are lucid, full of beautiful figures and crazy ideas.

Repulsive curves

Chris Yu, Henrik Schumacher, and Keenan have a new paper on Repulsive Curves, which is really neat. It allows one to create curves that minimize a repulsive force, and can be subject to other arbitrary constraints. The actual algorithm design leads one to think about all sorts of things like fractional calculus. To be honest, I find it insane that fractional calculus finds a practical use. Definitely a cool read.

SAGE implementation
  • I have a work-in-progress PR that implements Keenan Crane's Geodesics in Heat algorithm within SAGE. Unfortunately, implementing this requires heavy sparse numerical linear algebra, something that SAGE did not have at the time I attempted this.
  • This led to me opening an issue about sparse Cholesky decomposition on the SAGE issue tracker.
  • Happily, the issue was fixed late this year by SAGE pulling in cvxopt as a dependency!
  • I can get back to this now in 2022, since there's enough support within SAGE now to actually succeed!

Writing a text editor (dropped)

I started writing a text editor, because earlier tools that I'd written for myself, such as ward for blackboarding and my custom blog generator, all worked really well for me, as they fit my idiosyncrasies. I tried writing a terminal based editor at bollu/edtr following the kilo tutorial. Unfortunately, building a text editor is hard work, especially if one wants modern conveniences like auto-complete.

I've postponed this project as one I shall undertake during the dark night of the soul every PhD student encounters when writing their thesis. I plan to write a minimal lazily evaluated language, and great tooling around that language as a means to while away time. But this is for future me!

DnD

My partner got me into playing dungeons and dragons this year. I had a lot of fun role-playing, and I plan to keep it up.

Nomic

Nomic is a neat game about changing the rules of the game. It takes a particular type of person to enjoy it, I find, but if you have the type of people who enjoy C++ template language lawyering, you'll definitely have a blast!

Continuum

I found the continuum RPG, a game about time travel, very unique, due to the massive amount of lore that surrounds it, and game mechanics which revolve around creating time paradoxes to deal damage to those stuck in them. It appears to have a reputation of being a game that everybody loves but nobody plays.

Microscope

Microscope is a game about storytelling. I unfortunately was never able to host it properly because I was busy, and when I wasn't busy, I was unsure of my abilities as dungeon master :) But it definitely is a game I'd be stoked to play. I'm thinking of running it early 2022 with my group of friends.

Odds and ends

The portal group

I joined the portal group on discord, which consists of folks who follow Eric Weinstein's philosophy, broadly speaking. The discord is a strange milieu. I hung around because there were folks who knew a lot of math and physics. I wound up watching the geometric anatomy of theoretical physics lectures on YouTube by Frederic Schuller. The lectures are great expository material, though the hardness ramps up like a cliff towards the end, because it feels like he stops proving things and begins to simply state results. Regardless, I learnt a lot from it. I think my favourite takeaway was the Serre-Swan theorem, which makes very precise the idea that "projective modules are like vector bundles".

Differential geometry, again

Similarly, I wound up realizing that my differential geometry was in fact quite weak, in terms of computing things in coordinates. So I re-read do Carmo's Differential Geometry of Curves and Surfaces, and I implemented the coordinate based computations in Jupyter notebooks. For example, here is a Jupyter notebook that calculates covariant derivatives explicitly. I found that this forced me to understand what was "really going on". I now know slogans like:

The Covariant Derivative is the projection of the global derivative onto the tangent space. The Christoffel Symbols measure the second derivative (acceleration) along the tangent space.

I got interested in the work of Elizabeth Polgreen. In particular, I found the idea of being able to extend an SMT solver with arbitrary black-box functions pretty great. I read their technical report on SMT modulo oracles and implemented the algorithm.

What I want for next year

I wish to learn how to focus on one thing. I'm told that the point of a PhD is to become a world expert on one topic. I don't have a good answer as to what I wish to become a world expert on. I like the varied interests I have, so it'll be interesting to see how this pans out. However, I have decided to place all my bets on the Lean ecosystem, and I plan on spending most of 2022 writing Lean code almost always (or perhaps even always). I wish to understand all parts of the Lean compiler, from the frontend with its advanced macro system, to the middle end with its dependent typing, to the back-end. In short, I want to become an expert on the Lean4 compiler :). Let's see how far along I get!

Cayley hamilton for 2x2 matrices in sage via AG

  • I want to 'implement' the zariski based proof for cayley hamilton in SAGE and show that it works by checking the computations scheme-theoretically.

  • Let's work through the proof by hand. Take a 2x2 matrix [a, b; c, d].

  • The charpoly is $\det[a - l, b; c, d - l] = 0$, which is $p(l) = (a-l)(d-l) - bc = 0$.

  • This simplified is p(l) = l^2 - (a + d) l + ad - bc = 0.

  • Now, let's plug in l = [a, b; c, d] to get the matrix equation

  • [a, b; c, d]^2 - (a + d)[a, b; c, d] + [ad - bc, 0; 0, ad - bc] = 0.

  • The square is going to be [a^2 + bc, ab + bd; ca + dc, cb + d^2]; expanding everything shows that the matrix equation does hold (see the sympy sketch after this list).

  • Let X be the set of (a, b, c, d) such that the matrices [a, b; c, d] satisfy their own charpoly.

  • Consider the subset U of the set of (a, b, c, d) such that the matrix [a, b; c, d] has distinct eigenvalues.

  • For any matrix with distinct eigenvalues, it is easy to show that they satisfy their charpoly.

  • First see that diagonal matrices satisfy their charpoly by direct computation: [a, 0; 0, b] has eigenvalues (a, b). The charpoly is l^2 - l(a + b) + ab. Plugging in the matrix, we get [a^2, 0; 0, b^2] - [a(a+b), 0; 0, b(a+b)] + [ab, 0; 0, ab], which cancels out to 0.

  • Then note that similar matrices have equal charpoly: start with $\det(\lambda I - VAV^{-1})$, rewrite it as $\det(V \lambda I V^{-1} - V A V^{-1}) = \det(V (\lambda I - A) V^{-1})$, which equals $\det(\lambda I - A)$.

  • Thus, this means that a matrix with distinct eigenvalues, which is similar to a diagonal matrix (by change of basis), has a charpoly that satisfies cayley hamilton.

  • Thus, the set of matrices with distinct eigenvalues, U is a subset of X.

  • However, it is not sufficient to show that the system of equations has an infinite set of solutions.

  • For example, xy = 0 has infinite solutions (x=0, y=k) and (x=l, y=0), but that does not mean that it is identically zero.

  • This is in stark contrast to the 1D case, where a polynomial p(x) = 0 having infinite zeroes means that it must be the zero polynomial.

  • Thus, we are forced to look deeper into the structure of solution sets of polynomials, and we need to come up with the notion of irreducibility.

  • See that the space $k^4$ is irreducible, where $k$ is the field from which we draw the entries of our matrix.

  • Next, we note that X is a closed subset of k^4 since it's defined by the zero set of the polynomial equations.

  • We note that U is an open subset of k^4 since it's defined as the non-zero set of the discriminant of the charpoly! (ie, we want non-repeated roots)

  • Also note that U is trivially non-empty, since it has eg. all the diagonal matrices with distinct eigenvalues.

  • So we have a closed subset X of k^4, with a non-empty open subset U inside it.

  • But now, note that the closure of U must lie in X: U is a subset of the closed set X, so its closure is contained in X.

  • Then see that since the space is irreducible, the closure of U (an open) must be the whole space.

  • This means that all matrices satisfy cayley hamilton!
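A minimal sympy sketch (sympy assumed available) of the direct 2x2 computation from the start of this section; it is the brute-force check, not the SAGE scheme-theoretic version:

from sympy import symbols, Matrix, eye

a, b, c, d = symbols('a b c d')
A = Matrix([[a, b], [c, d]])
# charpoly: p(l) = l^2 - (a + d) l + (ad - bc); plug the matrix A into p
p_of_A = A * A - (a + d) * A + (a * d - b * c) * eye(2)
print(p_of_A.expand())  # Matrix([[0, 0], [0, 0]])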

LispWorks config

  • Looks like all emacs keybindings just work
  • https://www.nicklevine.org/declarative/lectures/additional/key-binds.html

Birkhoff Von Neumann theorem

  • Suppose, for contradiction, that the $n \times n$ doubly stochastic matrix $A$ has no positive diagonal (ie, its support has no perfect matching). By the Frobenius-König theorem, $A$ must then contain an $r \times s$ block of zeroes with $r + s = n + 1$. Permuting rows and columns, $A$ has the block structure:
      s     n-s
n-r [ B  |  C ]
    -----+-----
  r [ 0  |  D ]
  • Where $r + s = n + 1$.

  • The column sum of $B$ is $1$ for each of its $s$ columns, since the rest of each such column lies in the zero block. So the total sum of the entries of $B$ is $s$.

  • The row sum of $B$ is at most $1$ for each of its $n - r$ rows. So the total sum of the entries of $B$ is at most $n - r$.

  • In total, we get $s \leq n - r$, which implies $s + r \leq n$. This is a contradiction, because $s + r = n + 1$.

Proof 1 of BVN (Constructive)

  • Let's take a 3x3 doubly stochastic matrix:
[#0.4  0.3  0.3]
[0.5   #0.2 0.3]
[0.1   0.5  #0.4]
  • By some earlier lemma, since the permanent is greater than zero, the bipartite graph has a perfect matching.
  • Suppose we know how to find a perfect matching, which we know exists. Use flows (or hungarian?)
  • Take the identity matching as the perfect matching (1-1, 2-2, 3-3).
  • Take the minimum of the matches, min(0.4, 0.2, 0.4) = 0.2. So we write the original matrix as:
0.2 [1 0 0]    [0.2 0.3 0.3]
    [0 1 0] +  [0.5 0   0.3]
    [0 0 1]    [0.1 0.5 0.2]
  • Second matrix has row/col sums of 0.8. Rescale by dividing by 0.8 to get another doubly stochastic matrix.
  • We are then done by induction on the number of zero entries of the matrix (each step produces a doubly stochastic matrix with strictly more zeroes). See the decomposition loop sketched in code after the worked example below.
[0.2 0.3 0.3]
[0.5 0   0.3]
[0.1 0.5 0.2]
  • (2) Take the matching given by:
[#0.2  0.3   0.3]
[0.5   0    #0.3]
[0.1  #0.5   0.2]
  • (2) This can be written as:
   [1 0 0]   [0    0.3   0.3]
0.2[0 0 1] + [0.5  0     0.1]
   [0 1 0]   [0.1  0.3   0.2]
  • And so on.
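A minimal sketch of the decomposition loop just illustrated (Python; the perfect matching is brute-forced over all permutations, which is fine for tiny matrices, and the function name is mine):

from itertools import permutations

def birkhoff_decompose(A, eps=1e-9):
    A = [row[:] for row in A]
    n = len(A)
    terms = []  # list of (weight, permutation) pairs
    while True:
        # find a permutation supported on the strictly positive entries
        sigma = next((p for p in permutations(range(n))
                      if all(A[i][p[i]] > eps for i in range(n))), None)
        if sigma is None:
            break
        w = min(A[i][sigma[i]] for i in range(n))
        terms.append((w, sigma))
        for i in range(n):
            A[i][sigma[i]] -= w   # peel off w times the permutation matrix
    return terms

A = [[0.4, 0.3, 0.3],
     [0.5, 0.2, 0.3],
     [0.1, 0.5, 0.4]]
for w, sigma in birkhoff_decompose(A):
    print(w, sigma)   # the weights sum to 1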

Nice method to find permutation that makes progress

  • NxN doubly stochastic. We must have a permutation that forms a perfect matching. How to find it?
  • If all elements are 0/1, then it's already a permutation.
  • Otherwise, find a row which has an element a strictly between 0 and 1. Then the same row will have ANOTHER element b strictly between 0 and 1 (since the row sums to 1).
  • Then the column of this element b will have another element c between 0/1. Keep doing this until you find a loop.
  • Then find the minimum of these elements, call it $\epsilon$.
  • Subtract $\epsilon$ from the element that had value $\epsilon$. Then add $\epsilon$ to the next element of the loop in the same row (column). Then continue alternating, subtracting and adding $\epsilon$ around the loop, so that all row and column sums are preserved.

Latin Square

  • A latin square of order $n$ is an $n \times n$ array in which each row and column is a permutation of $\{ a_1, a_2, \dots, a_n \}$.
  • Example latin square (to show that these exist):
[1 2 3 4]
[2 3 4 1]
[3 4 1 2]
[4 1 2 3]
  • A $k \times n$ ($k < n$) latin rectangle is a $k \times n$ matrix with elements from $\{ a_1, a_2, \dots, a_n \}$ such that no element is repeated in any row or column.
  • Can we always complete a Latin rectangle into a Latin square? (YES!)

Lemma

  • Let $A$ be a $k \times n$ latin rectangle with $k \leq n - 1$.
  • We can always augment $A$ into a $(k + 1) \times n$ latin rectangle.
  • If we think of it as a set system, then we can think of each column as telling us its set of missing symbols. Example:
[1   2   3   4]
[4   1   2   3]
{2} {3} {1} {1}
{3} {4} {4} {2}
  • Let's think of the subsets as a 0/1 matrix, encoded as:
[0 1 1 0] {2, 3}
[0 0 1 1] {3, 4}
[1 0 0 1] {1, 4}
[1 1 0 0] {1, 2}
  • It's clear that each row will have sum $2$, since each set has 2 elements.
  • We claim that each column also has sum $2$.
  • For example, the first column has column sum $2$. This is because in the original matrix, $1$ is missing in two columns.
  • We can compute a perfect matching on this 0/1 matrix, which tells us how to extend the latin rectangle by one row with no repeats.
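A minimal sketch of this augmentation step (Python; the symbols are assumed to be 1..n, the matching is found by simple augmenting paths, and the function name is mine):

def extend_latin_rectangle(rect):
    n = len(rect[0])
    # for each column, the set of symbols not yet used in that column
    missing = [set(range(1, n + 1)) - {row[c] for row in rect} for c in range(n)]
    match = {}  # symbol -> column it is assigned to in the new row

    def augment(c, seen):
        # try to assign column c some missing symbol, via augmenting paths
        for sym in missing[c]:
            if sym in seen:
                continue
            seen.add(sym)
            if sym not in match or augment(match[sym], seen):
                match[sym] = c
                return True
        return False

    for c in range(n):
        assert augment(c, set())   # Hall's condition holds, as argued above
    new_row = [None] * n
    for sym, c in match.items():
        new_row[c] = sym
    return rect + [new_row]

print(extend_latin_rectangle([[1, 2, 3, 4], [4, 1, 2, 3]]))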

Assignment Problem

  • Let $A$ be an $n \times n$ non-negative matrix.
  • A permutation $\sigma$ of $[1, \dots, n]$ is called a simple assignment if $A[i][\sigma(i)]$ is positive for all $i$.
  • A permutation $\sigma$ is called an optimal assignment if $\sum_i A[i][\sigma(i)]$ is minimized over all permutations in $S_n$. (Weird? Don't we usually take max?)
  • The matrix entry $A[p][j]$ is the cost of assigning person $p$ the job $j$. We want to minimize the total cost.

4x4 example

  • Let the cost be:
[10 12 19 11]
[5  10 07 08]
[12 14 13 11]
[8  15 11  9]
  • First find some numbers $u[i]$ and $v[j]$ (these correspond to dual variables in the LP) such that $a[i][j] \geq u[i] + v[j]$ for all $i, j$.
     v[1] v[2] v[3] v[4]
u[1] [10  12  19   11]
u[2] [5   10  07   08]
u[3] [12  14  13   11]
u[4] [8   15  11    9]
  • We can start by setting $u[r] = 0$ for every row, and $v[c] = \min_i a[i][c]$, the minimum entry of column $c$. (Can also take $v[c] = 0$, but this is inefficient.)
  • Circle those positions where equality holds. This becomes:
     v[1] v[2] v[3] v[4]
u[1] [10  12   19    11]
u[2] [5#  10#  07#   08#]
u[3] [12  14   13    11]
u[4] [8   15   11     9]
  • Since $a[i][j] \geq u[i] + v[j]$ for all $i, j$, we get $a[i][\sigma(i)] \geq u[i] + v[\sigma(i)]$ for any permutation $\sigma$.
  • This means $\sum_i a[i][\sigma(i)] \geq \sum_i (u[i] + v[\sigma(i)]) = \sum_i u[i] + \sum_i v[i]$ (the second summation can be rearranged).
  • Now think of the bipartite graph where the circled positions correspond to $1$s and the rest correspond to $0$s. If we have a perfect matching amongst the circled positions, then that matching attains the lower bound $\sum_i u[i] + \sum_i v[i]$ with equality, so it is an optimal assignment.
  • If the circled positions DO NOT have a perfect matching, then by Frobenius-König, we can write the matrix as:
    s  n-s
n-r[B | C]
r  [X | D]

r + s = n + 1
  • where in $X$, no entry is circled, because entries that are circled correspond to zeroes (conceptually?)
  • We add $1$ to $u[i]$ for the last $r$ rows (the rows of $X$ and $D$), and subtract $1$ from $v[j]$ for the last $n - s$ columns (the columns of $C$ and $D$). That is:
    -1
   B C
+1 X D
  • Nothing happens to $B$.
  • in $C$, $v$ goes down, so that keeps the inequality correct.
  • In $X$, there are no circles, which means every entry satisfied a strict inequality, so we can afford to add 1s.
  • In $D$, $u$ goes up by $1$ and $v$ goes down by $1$, so $u[i] + v[j]$ is unchanged: the inequality still holds, and circled entries in $D$ stay circled.
  • The net change in $\sum_i u[i] + \sum_j v[j]$ is $+1 \cdot r - 1 \cdot (n - s) = r + s - n = (n+1) - n = 1$, so the lower bound improves.
  • The nonconstructive part is decomposing the matrix into $[B; C, X, D]$.

Hungarian algorithm

  • Take minimum in each row, subtract.
  • Take the minimum in each col, subtract.
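A minimal sketch of just these two reduction steps (Python; this is not the full Hungarian algorithm, only the initial row and column reductions, run here on the 4x4 cost matrix from above):

def reduce_rows_and_cols(a):
    a = [row[:] for row in a]
    for row in a:                          # subtract each row's minimum
        m = min(row)
        for j in range(len(row)):
            row[j] -= m
    for j in range(len(a[0])):             # subtract each column's minimum
        m = min(row[j] for row in a)
        for row in a:
            row[j] -= m
    return a

cost = [[10, 12, 19, 11],
        [5, 10, 7, 8],
        [12, 14, 13, 11],
        [8, 15, 11, 9]]
for row in reduce_rows_and_cols(cost):
    print(row)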

Interpolating homotopies

  • If we have a point $kp + (1-k)q$ on a segment, and a contractible space $X$ which contracts to a point $c$ via a homotopy $\theta$, where the image of $p$ is $x$ and the image of $q$ is $y$, then send the point to $\theta(x, 2k)$ for $k \leq 1/2$, and to $\theta(y, 2 - 2k)$ (equivalently $\theta(y, 1 - 2(k - 1/2))$) for $k \geq 1/2$.
  • This interpolates the segment $p$---$q$ to the path $x$--$c$--$y$, by using barycentric coordinates to interpolate along the homotopy.

Example where MIP shows extra power over IP

  • God tells us a chess position is a draw. We can't verify this ourselves.
  • If there are two Gods, we can make one God play against the other. So if one says draw and the other says win, we can have them play it out and find out who is lying!
  • Hence, MIP has more power than IP? (Intuitively at least).

Lazy reversible computation?

  • Lazy programs are hard to analyze because we need to reason about them backwards.
  • Suppose we limit ourselves to reversible programs. Does it then become easy?

Theorem coverage as an analogue to code coverage

  • Theorem coverage: how many lines of code are covered by correctness theorems?

Lazy GPU programming

  • All laziness is a program analysis problem, where we need to strictify.
  • Lazy vectorization is a program analysis problem where we need to find "strict blocks/ strict chains". Something like "largest block of values that can be forced at once". Seems coinductive?
  • Efficiency of lazy program is that of clairvoyant call by value, so we need to know how to force.
  • In the PRAM model, efficiency of a parallel lazy program is that of clairvoyant call by parallel or something. We need to know how to run in parallel, such that if one diverges, then all diverge. This means it's safe to run together!
  • What is parallel STG?
  • PRAM: try to parallelize writes as much as possible
  • PSTG: try to parallelize forcing as much as possible
  • Reads (free) ~ forces (conflicts in ||)
  • Write (conflict in ||) ~ create new data (free)
  • What are equivalents of common pram models?
  • Force is Boolean: either returns or does not return
  • We need a finer version. Something like returns in k cycles or does not return?
  • Old forced values: forced values with Twait = Infinity. Old unobserved values: forced values with Twait = -1.
  • Think of call by push value, but can allocate forces and call "tick" which ticks the clock. The Twait is clocked wrt this tick.
  • Tick controls live ranges. Maybe obviates GC.
  • Tick 1 is expected to be known/forced.
  • Optimize in space-time? Looking up a recurrence versus computing a recurrence. One is zero space infinite time, other is zero time infinite space.
  • Another way to think about it: application says how many people need to ask for thunk to get value. Unused values say infinity, used values say zero
  • Maybe think of these as deadlines for the compiler to meet? So it's telling the compiler to guarantee access in a certain number of ticks. This gives control over (abstract) time, like imperative gives control over abstract space?
  • TARDIS autodiff is key example. As is fib list. Maybe frac.
  • Thinking about design in the data constructor side:
  • Twait = 0 in data structure means is present at compile time. Twait = 1 is strict. Twait = infty is function pointer. What is Twait = 2? Mu
  • Can fuse kernels for all computations in the same parallel force. If one of them gets stuck, all of them get stuck. So parallel force is a syntactic way to ask for kernel fusion.
  • Can we use UB to express things like "this list will be finite, thus map can be safely parallelised" or something?
  • Have quantitative: 0,1,fin,inf?

Projections onto convex sets [TODO]

  • https://en.wikipedia.org/wiki/Projections_onto_convex_sets

BFGS algorithm for unconstrained nonlinear optimization [TODO]

  • https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm

LM algorithm for nonlinear least squares [TODO]

  • https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm

Backward dataflow and continuations

  • Forward dataflow deals with facts thus far.
  • Backward dataflow deals with facts about the future, or the rest of the program. Thus, in a real sense, backward dataflow concerns itself with continuations!

The tyranny of structurelessness

"Elitist" is probably the most abused word in the women's liberation movement. It is used as frequently, and for the same reasons, as "pinko" was used in the fifties. It is rarely used correctly. Within the movement it commonly refers to individuals, though the personal characteristics and activities of those to whom it is directed may differ widely: An individual, as an individual can never be an elitist, because the only proper application of the term "elite" is to groups. Any individual, regardless of how well-known that person may be, can never be an elite.

The inevitably elitist and exclusive nature of informal communication networks of friends is neither a new phenomenon characteristic of the women's movement nor a phenomenon new to women. Such informal relationships have excluded women for centuries from participating in integrated groups of which they were a part. In any profession or organization these networks have created the "locker room" mentality and the "old school" ties which have effectively prevented women as a group (as well as some men individually) from having equal access to the sources of power or social reward.

Although this dissection of the process of elite formation within small groups has been critical in perspective, it is not made in the belief that these informal structures are inevitably bad -- merely inevitable. All groups create informal structures as a result of interaction patterns among the members of the group. Such informal structures can do very useful things. But only Unstructured groups are totally governed by them. When informal elites are combined with a myth of "structurelessness," there can be no attempt to put limits on the use of power. It becomes capricious.

Simple Sabotage Field Manual

  • (1) Insist on doing everything through "channels." Never permit short-cuts to be taken in order to expedite decisions.
  • (2) Make "speeches." Talk as frequently as possible and at great length. Illustrate your "points" by long anecdotes and accounts of personal experiences. Never hesitate to make a few appropriate "patriotic" comments.
  • (3) When possible, refer all matters to committees, for "further study and consideration." Attempt to make the committees as large as possible - never less than five.
  • (4) Bring up irrelevant issues as frequently as possible.
  • (5) Haggle over precise wordings of communications, minutes, resolutions.
  • (6) Refer back to matters decided upon at the last meeting and attempt to re-open the question of the advisability of that decision.
  • (7) Advocate "caution". Be "reasonable" and urge your fellow-conferees to be "reasonable" and avoid haste which might result in embarrassments or difficulties later on.
  • (8) Be worried about the propriety of any decision — raise the question of whether such action as is contemplated lies within the jurisdiction of the group or whether it might conflict with the policy of some higher echelon.

Counting permutations with #MAXSAT

Using #MAXSAT, you can count permutations, which is weird. Build a complete bipartite graph K(n,n), then connect the left vertices to a source and the right vertices to a sink with unit capacities. Each solution to the flow problem is an assignment / permutation.

Coloring cat output with supercat

  • use spc -e 'error, red' to color all occurrences of string error with red.
  • I use this in lean-mlir to get colored output.

Reader monoid needs a hopf algebra?! [TODO]

  • 5.1, eg (iii)
  • We actually get a free comonoid in a CCC.
  • Having a splittable random supply is like having a markov category with a comonoid in it.

Monads mnemonic

  • multiplication is $\mu$ because Mu.
  • return is $\eta$ because return is unit is Yeta.

Card stacking

It's not about the idea, it's about the execution

Representation theory for particle physics [TODO]

SSH into google cloud

  • Setup firewall rules that enable all SSH
  • Add SSH key into metadata of project.
  • ssh <ssh-key-username>@<external-ip> ought to just work.

Comma & Semicolon in index notation

A comma before an index indicates partial differentiation with respect to that index. A semicolon indicates covariant differentiation.

  • Thus, the divergence may be written as v_i,i

Spin groups

  • Spin group is a 2 to 1 cover of $SO(n)$.
  • We claim that for 3 dimensions, $Spin(3) \simeq SU(2)$. So we should have a 2 to 1 homomorphism $\rho: SU(2) \to SO(3)$.
  • We want to write the group in some computational way. Let's use the adjoint action (how the lie group acts on its own lie algebra).
  • What is the lie algebra $su(2)$? It's trace-free hermitian.
  • Why? Physicist: $UU^\dagger = I$ expanded by epsilon gives us $(I + i \epsilon H)(I - i \epsilon H) = I$, which gives $H = H^\dagger$.
  • Also the determinant condition gives us $det(1 + i \epsilon H) = 1$ which means $1 + tr(i \epsilon H) = 1$, or $tr(H) = 0$.
  • The adjoint action is $SU(2) \to Aut(H)$ given by $U \mapsto \lambda X.\ U X U^{-1}$. By unitarity, this is $U \mapsto \lambda X.\ U X U^{\dagger}$.
  • $SO(3)$ acts on $\mathbb R^3$. The trick is to take $\mathbb R^3$ and compare it to the lie algebra $su(2)$ which has 3 dimensions, spanned by pauli matrices.
  • Conjecture: There is an isomorphism $\mathbb R^3 \simeq H$ as an inner product space for a custom inner product $\langle, \rangle$ on $H$.
  • Reference
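A minimal numeric sanity check of this (Python, numpy assumed available): build a concrete U in SU(2), compute the matrix of its adjoint action in the Pauli basis, and check that the result is orthogonal with determinant +1, ie, an element of SO(3).

import numpy as np

# Pauli matrices: a basis (orthogonal under (1/2) tr(AB)) of trace-free hermitian matrices
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# a concrete element of SU(2): U = cos(t) I + i sin(t) (n . sigma), with |n| = 1
t = 0.7
n = np.array([1.0, 2.0, -0.5]); n /= np.linalg.norm(n)
U = np.cos(t) * np.eye(2) + 1j * np.sin(t) * sum(n[k] * sigma[k] for k in range(3))

# matrix of Ad_U in the Pauli basis: R[j][k] = (1/2) tr(sigma_j U sigma_k U†)
R = np.array([[0.5 * np.trace(sigma[j] @ U @ sigma[k] @ U.conj().T).real
               for k in range(3)] for j in range(3)])
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # True True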

How to write in biblical style?

  • I'd like to write in the style of the bible!

Undefined behaviour is like compactification [TODO]

  • We compactify something like $\mathbb N$ into $\mathbb N^\infty$.
  • What does Stone Cech give us?
  • Read abstract stone duality!

God of Arepo

One day, a farmer named Arepo built a temple at the edge of his field. It was a humble thing, with stone walls and a thatch roof. At the center of the room Arepo stacked some stones to make a cairn. Two days later, a god moved into Arepo's temple. "I hope you are a harvest god," Arepo said, as he set upon the altar two stalks of wheat which he burned. "It would be nice."

He looked down upon the ash that now colored the stone. "I know this isn't much of a sacrifice, but I hope this pleases you. It'd be nice to think there is a god looking after me."

The next day, he left a pair of figs. The day after that, he spent ten minutes in silent prayer. On the third day, the god spoke up.

"You should go to a temple in the city," said a hollow voice. Arepo cocked his head at the otherworldly sound, because it was strangely familiar. The god's voice was not unlike the rustling of wheat, or the little squeaks of fieldmice that run through the grass. "Go to a real temple. Find a real god to bless you, for I am not much myself, but perhaps I may put in a good word?"

The god plucked a stone from the floor and sighed, "Forgive me, I meant not to be rude. I appreciate your temple, and find it cozy and warm. I appreciate your worship, and your offerings, but alas it shall come to naught."

"Already I have received more than I had expected," Arepo said, "Tell me, with whom do I treat? What are you the patron god of?"

The god let the stone he held fall to the floor, "I am of the fallen leaves, and the worms that churn beneath the ground. I am the boundary of the forest and the field, and the first hint of frost before the snow falls," the god paused to touch Arepo's altar, "And the skin of an apple as it yields beneath your teeth. I am the god of a dozen different nothings, scraps that lead to rot, and momentary glimpses." He turned his gaze to Arepo, "I am a change in the air, before the winds blow."

The god shook his head, "I should not have come, for you cannot worship me. Save your prayers for the things beyond your control, good farmer," the god turned away, "You should pray to a greater thing than I,"

Arepo reached out to stay the entity, and laid his hand upon the god's willowy shoulder. "Please, stay."

The god turned his black eyes upon Arepo, but found only steadfast devotion. "This is your temple, I would be honored if you would stay." The god lowered himself to the floor. Arepo joined him. The two said nothing more for a great long while, until Arepo's fellow came calling.

The god watched his worshiper depart, as the man's warmth radiated across the entity's skin.

Next morning, Arepo said a prayer before his morning work. Later, he and the god contemplated the trees. Days passed, and then weeks. In this time the god had come to enjoy the familiarity of Arepo's presence. And then, there came a menacing presence. A terrible compulsion came upon the god, and he bid the air change, for a storm was coming. Terrified, the little god went to meet the god of storms to plead for gentleness, but it was no use.

Arepo's fields became flooded, as the winds tore the tiles from his roof and set his olive tree to cinder. Next day, Arepo and his fellows walked among the wheat, salvaging what they could. At the field's edge, the little temple was ruined. After his work was done for the day, Arepo gathered up the stones and pieced them back together. "Please do not labor," said the god, "I could not protect you from the god of storms, and so I am unworthy of your temple."

"I'm afraid I don't have an offering today," Arepo said, "But I think I can rebuild your temple tomorrow, how about that?"

The god watched Arepo retire, and then sat miserably amongst the ruined stones of his little temple.

Arepo made good on his promise, and did indeed rebuild the god's temple. But now it bore layered walls of stone, and a sturdy roof of woven twigs. Watching the man work, Arepo's neighbors chuckled as they passed by, but their children were kinder, for they left gifts of fruit and flowers.

The following year was not so kind, as the goddess of harvest withdrew her bounty. The little god went to her and passionately pleaded for mercy, but she dismissed him. Arepo's fields sprouted thin and brittle, and everywhere there were hungry people with haunted eyes that searched in vain for the kindness of the gods.

Arepo entered the temple and looked upon the wilted flowers and the shriveled fruit. He murmured a prayer.

"I could not help you," said the god. "I am only a burden to you,"

"You are my friend," said Arepo.

"You cannot eat friendship!" The god retorted.

"No, but I can give it." Arepo replied.

And so the man set his hand upon the altar and spent the evening lost in contemplation with his god.

But the god knew there was another god who would soon visit, and later that year came the god of war. Arepo's god did what he could. He went out to meet the hateful visage of the armored god, but like the others, war ignored the little god's pleas. And so Arepo's god returned to his temple to wait for his friend. After a worrying amount of time, Arepo came stumbling back, his hand pressed to his gut, anointing the holy site with his blood.

Behind him, his fields burned.

"I am so sorry, Arepo," said the god, "My friend. My only friend."

"Shush," said Arepo, tasting his own blood. He propped himself up against the temple that he made, "Tell me, my friend, what sort of god are you?"

The god reached out to his friend and lowered him to the cool soil, "I'm of the falling leaves," the god said, as he conjured an image of them. "And the worms that churn beneath the earth. The boundary of the forest and the field. The first hint of frost before the first snow. The skin of an apple as it yields beneath your teeth."

Arepo smiled as the god spoke. "I am the god of a dozen different nothings, the god of the petals in bloom that lead to rot, and of momentary glimpses, and a change in the air-" the god looked down upon his friend, "Before the winds blow everything away."

"Beautiful," Arepo said, his blood now staining the stones; seeping into the very foundations of his temple. "All of them, beautiful,"

"When the storm came, I could not save your wheat."

"Yes," Arepo said.

"When the harvest failed, I could not feed you."

"Yes,"

Tears blurred the god's eyes, "When war came, I could not protect you."

"My friend, think not yourself useless, for you are the god of something very useful,"

"What?"

"You are my god. The god of Arepo."

And with that, Arepo the sower lay his head down upon the stone and returned home to his god. At the archway, the god of war appeared. The entity looked less imposing now, for his armor had fallen onto the blackened fields, revealing a gaunt and scarred form.

Dark eyes flashed out from within the temple, 'Are you happy with your work?' They seemed to say. The god of war bowed his head, as the god of Arepo felt the presence of the greater pantheon appear upon the blackened fields.

"They come to pay homage to the farmer," war said, and as the many gods assembled near the archway the god of war took up his sword to dig into the earth beneath Arepo's altar. The goddess of the harvest took Arepo's body and blessed it, before the god of storms lay the farmer in his grave.

"Who are these beings, these men," said war, "Who would pray to a god that cannot grant wishes nor bless upon them good fortune? Who would maintain a temple and bring offerings for nothing in return? Who would share their company and meditate with such a fruitless deity?"

The god rose, went to the archway; "What wonderful, foolish, virtuous, hopeless creatures, humans are."

The god of Arepo watched the gods file out, only to be replaced by others who came to pay their respects to the humble farmer. At length only the god of storms lingered. The god of Arepo looked to him, asked; "Why do you linger? What was this man to you?"

"He asked not, but gave." And with that, the grey entity departed.

The god of Arepo then sat alone. Oft did he remain isolated; huddled in his home as the world around him healed from the trauma of war. Years passed, he had no idea how many, but one day the god was stirred from his recollections by a group of children as they came to lay fresh flowers at the temple door.

And so the god painted the sunset with yellow leaves, and enticed the worms to dance in their soil. He flourished the boundary between the forest and the field with blossoms and berries, and christened the air with a crisp chill before the winter came. And come the spring, he ripened the apples with crisp red freckles that break beneath sinking teeth, and a dozen other nothings, in memory of a man who once praised his work with his dying breath.

"Hello," said a voice.

The god turned to find a young man at the archway, "Forgive me, I hope I am not intruding."

"Hello, please come in."

The man smiled as he entered, enchanted by the god's melodic voice. "I heard tell of your temple, and so I have come from many miles away. Might I ask, what are you the god of?"

The god of Arepo smiled warmly as he set his hand upon his altar, "I am the god of every humble beauty in the world." -by Chris Sawyer

Classification of lie algebras, dynkin diagrams

Classification of complex lie algebras

  • $L$ is a complex vector space with a lie bracket $[., .]$.
  • For example, the Lie algebra of a complex Lie group $G$ (a complex manifold, so the transition functions are holomorphic).

Theorem (Levi decomposition)

  • Every finite dimensional complex Lie algebra $(L, [.,.])$ can be decomposed as $L = R \oplus_s (L_1 \oplus \dots \oplus L_n)$, where $\oplus$ is direct sum, $\oplus_s$ is the semidirect sum.
  • $R$ is a solvable lie algebra.
  • To define solvable, define $R_0 = R$, $R_1 = [R_0, R_0]$, $R_2 = [R_1, R_1]$, that is, $R_2 = [[R, R], [R, R]]$.
  • We have that $R_{i+1}$ is a subset of $R_i$.
  • If this sequence eventually reaches zero, ie, there is an $n$ such that $R_n = { 0 }$, then $R$ is solvable.
  • In the decomposition of $L$, the $R$ is the solvable part.
  • We have $L_1$, \dots, $L_n$ which are simple. This means that $L_i$ is non-abelian, and $L_i$ contains no non-trivial ideals. An ideal of a lie algebra is a subvector space $I \subseteq L$ such that $[I, L] \subseteq I$. (It's like a ring ideal, except with lie bracket).
  • The direct sum $L_1 \oplus L_2$ of lie algebras is the direct sum of vector spaces with lie bracket in the bigger space given by $[L_1, L_2] = 0$.
  • The semidirect sum $R \oplus_s L_2$ as a vector space is $R \oplus L_2$. The lie bracket is given by $[R, L_2] \subseteq R$, so $R$ is an ideal. (This looks like internal semidirect product).

Remarks

  • It is very hard to classify solvable Lie algebras.
  • A lie algebra that has no solvable part, ie can be written as $L = L_1 \oplus \dots \oplus L_n$, is called semi-simple.
  • It is possible to classify the simple Lie algebras.
  • We focus on the simple/semi-simple Lie algebras. Simple Lie algebras are the independent building blocks we classify.

Adjoint Map

  • Let $(L, [., .])$ be a complex lie algebra. Let $h \in L$ be an element of the lie algebra.
  • Define $ad(h): L \to L$ as $ad(h)(l) \equiv [h, l]$. Can be written as $ad(h) \equiv [h, -]$. This is the adjoint map wrt $h \in L$.

Killing form

  • $K: L \times L \to \mathbb C$ is a bilinear map, defined as $K(a, b) \equiv tr(ad(a) \circ ad(b))$.
  • See that $ad(a) \circ ad(b): L \to L$. the trace will be complex because $L$ is complex.
  • Since $L$ is finite dimensional vector space, $tr$ is cyclic. So $tr(ad(a) \circ ad(b)) = tr(ad(b) \circ ad(a))$. This means that $K(a, b) = K(b, a)$, or that the killing form is symmetric!
  • Cartan criterion: $L$ is semi-simple iff the killing form $K$ is non-degenerate. That is, if $K(a, b) = 0$ for all $b \in L$, then $a = 0$.

Calculation wrt basis: $ad$ map.

  • Consider for actual calculation the components of $ad(h)$ and $K$ with respect to a basis $E_1, \dots, E_{dim L}$.
  • Write down a dual basis $\epsilon^1, \dots, \epsilon^{dim L}$.
  • $ad(E_i)^j_k \equiv \epsilon^j (ad(E_i)(E_k))$.
  • We know that $ad(E_i)(E_k) = [E_i, E_k]$ by definition.
  • We write $[E_i, E_k] = C^m_{ik} E_m$ where the $C^m_{ik}$ are the structure constants.
  • This gives us $ad(E_i)^j_k = \epsilon^j (C^m_{ik} E_m)$
  • Pull out structure coefficient to get $ad(E_i)^j_k = C^m_{ik} \epsilon^j (E_m)$
  • Use the fact that $E_m$ and $\epsilon_j$ are dual to get $ad(E_i)^j_k = C^m_{ik} \delta^j_m$
  • Contract over repeated index $m$ to get $m=j$: $ad(E_i)^j_k = C^j_{ik}$
  • This makes sense, since the $ad$ map is just a fancy way to write the bracket in coordinate free fashion.

Calculation wrt basis: Killing form.

  • $K(E_i, E_j) = tr(ad(E_i) \circ ad(E_j))$
  • Plug in $ad$ to become $K(E_i, E_j) = tr(C^l_{im} C^m_{jk})$ [see that the thing inside the trace is a matrix]
  • Execute trace by setting $l = k = o$. This gives us: $K(E_i, E_j) = C^o_{im} C^m_{jo}$. This is also easy to calculate from structure coefficients.
  • Iff this matrix is non-degenerate, then the lie-algebra is semi-simple (a numeric sketch of this computation follows below).
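
A small numeric sketch of this recipe (the example algebra $sl(2, \mathbb C)$ with basis $(H, E, F)$, and the numpy layout, are my own illustrative choices, not from the text): build $ad(E_i)^j_k = C^j_{ik}$ from the structure constants, form $K_{ij} = tr(ad(E_i) \circ ad(E_j))$, and check non-degeneracy.

import numpy as np

# sl(2, C) with basis order 0 = H, 1 = E, 2 = F and brackets
# [H,E] = 2E, [H,F] = -2F, [E,F] = H.
# C[m][i][k] is the coefficient of E_m in [E_i, E_k].
dim = 3
C = np.zeros((dim, dim, dim))
C[1][0][1], C[1][1][0] = 2, -2    # [H,E] = 2E,  [E,H] = -2E
C[2][0][2], C[2][2][0] = -2, 2    # [H,F] = -2F, [F,H] = 2F
C[0][1][2], C[0][2][1] = 1, -1    # [E,F] = H,   [F,E] = -H

# ad(E_i)^j_k = C^j_{ik}
ad = [np.array([[C[j][i][k] for k in range(dim)] for j in range(dim)]) for i in range(dim)]

# Killing form K_{ij} = tr(ad(E_i) ad(E_j))
K = np.array([[np.trace(ad[i] @ ad[j]) for j in range(dim)] for i in range(dim)])
print(K)                   # [[8 0 0], [0 0 4], [0 4 0]]
print(np.linalg.det(K))    # nonzero, so sl(2, C) is semi-simple by the Cartan criterion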

$ad$ is anti-symmetric with respect to the killing form.

  • Recall that $\phi$ is called as an anti-symmetric map wrt a non-degenerate bilinear form $B$ iff $B(\phi(v), w) = - B(v, \phi(w))$.
  • Fact: $ad(h)$ is anti-symmetric wrt killing form. For killing form to be non-degenerate we need $L$ to be semisimple.

Key Definition for classification: Cartan subalgebra

  • If $(L, [.,.])$ is a lie algebra, then the cartan subalgebra denoted by $H$ ($C$ is already taken for structure coeff.) is a vector space, and is a maximal subalgebra of $L$ such that there exists a basis $h_1, \dots, h_m$ of $H$ that can be extended to a basis of $L$: $h_1, \dots, h_m, e_1, \dots, e_{dim(L)-m}$ such that the extension vectors are eigenvectors for any $ad(h)$ for $h \in H$.
  • This means that $ad(h)(e_\alpha) = \lambda_\alpha(h) e_\alpha$.
  • This can be written as $[h, e_\alpha] = \lambda_\alpha(h) e_\alpha$.
  • Does this exist?

Existence of cartan subalgebra

  • Thm Any finite dimensional lie algebra possesses a cartan subalgebra.
  • If $L$ is simple, then $H$ is abelian. That is, $[H, H] = 0$.
  • Thus, the $ad(h)$ are simultaneously diagonalized by the $e_\alpha$ since they all commute.

Analysis of Cartan subalgebra.

  • $ad(h)(e_\alpha) = \lambda_\alpha(h) e_\alpha$.
  • $[h, e_\alpha] = \lambda_\alpha(h) e_\alpha$.
  • Since the LHS is linear in $h$, the RHS must also be linear in $h$. But in the RHS, it is only $\lambda_\alpha(h)$ that depends on $h$.
  • This means that $\lambda_\alpha: H \to \mathbb C$ is a linear map!
  • This is to say that $\lambda_\alpha \in H^*$ is an element of the dual space!
  • The elements $\lambda_1, \lambda_2, \dots, \lambda_{dim L - m}$ are called the roots of the Lie algebra.
  • This is called as $\Phi \equiv { \lambda_1, \dots, \lambda_{dim L - m} }$, the root set of the Lie algebra.

Root set is closed under negation

  • We found that $ad(h)$ is antisymmetric with respect to killing form.
  • Thus, if $\lambda \in \Phi$ is a root, $-\lambda$ is also a root (somehow).

Root set is not linearly independent

  • We can show that $\Phi$ is not LI.

Fundamental roots

  • Subset of roots $\Pi \subseteq \Phi$ such that $\Pi$ is linearly independent.
  • Let the elements of $\Pi$ be called $\pi_1, \dots, \pi_r$.
  • We are saying that $\forall \lambda \in \Phi, \exists n_1, \dots, n_r \in \mathbb N, \exists \epsilon \in { -1, +1 }$ such that $\lambda = \epsilon \sum_{i=1}^r n_i \pi_i$.
  • That is, we can generate the $\lambda$ as natural number combinations of $\pi_i$, upto an overall global sign factor.
  • Fact: such a set of fundamental roots can always be found.

complex span of fundamental roots is the dual of the cartan subalgebra

  • In symbols, this is $span_{\mathbb C}(\Pi) = H^*$.
  • They are not a basis of $H^*$ because they are not $\mathbb C$ independent (?)
  • $\Pi$ is not unique, since it's a basis.

Defn: $H_{\mathbb R}^*$

  • Real span of fundamental roots: $span_{\mathbb R}(\Pi)$.
  • We have that $\Phi = span_{\pm \mathbb N}(\Pi)$.
  • Thus $\Phi$ is contained in $span_{\mathbb R}(\Pi)$, which is contained in $span_{\mathbb C}(\Pi)$.

Defn: Killing form on $H^*$

  • We restrict $K: L \times L \to \mathbb C$ to $K_H: H \times H \to \mathbb C$.
  • What we want is $K^*: H^* \times H^* \to \mathbb C$.
  • Define $i: H \to H^*$ given by $i(h) = K(h, \cdot)$.
  • $i$ is invertible if $K$ is non-degenerate.
  • $K^*(\mu, \nu) \equiv K(i^{-1}(\mu), i^{-1}(\nu))$.

$K^*$ on $H^*_{\mathbb R}$

  • The restricted action of $K^*$ on $H^*_{\mathbb R}$ will always spit out real numbers.
  • Also, $K^*(\alpha, \alpha) \geq 0$ and equal to zero iff $\alpha = 0$.
  • See that $K$ was non-degenerate, but $K^*_{\mathbb R}$ is a real, bona fide inner product!
  • This means we can calculate length and angles of fundamental roots.

Recovering $\Phi$ from $\Pi$

  • How to recover all roots from fundamental roots?
  • For any $\lambda \in \Phi$, define the Weyl transformation $s_\lambda: H^\star_{\mathbb R} \to H^\star_{\mathbb R}$
  • The map is given by $s_\lambda(\mu) = \mu - 2 \frac{K^*(\lambda, \mu)}{K^*(\lambda, \lambda)} \lambda$.
  • This is linear in $\mu$, but not in $\lambda$.
  • Such $s_\lambda$ are called as weyl transformations.
  • Define a $W$ group generated by the $s_\lambda$. This is called as the Weyl group.

Theorem: Weyl group is generated by fundamental roots

  • It's enough to take the $s_\pi$ for $\pi \in \Pi$ to generate $W$.

Theorem: Roots are produced by action of Weyl group on fundamental roots

  • Any $\lambda \in \Phi$ can be produced by the action of some $w \in W$ on some $\pi \in \Pi$.
  • So $\forall \lambda \in \Phi, \exists \pi \in \Pi, \exists w \in W$ such that $\lambda = w(\pi)$.
  • This means we can create all roots from fundamental roots: first produce the Weyl group, then find the action of the Weyl group on the fundamental roots to find all roots (see the sketch after this list).
  • The Weyl group is closed on the set of roots, so $W(\Phi) \subseteq \Phi$.
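
A quick sketch of this procedure for the root system $A_2$ (the concrete choices here are mine, not from the text: the standard Euclidean inner product on the plane stands in for $K^*$, and the two simple roots sit at $120$ degrees): repeatedly reflect in the two fundamental roots and collect the orbit, which yields all six roots $\pm\pi_1, \pm\pi_2, \pm(\pi_1 + \pi_2)$.

import numpy as np

def reflect(lam, mu):
    # Weyl reflection s_lambda(mu) = mu - 2 <lambda, mu>/<lambda, lambda> * lambda
    return mu - 2 * np.dot(lam, mu) / np.dot(lam, lam) * lam

pi1 = np.array([1.0, 0.0])
pi2 = np.array([-0.5, np.sqrt(3) / 2])   # simple roots of A2, at 120 degrees

roots = [pi1, pi2]
changed = True
while changed:                            # close the set under s_{pi1}, s_{pi2}
    changed = False
    for lam in [pi1, pi2]:
        for mu in list(roots):
            r = reflect(lam, mu)
            if not any(np.allclose(r, s) for s in roots):
                roots.append(r)
                changed = True

print(len(roots))   # 6 roots: +-pi1, +-pi2, +-(pi1 + pi2)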

Showdown

  • Consider $S_{\pi_i}(\pi_j)$ for $\pi_i, \pi_j \in \Pi$.

Weird free group construction from adjoint functor theorem

  • We wish to construct the free group on a set $S$. Call the free group $\Gamma S$.
  • Call the forgetful functor from groups to sets as $U$.
  • The defining property of the free group is that if we are given a mapping $\phi: S \to UG$, a map which tells us where the generators go, there is a unique map $\Gamma \phi: \Gamma S \to G$ which maps the generators of the free group via a group homomorphism into $G$. Further, there is a bijection between $\phi$ and $\Gamma \phi$.
  • Written differently, there is a bijection $\hom_\texttt{Set}(S, UG) \simeq \hom_\texttt{Group}(\Gamma S, G)$. This is the condition for an adjunction.
  • The idea to construct $\Gamma S$ is roughly, to take all possible maps $f_i: S \to UG$ for all groups $G$, take the product of all such maps, and define $\Gamma S \equiv im(\pi_i f_i)$. The details follow.
  • First off, we can't take all groups, that's too large. So we need to cut down the size somehow. We do this by considering groups with at most $|S|$ generators, since that's all the image of the maps $f_i$ can be anyway. We're only interested in the image at the end, so we can cut down the groups we consider to be set-sized.
  • Next, we need to somehow control for isomorphisms. So we first take isomorphism classes of groups with at most $|S|$ generators. Call this set of groups $\mathcal G$. We then construct all possible maps $f_i: S \to UG$ for all possible $G \in \mathcal G$.
  • This lets us construct the product map $f : S \to \prod_{G \in \mathcal G} UG$ given by $f(s) \equiv \prod_{G \in \mathcal G} f_i(s)$.
  • Now we define the free group $\Gamma S \equiv im(f)$. Why does this work?
  • Well, we check the universal property. Suppose we have some map $h: S \to UH$. This must induce a map $\Gamma h: \Gamma S \to H$.
  • We can cut down the map, by writing the map as $h_{im}: S \to im(h)$. This maps into some subset of $UH$, from which we can generate a group $H_{im} \subseteq H$.
  • First off, there must be some index $k$ such that $f_k = h_{im}$, since the set of maps ${ f_i }$ covers all possible maps from $S$ into groups with those many generators.
  • This implies we can project the group $\Gamma S$ at the $k$th index to get a map from $\Gamma S$ into $H_{im}$.
  • We can then inject $H_{im}$ into $H$, giving us the desired map!

bashupload

curl bashupload.com -T your_file.txt
  • Super useful if one wants to quickly send a file from/to a server.

When are the catalan numbers odd

  • The catalan numbers $C_n$ count the number of binary trees on $n$ nodes.
  • For every binary tree, label the nodes in some standard ordering (eg. BFS).
  • Pick the lex smallest unbalanced node (node with different left and right subtree sizes).
  • The operation that swaps the left and right subtrees of the lex smallest unbalanced node is an involution.
  • This operation only fails when we have a complete binary tree, so the number of nodes is $n = 2^r - 1$, so we pair such a complete binary tree to itself.
  • This breaks the set $C_n$ into an even number of trees (pairs of unbalanced trees) and a potential "loner tree" (paired with itself) which is the complete binary tree.
  • Thus $C_n$ is odd iff $n = 2^r - 1$, which is exactly when a complete binary tree exists and goes unpaired by the involution (see the check after this list).
  • Reference
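
A quick sanity check of the parity claim (a brute-force sketch; the closed form for $C_n$ via binomial coefficients is standard, the rest is my own scaffolding):

from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

for n in range(1, 33):
    is_odd = catalan(n) % 2 == 1
    is_mersenne = (n + 1) & n == 0        # n + 1 is a power of two, ie n = 2^r - 1
    assert is_odd == is_mersenne
print([n for n in range(1, 33) if catalan(n) % 2 == 1])   # [1, 3, 7, 15, 31]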

Discrete probability [TODO]

  • The textbook discrete probability actually manages to teach "discrete probability" as a unified subject, laying out all the notation clearly, something that I literally have never seen before! This is me making notes of the notation.

Geodesic equation, Extrinsic

  • The geodesic on a sphere must be a great circle. If it's not a great circle, say a circle at some fixed latitude, then the acceleration vectors point towards the center of that circle, not towards the center of the sphere! But the direction towards the center of the sphere is the true normal direction. So we get a deviation from the normal.

How do we know if a path is straight?

  • Velocity remains constant on a straight line.
  • So it has zero acceleration.
  • If we think of a curved spiral climbing a hill (or a spiral staircase), the acceleration vector will point upward (to allow us to climb the hill) and will be curved inward into the spiral (to allow us to turn as we spiral).
  • On the other hand, if we think of walking straight along an undulating plane, the acceleration will be positive/negative depending on whether the terrain goes upward or downward, but we won't have any left/right motion in the plane.
  • If the acceleration is always along the normal vectors, then we have a geodesic.

Geodesic curve

  • Curve with zero tangential acceleration when we walk along the curve with constant speed.
  • Start with the $(u, v)$ plane, and map it to $R(u, v) \equiv (R_x, R_y, R_z)$. Denote the curve as $c: I \to \mathbb R^3$ such that $c$ always lies on $R$. Said differently, we have $c: I \to UV$, which we then map to $\mathbb R^3$ via $R$.
  • So for example, $R(u, v) = (\cos(u), \sin(u)\cos(v), \sin(u)\sin(v))$ and $c(\lambda) = (\lambda, \lambda)$. Which is to say, $c(\lambda) = (\cos(\lambda), \sin(\lambda)\cos(\lambda), \sin(\lambda)\sin(\lambda))$.
  • Recall that $e_u \equiv \partial_u R, e_v \equiv \partial_v R \in \mathbb R^3$ are the basis of the tangent plane at $R_{u, v}$.
  • Similarly, $\partial_\lambda c$ gives us the tangent vector along $c$ on the surface.
  • Write out:

$$ \begin{aligned} \frac{dc}{d \lambda} &= \frac{du}{d\lambda}\frac{dR}{du} + \frac{dv}{d\lambda}\frac{dR}{dv} \\ \frac{d}{d\lambda}\left(\frac{dc}{d \lambda}\right) &= \frac{d}{d\lambda}\left(\frac{du}{d\lambda}\frac{dR}{du} + \frac{dv}{d\lambda}\frac{dR}{dv}\right) \\ &= \frac{d}{d\lambda}\left(\frac{du}{d\lambda}\frac{dR}{du}\right) + \frac{d}{d\lambda}\left(\frac{dv}{d\lambda}\frac{dR}{dv}\right) \\ &= \frac{d^2 u}{d\lambda^2}\frac{dR}{du} + \frac{du}{d\lambda} \frac{d}{d\lambda}\left(\frac{dR}{du}\right) + \frac{d^2 v}{d\lambda^2}\frac{dR}{dv} + \frac{dv}{d\lambda} \frac{d}{d\lambda}\left(\frac{dR}{dv}\right) \end{aligned} $$

  • How to calculate $\frac{d}{d\lambda} \frac{dR}{du}$? Use chain rule, again!
  • $\frac{d}{d\lambda} = \frac{du}{d \lambda}\frac{\partial}{\partial u} + \frac{dv}{d \lambda}\frac{\partial}{\partial v}$

Geodesic curve with notational abuse

  • Denote by $R(u, v)$ the surface, and by $R(\lambda)$ the equation of the curve. So for example, $R(u, v) = (\cos(u), \sin(u)\cos(v), \sin(u)\sin(v))$ while $R(\lambda) = R(\lambda, \lambda) = (\cos(\lambda), \sin(\lambda)\cos(\lambda), \sin(\lambda)\sin(\lambda))$.

  • EigenChris videos

Connections, take 2

  • I asked a math.se question about position, velocity, acceleration that received a great answer by peek-a-boo. Let me try and provide an exposition of his answer.
  • Imagine a base manifold $M$, say a circle.
  • Now imagine a vector bundle over this, say 2D spaces lying above each point on the circle. Call this $(E, \pi, M)$
  • What is a connection? Roughly speaking, it seems to be a device to convert elements of $TM$ into elements of $TE$.
  • We imagine the base manifold (circle) as horizontal, and the bundle $E$ as vertical. We imagine $TM$ as vectors lying horizontal on the circle, and we imagine $TE$ as vectors lying above the bundle.
  • So the connection has type $C: E \times TM \to TE$. Consider a point $m \in M$ in the base manifold.
  • Now think of the fiber $E_m \subseteq E$ over $m$.
  • Now think of any point $e \in E_m$ in the fiber of $m$.
  • This gives us a map $C_e: T_m M \to T_e E$, which tells us to imagine a particle $e \in E$ following its brother in $m \in M$. If we know the velocity $\dot m \in T_m M$, we can find the velocity of the sibling upstairs with $C_e(\dot m)$.
  • In some sense, this is really like path lifting, except we're performing "velocity lifting". Given a point in the base manifold and a point somewhere upstairs in the cover (fiber), we are told how to "develop" the path upstairs given information about how to "develop" the path downstairs.
  • I use "develop" to mean "knowing derivatives".

Differentiating vector fields along a curve

  • Given all of this, suppose we have a curve $c: I \to M$ and a vector field over the curve $v: I \to E$ such that the vector field lies correctly over the curve; $\pi \circ v = c$. We want to differentiate $v$, such that we get another $v': I \to E$.
  • That's the crucial bit, $v$ and $v'$ have the same type, and this is achieved through the connection. So a vector field and its derivative are both vector fields over the curve.
  • How do we do this? We have the tangent mapping $Tv: TI \mapsto TE$.
  • We kill off the component given by pushing the curve's tangent vector $Tc(\xi) \in TM$ through the connection at the bundle location $v(i)$. This kills off the effect of the curving of the curve when measuring the change in the vector field $v$.
  • We build the map $z: TI \to TE$ given by $z(\xi) \equiv Tv(\xi) - C_{v(i)}(Tc(\xi))$ for $\xi \in T_i I$.
  • We now have a map from $I$ to $TE$, but we want a map to $E$. What do?
  • Well, we can check that the vector field we have created is a vertical vector field, which means that it lies entirely within the fiber. Said differently, we check that it pushes forward to the zero vector under projection, so $T\pi: TE \to TM$ is zero on the image of $z$.
  • This means that $z$ lies entirely "inside" each fiber, or it lies entirely in the tangent to the vector space $\pi^{-1}(m)$ (ie, it lives in $T\pi^{-1}(m)$), instead of living in the full tangent space $T_e E$ where it has access to the horizontal components.
  • But for a vector space, the tangent space is canonically isomorphic to the vector space itself! (parallelogram law/can move vectors around/...). Thus, we can bring down the image of $z$ from $TE$ down to $E$!
  • This means we now have a map $z: TI \to E$.
  • But we want a $w: I \to E$. See that the place where we needed a $TI$ was to feed in a direction to differentiate along; evaluating $z$ on the canonical tangent vector $\partial_t \in T_t I$ at each $t \in I$ gives the desired $w: I \to E$.

Dropping into tty on manjaro/GRUB

  • Access GRUB by holding down <ESC>
  • add a suffix rw 3 on the GRUB config line that loads linux ...

Why the zero set of a continuous function must be a closed set

  • Consider the set of points $Z = f^{-1}(0)$ for some function $f: X \to \mathbb R$.
  • Suppose we can talk about sequences or limits in $X$.
  • Thus, if $f$ is continuous, then we must have $f(\lim x_i) = \lim f(x_i)$.
  • Now consider a limit point $l$ of the set $Z$ with sequence $l_i$ (that is, $\lim l_i = l$). Then we have $f(l) = f(\lim l_i) = \lim f(l_i) = \lim 0 = 0$. Thus, $f(l) = 0$.
  • This means that the set $Z$ contains $l$, since $Z$ contains all pre-images of zero. Thus, the set $Z$ is closed.
  • This implies that the zero set of a continuous function must be a closed set.
  • This also motivates zariski; we want a topology that captures polynomial behaviour. Well, then the closed sets must be the zero sets of polynomials!

Derivatives in diffgeo

  • A function of the form $f: \mathbb R^i \to \mathbb R^o$ has derivative specified by an $(o \times i)$ matrix, one which says how each output varies with each input.
  • Now consider a vector field $V$ on the surface of the sphere, and another vector field $D$. Why is $W \equiv \nabla_D V$ another vector field? Aren't we differentiating a thing with 3 coordinates with another thing with 3 coordinates?
  • Well, suppose we consider the previous function $f: \mathbb R^i \to \mathbb R^o$, and we then consider a curve $c: (-1, 1) \to \mathbb R^i$. Then the combined function $(f \circ c): (-1, 1) \to \mathbb R^o$ needs only $o$ numbers to specify the derivative, since there's only one parameter to the curve (time).
  • So what's going on in the above example? Well, though the full function we're defining is from $\mathbb R^i$ to $\mathbb R^o$, composing with $c$ "limits our attention" to a 1D input slice. In this 1D input slice, the output is also a vector.
  • This should be intuitive, since for example, we draw a circle parameterized by arc length, and then draw its tangents as vectors, and then we draw the normal as vectors to the tangents! Why does that work? In both cases (position -> vel, vel -> accel) we have a single parameter, time. So in both cases, we get vector fields!
  • That's somehow magical, that the derivative of a thing needs the same "degrees of freedom" as the thing in itself. Or is it magical? Well, we're used to it working for functions from $\mathbb R$ to $\mathbb R$. It's a little disconcerting to see it work for functions from $\mathbb R$ to $\mathbb R^n$.
  • But how does this make sense in the case of diffgeo? We start with a manifold $M$. We take some curve $c: (-1, 1) \to M$. Its derivative must live as $c': (-1, 1) \to TM$. Now what about $c''$? According to our earlier explanation, this too should be a vector! Well... it is and it isn't, right? but how? I don't understand this well.
  • Looping back to the original question, $W \equiv \nabla_D V$ is a vector field because the value of $W(p)$ is defined as taking $D(p) \in T_p M$, treating it as a curve $d_p: [-1, 1] \to M$ such that $d_p(0) = p$ and $d_p'(0) = D(p)$, and then finally differentiating $V$ along $d_p$ at $0$ (projecting back onto the tangent space), which yields a single tangent vector at $p$.

Building stuff with Docker

  • create a Dockerfile, then run docker build . in that directory.
  • File contains shell stuff to run in RUN <cmd> lines. <cmd> can have newlines with backslash ala shell script.
  • docker run <image/layer sha> <command> to run something at an image SHA (ie, not in a running container). Useful to debug. protip: docker run <sha-of-layer-before-error> /bin/bash to get a shell.
  • docker exec <container-sha> <command> to run something in a container.
  • to delete an image: docker image ls, docker rmi -f <image-sha>
  • docker prune all unused stuff: docker system prune -a
  • docker login to login
  • docker build -t siddudruid/coolname . to name a docker image.
  • docker push siddudruid/coolname to push to docker hub.
  • docker pull siddudruid/coolname to pull from docker hub.

Lie derivative versus covariant derivative

  • Lie derivative cares about all flow lines, covariant derivative cares about a single flow line.

  • The black vector field is X

  • The red vector field $Y$ is such that $L_X Y = 0$. See that the lengths of the red vectors are compressed as we go towards the right, since the lie derivative measures how our "rectangles fail to commute". Thus, for the rectangle to commute, we first (a) need a rectangle, meaning we need to care about at least two flows in $X$, and (b) the flows (plural) of $X$ force the vector field $Y$ to shrink.

  • The blue vector field $Z$ is such that $\nabla_X Z = 0$. See that this only cares about a single line. Thus to conserve the vectors, it needs the support of a metric (ie, to keep perpendiculars perpendicular).

  • Reference question

The Tor functor

Let $A$ be a commutative ring, $P$ an $A$-module. The functors $Tor_i^A(-, P)$ are defined in such a way that

  • $Tor_0^A(-,P) = - \otimes_A P$
  • For any short exact sequence of $A$-modules $0 \to L \to M \to N \to 0$, you get a long exact sequence.

$$ \dots \to Tor_{n+1}^A(L,P) \to Tor_{n+1}^A(M,P) \to Tor_{n+1}^A(N,P) \to Tor_n^A(L,P) \to Tor_n^A(M,P) \to Tor_n^A(N,P) \to \dots $$

which, on the right side, stops at

$$ \dots \to Tor_1^A(L,P) \to Tor_1^A(M,P) \to Tor_1^A(N,P) \to L \otimes_A P \to M \otimes_A P \to N \otimes_A P \to 0 $$

23:44 <bollu> isekaijin can you describe the existence proof of Tor? :)
23:45 <isekaijin> A projective resolution is a chain complex of projective A-modules “... -> P_{n+1} -> P_n -> ... -> P_1 -> P_0 -> 0” that is chain-homotopic to “0 -> P -> 0”.
23:45 <isekaijin> And you need the axiom of choice to show that it exists in general.
23:45 <isekaijin> Now, projective A-modules behave much more nicely w.r.t. the tensor product than arbitrary A-modules.
23:46 <isekaijin> In particular, projective modules are flat, so tensoring with a projective module *is* exact.
23:47 <isekaijin> So to compute Tor_i(M,P), you tensor M with the projective resolution, and then take its homology.
23:47 <isekaijin> To show that this is well-defined, you need to show that Tor_i(M,P) does not depend on the chosen projective resolution of P.
23:48 <Plazma> bollu: just use the axiom of choice like everyone else
23:48 <bollu> why do you need to take homology?
23:48 <isekaijin> That's just the definition of Tor.
23:49 <isekaijin> Okay, to show that Tor does not depend on the chosen projective resolution, you use the fact that any two chain-homotopic chains have the same homology.
23:49 <bollu> right
23:49 <isekaijin> Which is a nice cute exercise in homological algebra that I am too busy to do right now.
23:49 <bollu> whose proof I have seen in hatcher
23:49 <bollu> :)
23:49 <isekaijin> Oh, great.
23:49 <bollu> thanks, the big picture is really useful

Sum of quadratic errors

  • Consider the function $(x - a)^2 + (x - b)^2$
  • The minimum is where the derivative $2(x - a) + 2(x - b)$ vanishes, ie at $x = (a + b)/2$.
  • As we move away towards either end-point, the error always increases!
  • So the "reduction in error" by moving towards b from (a + b)/2 is ALWAYS DOMINATED by the "increase in error" by moving towards a from (a + b)/2.

Hip-Hop and Shakespeare

  • For whatever reason, it appears that iambic pentameter allows one to rap Shakespeare sonnets at 80bpm / 150bpm.
  • TedX talk by Akala

Write thin to write well

  • Set column width to be absurdly low which forces your writing to get better (?!)

  • That is, when you write, say in vim or emacs, you put one clause per line. Then when you are done, you can use pandoc or something similar to convert what you wrote into standard prose. But the artificial line breaks, which result in thin lines, make it easier to edit, and also easier to comprehend diffs if you use git to track changes.

  • The vast majority of work on book typography agrees on 66 characters per line in one-column layouts and 45 characters per line in multi-column layouts as being the optimal numbers for reading. The text-block should also be placed asymmetrically on the page, with the margins in size order being inner<top<outer<bottom. The line height should be set at 120% of the highest character height for normal book typefaces, but should be increased for typewriter typefaces and can be decreased slightly with shorter lines. A small set of typefaces are economic without losing readability, and if you use them you can increase these numbers slightly. But any more than 80 characters and anything less than 40 characters is suboptimal for texts that are longer than a paragraph or so.

  • If you adhere to these very simple principles, you will have avoided like 95% of the typographic choices that can make texts hard or slow to read.

  • Try 36 letters per column.

Also see VimPencil
set wrap linebreak nolist
call plug#begin('~/.vim/plugged')
Plug 'junegunn/goyo.vim'
call plug#end()

"Goyo settings
let g:goyo_width = 60
let g:goyo_height = 999
let g:goyo_margin_top = 0
let g:goyo_margin_bottom = 0

Hidden symmetries of alg varieties

  • Given equations in $A$, can find solutions in any $B$ such that we have $\phi: A \to B$
  • Can translate topological ideas to geometry.
  • Fundamental theorem of riemann: fundamental group with finitely many covering becomes algebraic (?!)
  • So we can look at finite quotients of the fundamental group.
  • As a variety, we take the line minus one point. This can be made by considering $xy - 1 = 0$ in $R[x, y]$ and then projecting solutions to $R[x]$.
  • If we look at complex solutions, then we get $\mathbb C - {0 } = C^\times$.
  • The largest covering space is $\mathbb C \xrightarrow{\exp} \mathbb C^\times$. The fiber above $1 \in C^\times$ (which is the basepoint) is $2 \pi i \mathbb Z$.
  • Finite coverings are $C^\times \xrightarrow{z \mapsto z^n} C^\times$. The substitute for the fundamental group is the projective (inverse) limit of these groups.
  • The symmetry of $Gal(\overline{\mathbb Q} / \mathbb Q)$ acts on this fundamental group.
  • One can get not just fundamental group, but any finite coefficients!
  • Category of coverings is equivalent to category of sets with action of fundamental group.
  • Abel Prize: Pierre Deligne

fd for find

  • fd seems to be much, much faster at find than, well, find.

Thue-Morse sequence for sharing

  • Suppose A goes first at picking object from a collection of objects, then B.
  • B has an inherent disadvantage, since they went second.
  • So rather than repeating and allowing A to go third and B to go fourth (ie, we run ABAB), we should instead run AB BA, since giving B the third turn "evens out the disadvantage".
  • Now once we're done with 4 elements, what do we do? Do we re-run A B B A again? No, this would be argued as unfair by B. So we flip this to get the full sequence as ABBA BAAB.
  • What next? you guessed it... flip: ABBA BAAB|BAAB ABBA
  • And so on. Write the recurrence down :) (a short sketch follows this list)
  • Reference
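
A short sketch of the recurrence (the function name and the string encoding are my own choices); the last line checks the equivalent description of the Thue-Morse sequence via the parity of the binary expansion.

# "Flip and append": start with AB, then repeatedly append the letter-swapped
# copy of what we have so far.
def fair_turns(rounds):
    s = "AB"
    for _ in range(rounds):
        s = s + s.translate(str.maketrans("AB", "BA"))
    return s

print(fair_turns(2))   # ABBABAAB
# Equivalently: the n-th turn goes to B iff n has an odd number of 1s in binary.
print("".join("AB"[bin(n).count("1") % 2] for n in range(8)))   # ABBABAAB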

Elementary and power sum symmetric polynomials

Speedy proof when $k = n$ / no. of vars equals largest $k$ (of $e[k]$) we are expanding:

  • Let $P(x) = e[n] x^0 + e[n-1]x^1 + \dots + e[1]x^{n-1} + e[0]x^n$. That is, $P(x) = \sum_i e[n-i] x^{i}$
  • Let $r[1], r[2], \dots, r[n]$ be the roots. Then we have $P(r[j]) = \sum_i e[n-i] r[j]^{i} = 0$.
  • Adding over all $r[j]$, we find that:

$$ \begin{aligned} &\sum_{j=1}^k P(r[j]) = \sum_j 0 = 0 \\ &\sum_j \sum_i e[n-i] r[j]^{i} = 0 \\ &\sum_j e[n] \cdot 1 + \sum_j \sum_{i \geq 1} e[n-i] r[j]^i = 0 \\ &k \, e[n] + \sum_{i=0}^{k-1} e[i] P[n-i] = 0 \end{aligned} $$

Concretely worked out in the case where $n = k = 4$:

$$ \begin{aligned} &P(x) = 1 \cdot x^4 + e_1 x^3 + e_2 x^2 + e_3 x + e_4 \\ &\texttt{roots: } r_1, r_2, r_3, r_4 \\ &P(x) = (x - r_1)(x - r_2)(x - r_3)(x - r_4) \\ &e_0 = 1 \\ &e_1 = r_1 + r_2 + r_3 + r_4 \\ &e_2 = r_1r_2 + r_1r_3 + r_1r_4 + r_2r_3 + r_2r_4 + r_3r_4 \\ &e_3 = r_1r_2r_3 + r_1r_2r_4 + r_1r_3r_4 + r_2r_3r_4 \\ &e_4 = r_1r_2r_3r_4 \end{aligned} $$

  • Expanding $P(r_j)$:

$$ \begin{aligned} P(r_1) &= r_1^4 + e_1r_1^3 + e_2r_1^2 + e_3 r_1 + e_4 = 0 \\ P(r_2) &= r_2^4 + e_1r_2^3 + e_2r_2^2 + e_3 r_2 + e_4 = 0 \\ P(r_3) &= r_3^4 + e_1r_3^3 + e_2r_3^2 + e_3 r_3 + e_4 = 0 \\ P(r_4) &= r_4^4 + e_1r_4^3 + e_2r_4^2 + e_3 r_4 + e_4 = 0 \end{aligned} $$

  • Adding all of these up:

$$ \begin{aligned} &P(r_1) + P(r_2) + P(r_3) + P(r_4) \\ &= (r_1^4 + r_2^4 + r_3^4 + r_4^4) + e_1(r_1^3 + r_2^3 + r_3^3 + r_4^3) + e_2(r_1^2 + r_2^2 + r_3^2 + r_4^2) + e_3(r_1 + r_2 + r_3 + r_4) + 4 e_4 \\ &= 1 \cdot P_4 + e_1 P_3 + e_2 P_2 + e_3 P_1 + 4 e_4 \\ &= e_0 P_4 + e_1 P_3 + e_2 P_2 + e_3 P_1 + 4 e_4 \\ &= 0 \end{aligned} $$

When $k > n$ (where $n$ is number of variables):

  • We have the identity $k e_k + \sum_{i=0}^{k-1} e_i p_{k-i} = 0$. (The sum stops at $i = k-1$; the would-be $i = k$ term $e_k p_0$ is replaced by the $k e_k$ term.)
  • When $k > n$, this means that $e_k = 0$.
  • Further, when $k > n$, we have $e_i = 0$ for every $i > n$ as well.
  • This collapses the identity to $\sum_{i=0}^{k-1} e_i p_{k-i} = 0$ (we lose $e_k$), which further collapses to $\sum_{i=0}^n e_i p_{k-i} = 0$ (we lose the terms where $i > n$).
  • Proof idea: We add ($k-n$) roots into $f$ to bring it to the case where $k = n$. Then we set these new roots to $0$ to get the identity $\sum_{i=0}^n e_i p_{k-i} = 0$.

When $k < n$ (where $n$ is number of variables):

Proof by cute notation

  • Denote by the tuple $(a[1], a[2], \dots, a[n])$ with $a[i] \geq a[i+1]$ the symmetric sum of monomials $\sum x[i_1]^{a[1]} x[i_2]^{a[2]} \cdots$ taken over distinct indices $i_1, i_2, \dots$.
  • For example, with three variables $x, y, z$, we have:
  • $(1) = x + y + z$
  • $(1, 1) = xy + yz + xz$
  • $(2) = x^2 + y^2 + z^2$
  • $(2, 1) = x^2y + x^2z + y^2x + y^2z + z^2x + z^2y$
  • $(1, 1, 1) = xyz$.
  • $(1, 1, 1, 1) = 0$, because we don't have four variables! We would need to write something like $xyzw$, but we don't have a $w$, so this is zero.
  • In this notation, the elementary symmetric functions are $(1)$, $(1, 1)$, $(1, 1, 1)$ and so on.
  • The power sums are $(1)$, $(2)$, $(3)$, and so on.
  • See that $(2)(1) = (x^2 + y^2 + z^2)(x + y + z) = x^3 + y^3 + z^3 + x^2y + x^2z + y^2x + y^2z + z^2x + z^2y = (3) + (2, 1)$.
  • That is, the product of powers gives us a larger power, plus some change (in elementary symmetric).
  • How do we simplify $(2, 1)$? We want terms of the form only of $(k)$ [power sum] or $(1, 1, \dots, 1)$ [elementary].
  • We need to simplify $(2, 1)$.
  • Let's consider $(1)(1, 1)$. This is $(x + y + z)(xy + yz + xz)$. This will have terms of the form $xyz$ (ie, $(1, 1, 1)$). These occur with multiplicity $3$, since $xyz$ can occur as $(x)(yz)$, $(y)(xz)$, and $(z)(xy)$. This will also have terms of the form $x^2y$ (ie, $(2, 1)$).
  • Put together, we get that $(1)(1, 1) = (2, 1) + 3 (1, 1, 1)$.
  • This tells us that $(2, 1) = (1)(1, 1) - 3(1, 1, 1)$.
  • Plugging back in, we find that $(2)(1) = (3) + (1)(1, 1) - 3 (1, 1, 1)$. That is, $p[3] - p[2]s[1] + p[1]s[2] - 3s[3] = 0$ (checked symbolically at the end of this section).

In general, we will find:

$$ \begin{aligned} (k-1)(1) &= (k) + (k-1, 1) \\ (k-2)(1, 1) &= (k-1, 1) + (k-2, 1, 1) \\ (k-3)(1, 1, 1) &= (k-2, 1, 1) + (k-3, 1, 1, 1) \\ (k-4)(1, 1, 1, 1) &= (k-3, 1, 1, 1) + (k-4, 1, 1, 1, 1) \end{aligned} $$

  • In general, we have:
(k-i)(replicate 1 i) = (k-i+1, replicate 1 [i-1]) + (k-i , replicate 1 i)
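
A quick symbolic check of the three-variable identity derived above, $p[3] - p[2]s[1] + p[1]s[2] - 3s[3] = 0$ (a sketch using sympy; the variable names are my own):

from sympy import symbols, expand

x, y, z = symbols('x y z')
p1, p2, p3 = x + y + z, x**2 + y**2 + z**2, x**3 + y**3 + z**3   # power sums
s1, s2, s3 = x + y + z, x*y + y*z + z*x, x*y*z                   # elementary symmetric

# p3 - p2*s1 + p1*s2 - 3*s3 should be identically zero
print(expand(p3 - p2*s1 + p1*s2 - 3*s3))   # 0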

Projective spaces and grassmanians in AG

Projective space

  • Projective space is the space of all lines through the origin in $\mathbb R^n$.
  • Algebraically constructed as $(V - { 0 })/ \mathbb R^\times$.
  • We exclude the origin to remove "degenerate lines", since the subspace spanned by ${0}$ when acted on with $\mathbb R^\times$ is just ${ 0 }$, which is zero dimensional.

Grassmanian

  • $G(m, V)$: $m$ dimensional subspaces of $V$.
  • $G(m, n)$: $m$ dimensional subspaces of $V = k^n$.
  • $G(m+1, n+1)$ is the space of $m$ planes $\mathbb P^m$ in $\mathbb P^n$. We projectivize by sending $(x_0, x_1, \dots, x_n) \in k^{n+1}$ to $[x_0 : x_1 : \dots : x_n] \in \mathbb P^n$.
  • Duality: $G(m, V) ≃ G(dim(V)-m, V^\star)$. We map the subspace $W \subseteq V$ to the annihilator of $W$ in $V^\star$: That is, we map $W$ to the set of all linear functionals that vanish on $W$ [ie, whose kernel contains $W$].
  • The above implies $G(1, V) \simeq G(n-1, V^\star)$. $G(1, V)$ is just the projective space $\mathbb P(V)$. The $(n-1)$-dimensional subspaces are each cut out by a single linear equation $c_0 x_0 + \dots + c_{n-1} x_{n-1} = 0$.

G(2, 4)

  • These are lines in $\mathbb P^3$. This will give us a pair of points of the form $(x_0, y_0, z_0, w_0)$ and $(x_1, y_1, z_1, w_1)$. That is, we're considering "lines" between "points" (or "vectors") in $\mathbb R^3$. Exactly what we need to solve stabbing line problems for computer graphics :)
  • Start by taking a 2D plane. The line will pass through a point in the 2D plane. This gives us two degrees of freedom.
  • Then take a direction in ordinary Euclidean $\mathbb R^3$ (or $S^2$ to be precise). This gives us two degrees of freedom.
  • Can also be said to be a 2-dim. subspace of a 4-dim. vector space.
  • In total, $G(2, 4)$ should therefore have four degrees of freedom.
  • Take $W \subseteq V$ where $V \simeq k^4$, and $W$ is 2-dimensional subspace.
  • $W$ is spanned by two vectors $v_1, v_2$. So I can record it as a $2 \times 4$ matrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix}$. Vector $v_i$ has coordinates $(a_{i1}, a_{i2}, a_{i3}, a_{i4})$.
  • If I had taken another basis $(v_1', v_2')$, there would be an invertible matrix $B \in K^{2 \times 2}$ ($det(B) \neq 0$) that sends $(v_1, v_2)$ to $(v_1', v_2')$. Vice Versa, any invertible matrix $B$ gives us a new basis.
  • So the redundancy in our choice of parametrization of subspaces (via basis vectors) is captured entirely by the space of $B$s.
  • Key idea: compute $2 \times 2$ minors of the $2 \times 4$ matrix $(v_1, v_2)$.
  • This is going to be $(a_{11} a_{22} - a_{12} a_{21}, \dots, a_{13} a_{24} - a_{14} a_{23}) \in K^6$.
  • Note here that we are computing $2 \times 2$ minors of a rectangular matrix, where we take all possible $2 \times 2$ submatrices and calculate their determinant.
  • In this case, we must pick both rows, and we have $\binom{4}{2} = 6$ choices of columns, thus we live in $K^6$.
  • We represent this map as $m: K^{2 \times 4} \to K^6$ which sends $m((a_{ij})) \equiv (a_{11} a_{22} - a_{12} a_{21}, \dots, a_{13} a_{24} - a_{14} a_{23})$ which maps a matrix to its vector of minors.
  • The great advantage of this is that we have $m(B \cdot (a_{ij})) = det(B) \cdot m((a_{ij}))$, since the minor by definition takes a determinant of submatrices, and determinant is multiplicative.
  • Thus, we have converted a matrix redundancy of $B$ in $a_{ij}$ into a scalar redundancy (of $det(B)$) in $m(a_{ij})$ .
  • We know how to handle scalar redundancies: Live in projective space!
  • Therefore, we have a well defined map $G(2, 4) \to \mathbb P^5$. Given a subspace $W \in G(2, 4)$, compute a basis $v_1, v_2 \in K^4$ for $W$, then compute the vector of minors $m((v_1, v_2)) \in K^6$, and send this to $\mathbb P^5$.

$G(2, 4)$, projectively

  • These are lines in $\mathbb P^3$.
  • So take two points in $P^3$, call these $[a_0 : a_1 : a_2 : a_3]$ and $[b_0 : b_1 : b_2 : b_3]$. Again, this gives us a matrix:

$$ \begin{bmatrix} a_0 &: a_1 &: a_2 &: a_3 \\ b_0 &: b_1 &: b_2 &: b_3 \end{bmatrix} $$

  • We define $S_{ij} \equiv a_i b_j - a_j b_i$ which is the minor with columns $(i, j)$.
  • Then we compress the above matrix as $(S_{01} : S_{02} : S_{03} : S_{12} : S_{13} : S_{23}) \in \mathbb P^5$. See that $S_{ii} = 0$ and $S_{ji} = - S_{ij}$. So we choose as many $S$s as "useful".
  • See that if we scale $a$ or $b$ by a constant, then all the $S_{ij}$ scale by that constant, and thus the point itself in $\mathbb P^5$ does not change.
  • We can also change $b$ by adding some scaled version of $a$. This is like adding a multiple of the second row to the first row when taking determinants. But this does not change determinants!
  • Thus, the actual plucker coordinates are invariant under which two points $a, b$ we choose to parametrize the line in $\mathbb P^3$.
  • This gives us a well defined map from lines in $\mathbb P^3$ to points in $\mathbb P^5$.
  • This is not an onto map; lines in $\mathbb P^3$ have dimension 4 (need $3 + 1$ coefficients, $ax + by + cz + d$), while $\mathbb P^5$ has dimension $5$.
  • So heuristically, we are missing "one equation" to cut $\mathbb P^5$ with to get the image of lines in $\mathbb P^3$ in $\mathbb P^5$.
  • This is the famous Plucker relation:

$$ S_{02} S_{13} = S_{01} S_{23} + S_{03} S_{12} $$

  • It suffices to prove the relationship for the "standard matrix":

$$ \begin{bmatrix} 1 &: 0 &: a &: b \\ 0 &: 1 &: c &: d \end{bmatrix} $$

  • In this case, we get $c \cdot (-b) = 1 \cdot (ad - bc) + d \cdot (-a)$, which indeed holds (a numeric check follows the general relations below).

  • In general, we get plucker relations:

$$ S_{i_1 \dots i_k}S_{j_1 \dots j_k} = \sum S_{i_1' \dots i_k'} S_{j_1' \dots j_k'}. $$
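
A numeric sketch of the $G(2,4)$ story (numpy and the helper name pluecker are my own choices): take a random $2 \times 4$ matrix, compute its six $2 \times 2$ minors $S_{ij} = a_i b_j - a_j b_i$, check the Plucker relation, and check that a change of basis of the row space only rescales all the minors by $det(B)$.

import numpy as np

rng = np.random.default_rng(0)

def pluecker(A):
    # The six 2x2 minors S_{ij} = a_i b_j - a_j b_i of a 2x4 matrix with rows a, b.
    a, b = A
    return {(i, j): a[i] * b[j] - a[j] * b[i] for i in range(4) for j in range(i + 1, 4)}

A = rng.normal(size=(2, 4))
S = pluecker(A)

# The Plucker relation S_{02} S_{13} = S_{01} S_{23} + S_{03} S_{12}
print(np.isclose(S[0, 2] * S[1, 3], S[0, 1] * S[2, 3] + S[0, 3] * S[1, 2]))   # True

# Changing basis of the row span only rescales all minors by det(B)
B = rng.normal(size=(2, 2))
S2 = pluecker(B @ A)
print(np.allclose([S2[k] for k in S], [np.linalg.det(B) * S[k] for k in S]))  # True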

Observations of $G(2, 4)$

  • Suppose the coordinate matrix $(a_{ij})$ of $(v_1, v_2)$ has non-vanishing first minor. Let $B$ be the inverse of the first $2 \times 2$ block, ie $B \equiv \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1}$. Set $(v_1', v_2') \equiv B (v_1, v_2)$.

  • Then the coordinate matrix of $(v_1', v_2')$ is $\begin{bmatrix} 1 & 0 & y_{11} & y_{12} \\ 0 & 1 & y_{21} & y_{22} \end{bmatrix}$.

  • So the first $2 \times 2$ block is identity. Further, the $y_{ij}$ are unique.

  • As we vary $y_{ij}$, we get different 2 dimensional subspaces in $V$. Thus, locally, the grassmanian looks like $A^4$. This gives us an affine chart!

  • We can recover grassmanian from the $\mathbb P^5$ embedding. Let $p_0, \dots, p_5$ be the coordinate functions on $\mathbb P^5$ ($p$ for plucker).

  • The equation $p_0 p_5 - p_1 p_4 + p_2 p_3$ vanishes on the grassmanian. We can show that the zero set of the equation is exactly the grassmanian.

  • Computation AG: Grassmanians

Computing cohomology of $G(2, 4)$

  • Take all points of the following form:

$$ \begin{bmatrix} &1 &:0 &:* &:* \\ &0 &:1 &:* &:* \end{bmatrix} $$

  • Let's look at the first column: it is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Why not $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$? Well, I can always cancel the second row by subtracting a scaled version of the first row! (this doesn't change the determinants). Thus, if we have a $1$ somewhere, the "complement" must be a $0$.
  • Next, we can have something like:

$$ \begin{bmatrix} &1 &:* &:0 &:* \\ &0 &:0 &:1 &:* \end{bmatrix} $$

  • Here, at the second column $\begin{bmatrix} * \\ 0 \end{bmatrix}$, if we didn't have a $0$, then we could have standardized it and put it into the form of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ which makes it like the first case! Thus, we must have a $0$ to get a case different from the previous.

  • Continuing, we get:

$$ \begin{bmatrix} &1 &:* &:* &:0 \\ &0 &:0 &:0 &:1 \end{bmatrix} $$

$$ \begin{bmatrix} &0 &:1 &:0 &:* \\ &0 &:0 &:1 &:* \end{bmatrix} $$

$$ \begin{bmatrix} &0 &:1 &:* &:0 \\ &0 &:0 &:0 &:1 \end{bmatrix} $$

$$ \begin{bmatrix} &0 &:0 &:1 &:0 \\ &0 &:0 &:0 &:1 \end{bmatrix} $$

  • If we count the number of $\star$s, which is the number of degrees of freedom, we see that $1$ of them (the last one) has zero stars ($A^0$), $1$ of them has 1 star ($A^1$), two of them have 2 stars ($A^2$), one of them has 3 stars, and one of them has 4 stars.
  • This lets us read off the cohomology of the grassmanian: we know the cellular decomposition. Ie, we know the number of $n$ cells for different dimensions.
  • Alternatively, we can see that over a finite field $k$, we have $k^0 + k^1 + 2k^2 + k^3 + k^4$ points. On the other hand, $\mathbb P^4$ has $k^0 + k^1 + k^2 + k^3 + k^4$ points. Thus the grassmanian is different from projective space!
  • Borcherds

Mnemonic for why eta is unit:

  • Remember that given an adjunction $F \vdash G$, the unit of the adjunction is $\eta: 1 \to GF$.
  • We use the symbol eta because it's yunit, and eta is y in greek (which is why the vim digraph for eta is C-k y*)
  • $\eta$ is unit, since when you flip it, you get $\mu$, which is $\mu$-ltiplication (multiplication). Hence $\eta$ is the unit for the multiplication to form a monoidal structure for the monad.

Fundamental theorem of galois theory

  • Let $K \subseteq M$ be a finite galois extension (normal + separable); then there is a 1:1 correspondence between intermediate fields $L$ and subgroups of the galois group $G = Gal(M/K)$.
  • Recall that a finite extension has finitely many subfields iff it can be written as an extension $K(\theta)/K$. This is the primitive element theorem.
  • We send $L \mapsto Gal(M/L)$, the subgroup of $Gal(M/K)$ that fixes $L$ pointwise.
  • We send $H$ to $Fix(H)$, the subfield of $M$ that is fixed pointwise by $H$.

$H = Gal(M/Fix(H))$

  • It is clear that $H \subseteq Gal(M/Fix(H))$, by definition, since every element of $H$ fixes $Fix(H)$ pointwise.
  • To show equality, we simply need to show that they are the same size, in terms of cardinality.
  • So we will show that $|H| = |Gal(M/Fix(H))|$.

$L = Fix(Gal(M/L))$

  • It is clear that $L \subseteq Fix(Gal(M/L)))$, by definition, since every element of $Gal(M/L)$ fixes $L$ pointwise.
  • To show equality, we simply need to show that they are the same size.
  • Here, we measure size using $[M:L]$. This means that as $L$ becomes larger, the "size" actually becomes smaller!
  • However, this is the "correct" notion of size, since we will show that $[M:L]$ equals $|Gal(M/L)|$.
  • As $L$ grows larger, it has fewer automorphisms.
  • So, we shall show that $[M:L] = [M:Fix(Gal(M/L))]$.

Proof Strategy

  • Rather than show that the "round trip" equalities are correct, we will show that the intermediates match in terms of size.
  • We will show that the map $H \mapsto Fix(H)$ is such that $|H| = [M:Fix(H)]$.
  • Similarly, we will show that the map $L \mapsto Gal(M/L)$ is such that $[M:L] = |Gal(M/L)|$.
  • Composing $Gal$ and $Fix$ then shows equality on both sides.

Part 1: $H \to Fix(H)$ preserves size

  • Consider the map which sends $H \mapsto Fix(H)$. We need to show that $|H| = [M:Fix(H)]$.
  • Consider the extension $M/Fix(H)$. Since $M/K$ is separable, so is $M/Fix(H)$ [polynomials separable over $K$ remain separable over the super-field $Fix(H)$]
  • Since the extension is separable, we have a $\theta \in M$ such that $M = Fix(H)(\theta)$ by the primitive element theorem.
  • The galois group of $M/Fix(H) = Fix(H)(\theta)/Fix(H)$ must fix $Fix(H)$ entirely. Thus we are trying to extend the function $id: Fix(H) \to Fix(H)$ to field automorphisms $\sigma: M \to M$.
  • Since $M/K$ is normal, so is $M/Fix(H)$, since $M/K$ asserts that automorphisms $\sigma: M \to \overline K$ that fix $K$ stay within $M$. This implies that automorphisms $\tau: M \to \overline K$ that fix $Fix(H)$ stay within $M$.
  • Thus, the number of field automorphisms $\sigma: M \to \overline M$ that fix $Fix(H)$ is equal to the number of field automorphisms $M \to M$ that fix $Fix(H)$.
  • The latter is equal to the degree of the separable extension, $[M:Fix(H)]$, since the only choice we have is where we choose to send $\theta$, and there are $[M:Fix(H)]$ choices.
  • The latter is also equal to the size of the galois group $Gal(M/Fix(H))$.

Part 2: $L$ to $Gal(M/L)$ preserves size

  • We wish to show that $[M:L] = |Gal(M/L)|$
  • Key idea: Start by writing $M = L(\alpha)$ since $M$ is separable by primitive element theorem. Let $\alpha$ have minimal polynomial $p(x)$. Then $deg(p(x))$ equals $[M:L]$ equals number of roots of $p(x)$ since the field is separable.
  • Next, any automorphism $\sigma: M \to M$ which fixes $L$ is uniquely determined by where it sends $\alpha$. Further, such an automorphism $\sigma$ must send $\alpha$ to some other root of $p(x)$ [by virtue of being a field map that fixes $L$, $0 = \sigma(0) = \sigma(p(\alpha)) = p(\sigma(\alpha))$].
  • There are exactly number of roots of $p$ (= $[M:L]$) many choices. Each gives us one automorphism. Thus $|Gal(M/L)| = [M:L]$.

Counter-intuitive linearity of expectation [TODO]

  • I like the example of "10 diners check 10 hats. After dinner they are given the hats back at random." Each diner has a 1/10 chance of getting their own hat back, so by linearity of expectation, the expected number of diners who get the correct hat is 1.

  • Finding the expected value is super easy. But calculating any of the individual probabilities (other than the 8, 9 or 10 correct hats cases) is really annoying and difficult! (A quick simulation at the end of this section confirms the expectation.)

  • Imagine you have 10 dots scattered on a plane. Prove it's always possible to cover all dots with disks of unit radius, without overlap between the disks. (This isn't as trivial as it sounds, in fact there are configurations of 45 points that cannot be covered by disjoint unit disks.)

  • Proof: Consider a repeating honeycomb pattern of infinitely many disks. Such a pattern covers pi / (2 sqrt(3)) ~= 90.69% of the plane, and the disks are clearly disjoint. If we throw such a pattern randomly on the plane, any dot has a 0.9069 chance of being covered, so the expectation value of the total number of dots being covered is 9.069. This is larger than 9, so there must be a packing which covers all 10 dots.
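
A quick simulation of the hat-check example (a sketch; the trial count and function name are arbitrary choices of mine): shuffle the hats, count fixed points, and average.

import random

# 10 diners get their 10 hats back at random; each gets their own hat with
# probability 1/10, so the expected number of correct hats is 10 * (1/10) = 1.
def trial(n=10):
    hats = list(range(n))
    random.shuffle(hats)
    return sum(i == h for i, h in enumerate(hats))

trials = 100_000
print(sum(trial() for _ in range(trials)) / trials)   # close to 1.0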

Metis

So insofar as Athena is a goddess of war, what really do we mean by that? Note that her most famous weapon is not her sword but her shield Aegis, and Aegis has a gorgon's head on it, so that anyone who attacks her is in serious danger of being turned to stone. She's always described as being calm and majestic, neither of which adjectives anyone ever applied to Ares....

Let's face it, Randy, we've all known guys like Ares. The pattern of human behavior that caused the internal mental representation known as Ares to appear in the minds of the ancient Greeks is very much with us today, in the form of terrorists, serial killers, riots, pogroms, and aggressive tinhorn dictators who turn out to be military incompetents. And yet for all their stupidity and incompetence, people like that can conquer and control large chunks of the world if they are not resisted....

Who is going to fight them off, Randy?

Sometimes it might be other Ares-worshippers, as when Iran and Iraq went to war and no one cared who won. But if Ares-worshippers aren't going to end up running the whole world, someone needs to do violence to them. This isn't very nice, but it's a fact: civilization requires an Aegis. And the only way to fight the bastards off in the end is through intelligence. Cunning. Metis.

Tooling for performance benchmarking

  • Optick and Tracy and flame graphs
  • https://github.com/wolfpld/tracy
  • https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
  • Hotspot: https://www.kdab.com/hotspot-video/amp/
  • perf stat -x apparently gives CSV?

Normal field extensions

Normal extension

  • (1) For an extension $L/K$: if a polynomial $p(x) \in K[x]$ has a root $\alpha \in L$, then it has all its roots in $L$. So $p(x)$ splits into linear factors $p(x) = (x - l_1)(x - l_2) \cdots (x - l_n)$ for $l_i \in L$.
  • (2) [equivalent] $L$ is the splitting field over $K$ of some set of polynomials.
  • (3) [equivalent] Consider $K \subseteq L \subseteq \overline K$. Then any automorphism of $\overline K/K$ (ie, aut that fixes $K$ pointwise) maps $L$ to $L$ [fixes $L$ as a set, NOT pointwise].
  • Eg: $Q(2^{1/3})$ is not normal.
  • Eg: $Q(2^{1/3}, \omega_3)$ is a normal extension because it's the splitting field of $x^3 - 2$.

(1) implies (2)

  • (1) We know that if $p$ has a root in $L$, then $p$ has all its roots in $L$.
  • For each $\alpha \in L$, take its minimal polynomial $minpoly(\alpha) \in K[x]$. This splits over $L$, because it has a root ($\alpha$) in $L$, and so by (1) it has all its roots in $L$.
  • Thus, $L$ is the splitting field for the set of polynomials ${ minpoly(\alpha) \in K[x] : \alpha \in L }$.

(2) implies (3)

  • (2) says that $L$ is the splitting field for some set of polynomials.
  • An aut $\sigma: \overline K \to \overline K$ that fixes $K$ acts trivially on polynomials in $K[x]$.
  • $L$ is the set of all roots of polynomials ${ minpoly(\alpha) \in K[x] : \alpha \in L }$.
  • Since $\sigma$ fixes $K[x]$, it also cannot change the set of roots of the polynomials. Thus the set ${ minpoly(\alpha) \in K[x] : \alpha \in L }$ remains invariant under $\sigma$. ($\sigma$ cannot add elements into $L$). It can at most permute the roots of $L$.

(3) implies (1)

  • (3) says that any automorphism $\sigma$ of $\overline K/K$ fixes $L$ as a set.
  • We wish to show that if $p$ has a root $\alpha \in L$, $L$ has all roots of $p$.
  • We claim that for any other root $\beta \in \overline K$ of $p$, there is an automorphism $\tau$ of $\overline K/K$ such that $\tau(\alpha) = \beta$.
  • Consider the tower of extensions $K \subseteq K(\alpha) \subseteq \overline K$ and $K \subseteq K(\beta) \subseteq \overline K$. Both $K(\alpha)$ and $K(\beta)$ look like $K[x] / p$ because $p$ is the minimal polynomial for both $\alpha$ and $\beta$.
  • Thus, we can define an isomorphism $\tau: K(\alpha) \to K(\beta)$ which fixes $K$ and sends $\alpha \mapsto \beta$.
  • Now, this map $\tau$ extends to an automorphism $\overline K \to \overline K$ which sends $\alpha \mapsto \beta$ [by the isomorphism extension theorem; the extension need not be unique].
  • But notice that $\tau$ must fix $L$ as a set (by (3)) and $\alpha \in L$. Thus, $\tau(\alpha) \in \tau(L)$, ie $\beta = \tau(\alpha) \in \tau(L) = L$.
  • Thus, for a polynomial $p$ with a root $\alpha \in L$, any other root $\beta$ of $p$ also lies in $L$.

Alternative argument: Splitting field of a polynomial is normal

  • Let $L/K$ be the splitting field of $f \in K[x]$. Let $g \in K[x]$ have a root $\alpha \in L$.
  • Let $\beta \in \overline K$ be another root of $g$. We wish to show that $\beta \in L$ to show that $L$ is normal.
  • There is an embedding $i: K(\alpha) \hookrightarrow \overline K$ which fixes $K$ and sends $\alpha$ to $\beta$.
  • See that $i(L)$ is also a splitting field for $f$ over $K$ inside $\overline K$.
  • But splitting fields are unique, so $i(L) = L$.
  • Since $i(\alpha) = \beta$, this means $\beta \in L$ as desired.

Degree 2 extensions are normal

  • Let us have a degree 2 extension $K \subseteq L$
  • So we have some $p(x) = x^2 + bx + c \in K[x]$, $L = K(\alpha)$ for $\alpha$ a root of $p$.
  • We know that $\alpha + \beta = -b$ (the sum of the roots), with $b \in K$. Thus $\beta = -b - \alpha \in L$.
  • Thus, the extension is normal since $L$ contains all the roots ($\alpha, \beta$) of $p$ as soon as it contained one of them.

Is normality of extensions transitive?

  • Consider $K \subseteq L \subseteq M$. If $K \subseteq L$ is normal, $L \subseteq M$ is normal, then is $K \subseteq M$ normal?
  • Answer: NO!
  • Counter-example: $Q \subseteq Q(2^{1/2}) \subseteq Q(2^{1/4})$.
  • Each of the two pieces is normal since each has degree two. But the full tower is not normal, because $2^{1/4}$ has minimal polynomial $x^4 - 2$ over $Q$, whose complex roots do not lie in $Q(2^{1/4}) \subseteq \mathbb R$.
  • On the other hand, $Q(2^{1/4})/Q(2^{1/2})$ has minimal polynomial $x^2 - \sqrt 2 \in Q(2^{1/2})[x]$.
  • So, normality is not transitive!
  • Another way of looking at it: We want to show that $\sigma(M) \subseteq M$ for every $\sigma \in Aut(\overline K/K)$. Since $L/K$ is normal, we have $\sigma(L) \subseteq L$ [by normality]. Since $M/L$ is normal, we must have $\sigma(M) \subseteq M$. Therefore, we are done?
  • NO! The problem is that $\sigma$ is not a legal automorphism for the purposes of $M/L$'s normality, since $\sigma$ only fixes $L$ as a set ($\sigma(L) \subseteq L$), not pointwise ($\sigma(l) = l$ for all $l \in L$).

Eisenstein Theorem for checking irreducibility

  • Let $f(x) = a_0 + a_1 x + \dots + a_n x^n$ and let $p$ be a prime. (We rename the polynomial to $f$ so it does not clash with the prime $p$.)
  • If $p$ divides all coefficients except for the highest one ($a_n$), and $a_0$ is $p$-squarefree ($p^2$ does not divide $a_0$), then $f(x)$ is irreducible.
  • That is, $p | a_0, p | a_1$, up to $p | a_{n-1}$, $p \not | a_n$, and finally $p^2 \not | a_0$.
  • Then we must show that $f(x)$ is irreducible.
  • Suppose for contradiction that $f(x) = q(x)r(x)$ where $q(x) = (b_0 + b_1 x + \dots + b_k x^k)$ and $r(x) = (c_0 + c_1 x + \dots + c_l x^l)$ (such that $k + l = n$, and $k > 0, l > 0$).
  • See that $a_0 = b_0 c_0$. Since $p | a_0$, $p$ must divide one of $b_0, c_0$. Since $p^2$ does not divide $a_0$, $p$ cannot divide both $b_0, c_0$. WLOG, suppose $p$ divides $b_0$, and $p$ does not divide $c_0$.
  • Also see that since $a_n = (\sum_{i + j = n} b_i c_j)$, $p$ does not divide this coefficient $\sum_{i + j = n} b_i c_j$. Thus, at least one term in $\sum_{i + j = n} b_i c_j$ is not divisible by $p$.
  • Now, we know that $p$ divides $b_0$, $p$ does not divide $c_0$. We will use this as a "domino" to show that $p$ divides $b_1$, $b_2$, and so on, all the way upto $b_k$. But this will imply that the final term $a_n$ will also be divisible by $p$, leading to contradiction.
  • To show the domino effect, start with the coefficient of $x$, which is $a_1 = b_0 c_1 + b_1 c_0$. Since $a_1$ is divisible by $p$, $b_0$ is divisible by $p$, and $c_0$ is not divisible by $p$, the whole equation reduces to $b_1 c_0 \equiv_p 0$, or $b_1 \equiv_p 0$ [since $c_0$ is a unit modulo $p$].
  • Thus, we have now "domino"'d to show that $p$ divides both $b_0, b_1$.
  • For induction, suppose $p$ divides everything $b_0, b_1, \dots, b_r$. We must show that $p$ divides $b_{r+1}$.
  • Consider the coefficient of the term $x^{r+1}$, ie $a_{r+1}$ (note $r + 1 \leq k \leq n - 1$, so it is divisible by $p$). We have $a_{r+1} = b_0 c_{r+1} + b_1 c_r + \dots + b_{r+1} c_0$. Modulo $p$, the left hand side vanishes, and every term containing $b_0, b_1, \dots, b_r$ vanishes, leaving behind $0 \equiv_p b_{r+1} c_0$. Since $c_0$ is a unit, we get $b_{r+1} \equiv_p 0$.
  • Thus, every coefficient $b_i$ is divisible by $p$, implying $a_n = b_k c_l$ is divisible by $p$, leading to contradiction.
  • Again, the key idea: (1) $b_0$ is divisible by $p$ while $c_0$ is not (this uses $p | a_0$ and $p^2 \not | a_0$). (2) This allows us to "domino" and show that all $b_i$ are divisible by $p$ (this uses $p | a_i$). (3) This shows that $a_n$ is divisible by $p$, a contradiction (this uses $p \not | a_n$).
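  • A small checker for the criterion (a quick sketch in plain Python; the function name and the examples are my own, not from the text):
def eisenstein(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n]; True iff Eisenstein's criterion holds at the prime p."""
    a0, an = coeffs[0], coeffs[-1]
    return (all(a % p == 0 for a in coeffs[:-1])  # p | a_0, ..., a_{n-1}
            and an % p != 0                       # p does not divide a_n
            and a0 % (p * p) != 0)                # p^2 does not divide a_0

print(eisenstein([-2, 0, 0, 1], 2))  # True:  x^3 - 2 is irreducible over Q
print(eisenstein([4, 0, 1], 2))      # False: p^2 | a_0, so the criterion does not apply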

Gauss Lemma for polynomials

  • Let $z(x) \in Z[X]$ such that $z(x) = p(x) q(x)$ where $p(x), q(x) \in Q[X]$. Then we claim that there exists $p'(x), q'(x) \in Z[x]$ such that $z(x) = p'(x) q'(x)$.
  • For example, suppose $p(x) = a_0 / b_0 + a_1 x / b_1$ and $q(x) = c_0 / d_0 + c_1 x / d_1$, such that $p(x)q(x) \in \mathbb Z[x]$ and these fractions are in lowest form. So, $b_i \not | a_i$ and $d_i \not | c_i$.
  • Take a common denominator, so that we can pair up parts of the numerator with the denominator and rewrite the product over $\mathbb Z$. For example, we know that $9/10 \cdot 20 / 3 = 6$. This can be obtained by rearranging the product as $(9/3) \cdot (20/10) = 3 \cdot 2 = 6$. We wish to perform a similar rearrangement, by first writing $9/10 \cdot 20 / 3$ as $(9 \cdot 20)/(10 \cdot 3)$, and then pairing up $10 \leftrightarrow 20$ and $3 \leftrightarrow 9$ to get the final integer $(9/3) (20/10) = 6$. After pairing up, each of the pairs $(9/3)$ and $(20/10)$ is clearly an integer.
  • Take a common denominator in $p(x)$ and write it as a fraction: $p(x) = (a_0 b_1 + (a_1 b_0)x) / b_0 b_1$, and similarly $q(x) = (c_0 d_1 + (c_1 d_0)x)/d_0 d_1$.
  • We claim that the denominator of $p(x)$, $b_0 b_1$ does not divide the numerator of $p(x)$, $(a_0 b_1 + (a_1 b_0)x)$. This can be seen term-by-term. $b_0 b_1$ does not divide $a_0 b_1$ since $a_0 b_1 / b_0 b_1 = a_0 / b_0$ which was assumed to be in lowest form, and a real fraction. Similarly for all terms in the numerator.
  • Since the product $p(x)q(x)$ which we write as fractions as $(a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x) / (b_0 b_1)(d_0 d_1)$ is integral, we must have that $b_0 b_1$ divides the numerator. Since $b_0 b_1$ does not divide the first factor $(a_0 b_1 + (a_1 b_0)x)$, it must divide the second factor $(c_0 d_1 + (c_1 d_0)x)$. Thus, the polynomial $q'(x) \equiv (c_0 d_1 + (c_1 d_0)x)/b_0 b_1$ is therefore integral [ie, $q'(x) \in Z[x]$].
  • By the exact same reasoning, $d_0 d_1$ must divide the numerator of $p(x)q(x)$. Since $d_0 d_1$ does not divide $(c_0 d_1 + (c_1 d_0)x)$, it must divide $(a_0 b_1 + (a_1 b_0)x)$, and therefore $p'(x) \equiv (a_0 b_1 + (a_1 b_0)x)/(d_0 d_1)$ is integral.
  • Thus, we can write $z(x) = p'(x) q'(x)$ where $p'(x), q'(x) \in \mathbb Z[x]$.
  • This generalizes, since we never used anything about being linear, we simply reasoned term by term.

Alternate way to show that the factorization is correct.

  • Start at $p(x)q(x) = (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x) / (b_0 b_1)(d_0 d_1)$.
  • Rewrite as $ p(x)q(x) \cdot (b_0 b_1)(d_0 d_1) = (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x)$
  • Suppose $\alpha$ is a prime factor of $b_0$. Then reduce the above equation mod $\alpha$. We get $0 \equiv_\alpha (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x)$. Since $\mathbb Z/\alpha \mathbb Z[x]$ is an integral domain, we have that one of $(a_0 b_1 + (a_1 b_0)x)$ or $(c_0 d_1 + (c_1 d_0)x)$ vanishes mod $\alpha$, and thus $\alpha$ divides one of the two.
  • This works for all prime divisors of the denominators, thus we can "distribute" the prime divisors of the denominators across the two polynomials.
  • Proof that $Z/\alpha Z[x]$ is an integral domain: note that $Z/\alpha Z$ is a field, thus $Z/ \alpha Z[x]$ is a Euclidean domain (run Euclid algorithm). This implies it is integral.
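  • A concrete sketch of this redistribution of denominators (plain Python with the standard library; the helper names content and primitive_part are mine, and the computation assumes the fractions are in lowest terms, which Fraction guarantees):
from fractions import Fraction
from math import gcd, lcm
from functools import reduce

def content(poly):
    # content of a rational polynomial (coefficients listed low degree first):
    # the rational c such that poly / c has coprime integer coefficients
    num = reduce(gcd, (f.numerator for f in poly))
    den = reduce(lcm, (f.denominator for f in poly))
    return Fraction(num, den)

def primitive_part(poly):
    c = content(poly)
    return [f / c for f in poly]

def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

p = [Fraction(9, 10), Fraction(3, 5)]    # 9/10 + (3/5) x
q = [Fraction(20, 3), Fraction(10, 3)]   # 20/3 + (10/3) x
z = mul(p, q)                            # = 6 + 7x + 2x^2, already in Z[x]
assert all(f.denominator == 1 for f in z)

# Gauss: content(p) * content(q) is an integer, so push all of it into p.
c = content(p) * content(q)
p_int = [c * f for f in primitive_part(p)]
q_int = primitive_part(q)
assert mul(p_int, q_int) == z
print(p_int, q_int)                      # [3, 2] and [2, 1]: an integer factorization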

How GHC does typeclass resolution

  • As told to me by davean:

  • It's like 5 steps

  • Find all instances I that match the target constraint; that is, the target constraint is a substitution instance of I. These instance declarations are the candidates.

  • If no candidates remain, the search fails. Eliminate any candidate IX for which there is another candidate IY such that both of the following hold: IY is strictly more specific than IX (that is, IY is a substitution instance of IX but not vice versa), and either IX is overlappable or IY is overlapping. (This "either/or" design, rather than a "both/and" design, allows a client to deliberately override an instance from a library, without requiring a change to the library.)

  • If all the remaining candidates are incoherent, the search succeeds, returning an arbitrary surviving candidate.

  • If more than one non-incoherent candidate remains, the search fails.

  • Otherwise there is exactly one non-incoherent candidate; call it the “prime candidate”.

  • Now find all instances, or in-scope given constraints, that unify with the target constraint, but do not match it. Such non-candidate instances might match when the target constraint is further instantiated. If all of them are incoherent top-level instances, the search succeeds, returning the prime candidate. Otherwise the search fails.

  • GHC manual

Defining continuity covariantly

  • Real analysis: covariant definition: $f(\lim x) = \lim (f x)$. Contravariant definition in analysis/topology: $f^{-1}(open)$ is open.
  • Contravariant in topology via Sierpinski: $U \subseteq X$ is open iff the characteristic function $f(x) = \begin{cases} \top & x \in U \\ \bot & \text{otherwise} \end{cases}$ is continuous.
  • A function $f: X \to Y$ is continuous iff the composite $s \circ f$ is continuous for every continuous $s: Y \to S$. That is, a function is continuous iff the pullback of every indicator is an indicator.
  • A topological space is said to be sequential iff every sequentially open set is open.
  • A set $K \subseteq X$ is sequentially open iff whenever a sequence $x_n$ converges to a limit in $K$, there is some $M$ such that $x_{\geq M}$ lies in $K$. [TODO: check]
  • Now consider $\mathbb N_\infty$, the one point compactification of the naturals. Here, we add a point called $\infty$ to $\mathbb N$, and declare a set containing $\infty$ to be open iff its complement in $\mathbb N$ is finite (all subsets of $\mathbb N$ remain open).
  • Written out: a set $U \subseteq \mathbb N_{\infty}$ containing $\infty$ is open iff there exists a finite $C \subseteq \mathbb N$ such that $U = (\mathbb N \setminus C) \cup \{ \infty \}$.
  • A function $x: \mathbb N_\infty \to X$ is continuous [wrt the above topology] iff the sequence $x_n$ converges to the limit $x_\infty$.
  • See that we use functions out of $\mathbb N_\infty$ [covariant] instead of functions into $S$ [contravariant].
  • Now say a function $f: X \to Y$ is sequentially continuous iff for every continuous $x: \mathbb N_\infty \to X$, the composition $f \circ x: \mathbb N_\infty \to Y$ is continuous. Informally, the pushforward of every convergent sequence is a convergent sequence.
  • Can show that the category of sequential spaces is cartesian closed.
  • Now generalize $\mathbb N_\infty$
  • https://twitter.com/EscardoMartin/status/1444791065735729155

Why commutator is important for QM

  • Suppose we have an operator $L$ with eigenvector $x$, eigenvalue $\lambda$. So $Lx = \lambda x$.
  • Now suppose we have another operator $N$ such that $[L, N] = \kappa N$ for some constant $\kappa$.
  • Compute $[L, N]x = \kappa Nx$, which implies:

$$ \begin{aligned} &[L, N]x = \kappa Nx \\ &(LN - NL)x = \kappa Nx \\ &L(Nx) - N(Lx) = \kappa Nx \\ &L(Nx) - N(\lambda x) = \kappa Nx \\ &L(Nx) - \lambda N(x) = \kappa Nx \\ &L(Nx) = \kappa Nx + \lambda Nx \\ &L(Nx) = (\kappa + \lambda)Nx \\ \end{aligned} $$

  • So $Nx$ is an eigenvector of $L$ with eigenvalue $\kappa + \lambda$.
  • This is how we get "ladder operators" which raise and lower the state. If we have a state $x$ with some eigenvalue $\lambda$, an operator like $N$ gives us an "excited state" from $x$ with eigenvalue $\kappa + \lambda$.
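  • A quick numerical sketch of this (numpy, using the truncated harmonic oscillator; the matrices are the standard number/raising operators, and the truncation dimension d is an arbitrary choice of mine). Here the number operator plays the role of $L$, the raising operator plays the role of $N$, and $\kappa = 1$:
import numpy as np

d = 6                                         # truncation dimension
a = np.diag(np.sqrt(np.arange(1, d)), k=1)    # annihilation operator
adag = a.T                                    # creation (raising) operator
Nop = adag @ a                                # number operator: eigenvalues 0..d-1

comm = Nop @ adag - adag @ Nop
assert np.allclose(comm, adag)                # [Nop, adag] = 1 * adag

k = 2
x = np.zeros(d); x[k] = 1.0                   # eigenvector of Nop with eigenvalue k
y = adag @ x                                  # apply the raising operator
assert np.allclose(Nop @ y, (k + 1) * y)      # eigenvalue raised by kappa = 1
print("ok")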

Deriving pratt parsing by analyzing recursive descent [TODO]

Level set of a continuous function must be closed

  • Let $f$ be continuous, let $L \equiv f^{-1}(y)$ be a level set. We claim $L$ is closed.
  • Consider any convergent sequence of points $s: \mathbb N \to L$. Since $s(i) \in L$, we have $f(s_i) = y$ for all $i$.
  • By continuity, we therefore have $f(\lim s_i) = \lim f(s_i) = y$.
  • Hence, $\lim s_i \in L$.
  • This explains why we build Zariski the way we do: the level sets of functions must be closed. Since we wish to study polynomials, we build our topology out of the level sets of polynomials.

HPNDUF - Hard problems need design up front!

Separable Extension is contained in Galois extension

  • Recall that an extension is galois if it is separable and normal.
  • Consider some separable extension $L/K$.
  • By primitive element, can be written as $L = K(\alpha)$
  • Since $L$ is separable, the minimal polynomial of $\alpha$, $p(x) \in K[x]$, is separable: it splits into distinct linear factors over $\overline K$.
  • Build the splitting field $M$ of $p(x)$. This will contain $L$, as $L = K(\alpha) \subseteq K(\alpha, \beta, \gamma, \dots)$ where $\alpha, \beta, \gamma, \dots$ are the roots of $p(x)$.
  • This is normal (since it is the splitting field of a polynomial).
  • This is separable, since it is generated by separable elements $\alpha$, $\beta$, $\gamma$, and so on.

Primitive element theorem

  • Let $E/k$ be a finite extension. We will characterize when a primitive element exists, and show that this will always happen for separable extensions.

Part 1: Primitive element iff number of intermediate subfields is finite

Forward: Finitely many intermediate subfields implies primitive
  • If $k$ is a finite field, then $E$ is a finite field and $E^\times$ is a cyclic group. The generator of $E^\times$ is a primitive element.
  • So suppose $k$ is an infinite field. Let $E/k$ have finitely many intermediate fields.
  • Pick non-zero $\alpha, \beta \in E$. As $c$ varies in $k$, the extension $k(\alpha + c\beta)$ varies amongst the intermediate fields of $E/k$.
  • Since $E/k$ only has finitely many intermediate fields while $k$ is infinite, pigeonhole tells us that there are two $c_1 \neq c_2$ in $k$ such that $k(\alpha + c_1 \beta) = k(\alpha + c_2 \beta)$.
  • Define $L \equiv k(\alpha + c_1 \beta)$. We claim that $L = k(\alpha, \beta)$, which shows that $\alpha + c_1 \beta$ is a primitive element for $k(\alpha, \beta)$.
  • Since $k(\alpha + c_2 \beta) = k(\alpha + c_1 \beta) = L$, this implies that $\alpha + c_2 \beta \in L$.
  • Thus, we find that $\alpha + c_1 \beta \in L$ and $\alpha + c_2 \beta \in L$. Thus, $(c_1 - c_2) \beta \in L$. Since $c_1, c_2 \in k$, we have $(c_1 - c_2)^{-1} \in k$, and thus $\beta \in L$, which implies $\alpha \in L$.
  • Thus $L = k(\alpha, \beta) = k(\alpha + c_1 \beta)$.
  • Done. Prove for more generators by recursion.
Backward: primitive implies finitely many intermediate subfields
  • Let $E = k(\alpha)$ be a simple extension (one generated by a primitive element). We need to show that $E/k$ only has finitely many intermediate subfields.
  • Let $a_k(x) \in k[x]$ be the minimal polynomial for $\alpha$ in $k$. By definition, $a$ is irreducible.
  • For any intermediate field $k \subseteq F \subseteq E$, define $a_F(x) \in F[x]$ to be the minimal polynomial of $\alpha$ in $F$.
  • Since $a_k$ is also a member of $F[x]$ and $a_k, a_F$ share a common root $\alpha$ and $a_F$ is irreducible in $F$, this means that $a_F$ divides $a_k$.
  • Proof sketch that an irreducible polynomial divides any polynomial it shares a root with (also written in another blog post): The GCD $gcd(a_F, a_k) \in F[x]$ must be non-constant (since $a_F, a_k$ share a root). But the irreducible polynomial $a_F$ cannot have a smaller polynomial ($gcd(a_F, a_k)$) as a divisor. Thus the GCD itself is the irreducible polynomial $a_F$. This implies that $a_F$ divides $a_k$, since the GCD must divide $a_k$.
  • Since $a_k$ is a polynomial, it only has finitely many monic divisors, so the map $F \mapsto a_F$ takes only finitely many values.
  • The map $F \mapsto a_F$ is also injective: $F$ is generated over $k$ by the coefficients of $a_F$. [If $F'$ is the subfield generated by the coefficients, then $a_F$ is still irreducible over $F'$, so $[E:F'] = deg(a_F) = [E:F]$; together with $F' \subseteq F$ this forces $F' = F$.]
  • Thus, there are only finitely many intermediate fields if the extension is simple.
Interlude: finite extension with infinitely many subfields
  • Let $F = F_p(t, u)$ where $t, u$ are independent variables.
  • Let $\alpha, \beta$ be roots of $x^p - t$ and $x^p - u$. Define $L \equiv F(\alpha, \beta)$. This has finite degree $p^2$ over $F$, with vector space basis $\{ \alpha^i \beta^j : 0 \leq i, j < p \}$.
  • Consider intermediate subfields $F_\lambda \equiv F(\alpha + \lambda \beta)$ for $\lambda \in F$.
  • Suppose $\lambda \neq \mu$ for two elements in $F$. We want to show that $F_\lambda \neq F_\mu$. This gives us infinitely many subfields as $F$ has infinitely many elements. TODO
  • Reference

Part 2: If $E/k$ is finite and separable then it has a primitive element

  • Let $K = F(\alpha, \beta)$ be separable for $\alpha, \beta \in K$. Then we will show that there exists a primitive element $\theta \in K$ such that $K = F(\theta)$.
  • By repeated application, this shows that for any number of generators $K = F(\alpha_1, \dots, \alpha_n)$, we can find a primitive element.
  • If $K$ is a finite field, then the generator of the cyclic group $K^\times$ is a primitive element.
  • So from now on, suppose $K$ is infinite, and $K = F(\alpha, \beta)$ for $\alpha, \beta \in K$.
  • Let $g$ be the minimal polynomial for $\alpha$, and $h$ the minimal polynomial for $\beta$. Since the extension is separable, $g, h$ have distinct roots.
  • Let the distinct roots of $g$ be $\alpha_i$ with $\alpha = \alpha_1$, and similarly let the distinct roots of $h$ be $\beta_j$ with $\beta = \beta_1$.
  • Now consider the equations $\alpha_1 + f_{i, j} \beta_1 = \alpha_i + f_{i, j} \beta_j$ for $i \in [1, deg(g)]$ and $j \in [2, deg(h)]$.
  • Rearranging, we get $(\alpha_1 - \alpha_i) = f_{i, j} (\beta_j - \beta_1)$. Since $\beta_j \neq \beta_1$ for $j \neq 1$, there is a unique $f_{i, j} \equiv (\alpha_1 - \alpha_i)/(\beta_j - \beta_1)$ that solves the above equation.
  • Since the field $F$ is infinite, we can pick an $f_*$ which avoids the finitely many values $f_{i, j}$.
  • Once we choose such an $f_*$, let $\theta \equiv \alpha_1 + f_* \beta_1$. Such a $\theta$ can never be equal to $\alpha_i + f_* \beta_j$ for $j \neq 1$, since the only choices of $f$ that make $\alpha_1 + f \beta_1 = \alpha_i + f \beta_j$ true are the $f_{i, j}$, and $f_*$ was chosen to be different from these!
  • Now let $F_\theta \equiv F(\theta)$. Since $\theta \in K$, $F_\theta$ is a subfield of $K$.
  • See that $K = F(\alpha, \beta) = F(\alpha, \beta, \alpha + f_* \beta) = F(\beta, \alpha + f_* \beta) = F(\theta, \beta) = F_\theta(\beta)$.
  • We will prove that $K = F_\theta$.
  • Let $p(x)$ denote the minimal polynomial for $\beta$ over $F_\theta$. Since $K = F_\theta(\beta)$, if $p(x)$ is linear, then $K = F_\theta$.
  • By definition, $\beta$ is a root of $h(x)$. Since $p(x)$ is an irreducible over $F_\theta$, we have that $p(x)$ divides $h(x)$ [proof sketch: irreducible polynomial $p(x)$ shares a root with $h(x)$. Thus, $gcd(p(x), h(x))$ must be linear or higher. Since $gcd$ divides $p(x)$, we must have $gcd = p(x)$ as $p(x)$ is irreducible and cannot have divisors. Thus, $p(x)$, being the GCD, also divides $h(x)$].
  • Thus, the roots of $p(x)$ must be a subset of the roots ${ \beta_j }$ of $h(x)$.
  • Consider the polynomial $k(x) = g(\theta - f_* \cdot x)$. $\beta$ is also a root of the polynomial $k(x)$, since $k(\beta) = g(\theta - f_* \beta)$, which is equal to $g((\alpha + f_* \beta) - f_* \beta) = g(\alpha) = 0$. [since $\alpha$ is a root of $g$].
  • Thus, we must have $p(x)$ divides $k(x)$.
  • We will show that $\beta_j$ is not a root of $k(x)$ for $j \neq 1$. Indeed, $k(\beta_j) = 0$ implies $g(\theta - f_* \beta_j) = 0$, which implies $\theta - f_* \beta_j = \alpha_i$ for some $i$, since the roots of $g$ are the $\alpha_i$. But then we would have $\theta = \alpha_i + f_* \beta_j$, a contradiction, as $\theta$ was chosen precisely to avoid this case!
  • Thus, every root of $p(x)$ must come from $\{ \beta_j \}$, and also from the roots of $k(x)$. But the only root of $h(x)$ that $k(x)$ shares is $\beta_1$. Also, $p(x)$ does not have repeated roots, since it divides the separable polynomial $h(x)$. Thus, $p(x)$ is linear, and the degree of the extension $[K : F_\theta]$ is 1. Therefore, $K = F_\theta = F(\theta)$.
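  • A quick sanity check of the construction (sympy; the example $\mathbb Q(\sqrt 2, \sqrt 3)$ and the choice $f_* = 1$ are mine, not from the text): $\theta = \sqrt 2 + 1 \cdot \sqrt 3$ is a primitive element, since its minimal polynomial over $\mathbb Q$ has degree $4 = [\mathbb Q(\sqrt 2, \sqrt 3) : \mathbb Q]$.
from sympy import sqrt, minimal_polynomial, simplify, Symbol

x = Symbol('x')
theta = sqrt(2) + sqrt(3)                   # alpha + f_* beta with f_* = 1
print(minimal_polynomial(theta, x))         # x**4 - 10*x**2 + 1, degree 4
print(simplify((theta**3 - 9*theta) / 2))   # sqrt(2): so sqrt(2), and hence sqrt(3), lie in Q(theta)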

References

Separable extension via embeddings into alg. closure

Defn by embeddings

  • Let $L/K$ be a finite extension.
  • It is separable iff a given embedding $\sigma: K \to \overline K$ can be extended in $[L:K]$ ways (This number can be at most $[L:K]$.)
  • We call the number of ways to embed $L$ in $\overline K$ by extending $\sigma$ the separability degree of $L/K$.
At most $[L:K]$ embeddings exist
  • We will show for simple extensions $K(\alpha)/K$ that there are at most $[K(\alpha): K]$ ways to extend $\sigma: K \to \overline K$ into $\sigma': K(\alpha) \to \overline K$.
  • We use two facts: first, $\sigma'$ is entirely determined by where it sends $\alpha$. Second, $\alpha$ can only go to another root of its minimal polynomial $p \in K[x]$. The minimal polynomial has at most $degree(p)$ distinct roots, and $[K(\alpha):K] = degree(p)$. Thus, there are at most $degree(p) = [K(\alpha):K]$ choices of where $\alpha$ can go, which entirely determines $\sigma'$; so there are at most $[K(\alpha):K]$ choices for $\sigma'$.
  • Given a larger extension, write a sequence of extensions $L = K(\alpha_1)(\alpha_2)\dots(\alpha_n)$. Then, since $[L:K] = [K(\alpha_1):K][K(\alpha_1, \alpha_2):K(\alpha_1)]\cdots$, we can repeatedly apply the same argument to bound the number of choices of $\sigma'$.
  • In detail, for the case $K(\alpha)/K$, consider the minimal polynomial of $\alpha$, $p(x) \in K[x]$. Then $p(\alpha) = 0$.
  • Since $\sigma$ fixes $K$, and $p$ has coefficients from $K$, we have that $\sigma(p(x)) = p(\sigma(x))$.
  • Thus, in particular, $\sigma(0) = \sigma(p(\alpha)) = p(\sigma(\alpha))$.
  • This implies that $p(\sigma(\alpha)) = 0$, or $\sigma(\alpha)$ is a root of $p$.
  • Since $\sigma': L \to \overline K$, $\sigma'$ can only map $\alpha$ to one of the other roots of $p$.
  • $p$ has at most $deg(p)$ distinct roots [it can have repeated roots, so there could be fewer than that].
  • Further, $\sigma'$ is entirely determined by where it maps $\alpha$. Thus, there are at most $[K(\alpha):K]$ ways to extend $\sigma$ to $\sigma'$.
Separability is transitive
  • Given a tower $K \subseteq L \subseteq M \subseteq \overline K$, we fix an embedding $\kappa: K \to \overline K$. If both $L/K$ and $M/L$ are finite and separable, then $\kappa$ extends into $\lambda: L \to \overline K$ through $L/K$ in $[L:K]$ ways, and each such $\lambda$ extends again as $\mu: M \to \overline K$ in $[M:L]$ ways.
  • This together means that we have $[L:K] \cdot [M:L] = [M:K]$ ways to extend $\kappa$ into $\mu$, which is the maximum possible.
  • Thus, $M/K$ is separable.
Separable by polynomial implies separable by embeddings
  • Let every $\alpha \in L$ have minimal polynomial that is separable (ie, has distinct roots).
  • Then we must show that $L/K$ allows us to extend any embedding $\sigma: K \to \overline K$ in $[L:K]$ ways into $\sigma': L \to \overline K$.
  • Write $L$ as a tower of extensions. Let $K_0 \equiv K$, and $K_{i+1} \equiv K_i(\alpha_{i+1})$ with $K_n = L$.
  • At each step, since the relevant minimal polynomial is separable, we have the maximal number of choices for where to send the new generator. Since degree is multiplicative, we have $[L:K] = [K_1:K_0][K_2:K_1]\dots[K_n:K_{n-1}]$.
  • We build $\sigma'$ inductively as $\sigma'_i: K_i \to \overline K$ with $\sigma'_0 \equiv \sigma$.
  • Then at step $i$, the extension $\sigma'_{i+1}: K_{i+1} \to \overline K$, ie $\sigma'_{i+1}: K_i(\alpha_{i+1}) \to \overline K$, has $[K_{i+1}:K_i]$ choices extending $\sigma'_i$, since $\alpha_{i+1}$ is separable over $K_i$ (its minimal polynomial is separable).
  • This means that in toto, we have the correct $[L:K]$ number of choices for $\sigma'_n: L \to \overline K$, which is what it means to be separable by embeddings.
Separable by embeddings implies separable by polynomial
  • Let $L/K$ be separable in terms of embeddings. Consider some element $\alpha \in L$, let its minimal polynomial be $p(x)$.
  • Write $L = K(\alpha)(\beta_1, \dots, \beta_n)$. Since degree is multiplicative, we have $[L:K] = [K(\alpha):K]\,[L:K(\alpha)]$.
  • So given an embedding $\sigma: K \to \overline K$,we must be able to extend it in $[L:K]$ ways.
  • Since $\sigma'$ must send $\alpha$ to a root of $p(x)$, and we need the total to be $[L:K]$, we must have that $p(x)$ has no repeated roots.
  • If $p(x)$ had repeated roots, then we would have fewer choices of $\sigma'(\alpha)$ than $[K(\alpha):K]$, which means the total count of choices for $\sigma'$ would be less than $[L:K]$, thereby contradicting separability.

Finite extensions generated by separable elements are separable

  • Let $L = K(\alpha_1, \dots, \alpha_n)$ where each $\alpha_i$ is separable over $K$. We want to show there are $[L: K]$ ways to extend a map $\kappa: K \to \overline K$ into $\lambda: L \to \overline K$.
  • Since we have shown that separable by polynomial implies separable by embedding, we write $L = K(\alpha_1)(\alpha_2)\dots(\alpha_n)$. Each step is separable by the arguments given above, counting embeddings by where they send $\alpha_i$. Thus, the full extension $L$ is separable.
References
  • https://math.stackexchange.com/questions/2227777/compositum-of-separable-extension
  • https://math.stackexchange.com/questions/1248781/primitive-element-theorem-without-galois-group

Separable extensions via derivation

  • Let $R$ be a commutative ring, $M$ an $R$-module. A derivation is a map such that $D(a + b) = D(a) + D(b)$ and $D(ab) = aD(b) + D(a)b$ [ie, the Leibniz product rule is obeyed].
  • Note that the map does not need to be an $R$-homomorphism (?!)
  • The elements $r \in R$ such that $D(r) = 0$ are said to be the constants of $R$.
  • The set of constants under $X$-differentiation for $K[X]$ in char. 0 is $K$, and $K[X^p]$ in char. p
  • Let $R$ be an integral domain with field of fractions $K$. Any derivation $D: R \to K$ uniquely extends to $D': K \to K$ given by the quotient rule: $D'(a/b) = (bD(a) - aD(b))/b^2$.
  • Any derivation $D: R \to R$ extends to a derivation $(.)^D: R[x] \to R[x]$. For a $f = \sum_i a_i x^i \in R[x]$, the derivation is given by $f^D(x) \equiv \sum_i D(a_i) X^i$. This applies $D$ to $f(x)$ coefficientwise.
  • For a derivation $D: R \to R$ with ring of constants $C$, the associated derivation $(.)^D: R[x] \to R[x]$ has ring of constants $C[x]$.
  • Key thm: Let $L/K$ be a field extension and let $D: K \to K$ be a derivation. $D$ extends uniquely to $D_L$ iff $L$ is separable over $K$.

If $\alpha$ separable, then derivation over $K$ lifts uniquely to $K(\alpha)$

  • Let $D: K \to K$ be a derivation.
  • Let $\alpha \in L$ be separable over $K$ with minimal polynomial $\pi(X) \in K[X]$.
  • So, $\pi(X)$ is irreducible in $K[X]$, $\pi(\alpha) = 0$, and $\pi'(\alpha) \neq 0$.
  • Then $D$ has a unique extension $D': K(\alpha) \to K(\alpha)$ given by:

$$ \begin{aligned} D'(f(\alpha)) \equiv f^D(\alpha) - f'(\alpha) \frac{\pi^D(\alpha)}{\pi'(\alpha)} \end{aligned} $$

  • To prove this, we start by assuming $D$ has an extension, and then showing that it must agree with $D'$. This tells us why it must look this way.
  • Then, after doing this, we start with $D'$ and show that it is well defined and obeys the derivation conditions. This tells us why it's well-defined.

Non example: derivation that does not extend in inseparable case

  • Consider $F_p(u)$ as the base field, and let $L = F_p(u)(\alpha)$ where $\alpha$ is a root of $X^p - u \in F_p(u)[x]$. This is inseparable over $F_p(u)$.
  • The $u$ derivative on $F_p(u)$ [which treats $u$ as a polynomial and differentiates it] cannot be extended to $L$.
  • Consider the equation $\alpha^p = u$, which holds in $L$, since $\alpha$ was explicitly a root of $X^p - u$.
  • Applying the $u$ derivative gives us $p \alpha^{p-1} D(\alpha) = D(u)$. The LHS is zero since we are in characteristic $p$. The RHS is 1 since $D$ is the $u$ derivative, and so $D(u) = 1$. This is a contradiction, and so $D$ does not exist [any mathematical operation must respect equalities].

Part 2.a: Extension by inseparable element $\alpha$ does not have unique lift of derivation for $K(\alpha)/K$

  • Let $\alpha \in L$ be inseparable over $K$. Then $\pi'(X) = 0$ where $\pi(X)$ is the minimal polynomial for $\alpha \in L$.
  • In particular, $\pi'(\alpha) = 0$. We will use the vanishing of $\pi'(\alpha)$ to build a nonzero derivation on $K(\alpha)$ which extends the zero derivation on $K$.
  • Thus, the zero derivation on $K$ has two lifts to $K(\alpha)$: one as the zero derivation on $K(\alpha)$, and one as our non-vanishing lift.
  • Define $Z: K(\alpha) \to K(\alpha)$ given by $Z(f(\alpha)) = f'(\alpha)$ where $f(x) \in K[x]$. By doing this, we are conflating elements $l \in K(\alpha)$ with elements of the form $\sum_i k_i \alpha^i = f(\alpha)$. We need to check that this is well defined, that if $f(\alpha) = g(\alpha)$, then $Z(f(\alpha)) = Z(g(\alpha))$.
  • So start with $f(\alpha) = g(\alpha)$. This implies that $f(x) \equiv g(x)$ modulo $\pi(x)$.
  • So we write $f(x) = g(x) + k(x)\pi(x)$.
  • Differentiating both sides wrt $x$, we get $f'(x) = g'(x) + k'(x) \pi(x) + k(x) \pi'(x)$.
  • Since $\pi(\alpha) = \pi'(\alpha) = 0$, we get that $f'(\alpha) = g'(\alpha) + 0$ by evaluating previous equation at $\alpha$.
  • This shows that $Z: K(\alpha) \to K(\alpha)$ is well defined.
  • See that the derivation $Z$ kills $K$ since $K = K \alpha^0$. But we see that $Z(\alpha) = 1$, so $Z$ extends the zero derivation on $K$ while not being zero itself.
  • We needed separability for the derivation to be well-defined.
Part 2.b: Inseparable extension can be written as extension by inseparable element
  • Above, we showed that if we have $K(\alpha)/K$ where $\alpha$ inseparable, then derivations cannot be uniquely lifted.
  • We want to show that if we have $L/K$ inseparable, then derivation cannot be uniquely lifted. But this is not the same!
  • $L/K$ inseparable implies that there is some $\alpha \in L$ which is inseparable, NOT that $L = K(\alpha)/K$ is inseparable!
  • So we either need to find some element $\alpha$ such that $L = K(\alpha)$ [not always possible], or find some field $F$ such that $L = F(\alpha)$ and $\alpha$ is inseparable over $F$.
  • Reiterating: Given $L/K$ is inseparable, we want to find some $F/K$ such that $L = F(\alpha)$ where $\alpha$ is inseparable over $F$.
  • TODO!

Part 1 + Part 2: Separable iff unique lift

  • Let $L/K$ be separable. By primitive element theorem, $L = K(\alpha)$ for some $\alpha \in L$, $\alpha$ separable over $K$.
  • Any derivation of $K$ can be extended to a derivation of $L$ from results above. Thus, separable implies unique lift.
  • Suppose $L/K$ is inseparable. Then we can write $L = F(\alpha)/K$ where $\alpha$ is inseparable over $F$, and $K \subseteq F \subseteq L$.
  • Then by Part 2.a, we use the $Z$ derivation to get a non-zero derivation on $L$ that is zero on $F$. Since it is zero on $F$ and $K \subseteq F$, it is zero on $K$.
  • This shows that if $L/K$ is inseparable, then there are two ways to lift the zero derivation, violating uniqueness.

Lemma: Derivations at intermediate separable extensions

  • Let $L/K$ be a finite extension, and let $F/K$ be an intermediate separable extension. So $K \subseteq F \subseteq L$ and $F/K$ is separable.
  • Then we claim that every derivation $D: F \to L$ that sends $K$ to $K$ has values in $F$ (ie, its range lies in $F$, not all of $L$).
  • Pick $\alpha \in F$, so $\alpha$ is separable over $K$. We know what the unique derivation looks like, and it has range only $F$.

Payoff: An extension $L = K(\alpha_1, \dots, \alpha_n)$ is separable over $K$ iff $\alpha_i$ are separable

  • Recursively lift the derivations up from $K_0 \equiv K$ to $K_{i+1} \equiv K_i(\alpha_i)$. If the lifts all succeed, then we have a separable extension. If the unique lifts fail, then the extension is not separable.
  • That is, the derivation lifts uniquely at every stage iff the final extension $L$ is separable.

Irreducible polynomial over a field divides any polynomial with common root

  • Let $p(x) \in K[x]$ be an irreducible polynomial over a field $K$, and let $p$ share a common root $\alpha$ with another polynomial $q(x) \in K[x]$. Then we claim that $p(x)$ divides $q(x)$.
  • Consider the GCD $g \equiv gcd(p, q)$. Since $p, q$ share a root $\alpha$, we have that $(x - \alpha)$ divides $g$. Thus $g$ is a non-constant polynomial.
  • Further, we have $g | p$ since $g$ is GCD. But $p$ is irreducible, it cannot be written as product of smaller polynomials, and thus $g = p$.
  • Now, since $g$ is the GCD, we have $g | q$; and since $g = p$, this implies $p | q$ for any $q$ that shares a root with $p$.

Galois extension

  • Let $M$ be a finite extension of $K$. Let $G = Gal(M/K)$. Then $M$ is said to be Galois iff:
  1. $M$ is normal and separable (over $K$).
  2. $deg(M/K) = |G|$. We will show that in general $|G| \leq deg(M/K)$, so a Galois extension $M$ is "as symmetric as possible" --- it has the largest possible Galois group.
  3. $K = M^G$ [the fixed points of $M$ under $G$]. This is useful for examples.
  4. $M$ is the splitting field of a separable polynomial over $K$. Recall that a polynomial is separable over $K$ if it has distinct roots in the algebraic closure of $K$. Thus, the number of roots is equal to the degree.
  5. $K \subseteq L \subseteq M$ and $1 \subseteq H \subseteq G$: There is a 1-1 correspondence $L \mapsto Gal(M/L)$ [NOT $Gal(L/K)$!]; the other way round, we go from $H$ to $M^H$. This is a 1-1 correspondence. $L$ is in the denominator because we want to fix $L$ when we go back.
  • We'll show (1) implies (2) implies (3) implies (4) implies (1)

(4) implies (1)

  • We've shown that splitting fields of sets of polynomials are normal, so this case is trivial.

  • Just to recall the argument, let $M$ be the splitting field of some separable polynomial $p \in K[x]$ over $K$. We need to show that $M$ is normal and separable.

  • It's separable because it only adds to $K$ new elements which are roots of $p$, a separable polynomial. Thus, the minimal polynomial of each new element is also separable, and the base field is trivially separable.

  • We must now show that $M$ is normal. We proceed by induction on degree. Normality is trivial for linear polynomials, if $M$ contains one root it contains all of the roots (the only one).

  • Let $q \in K[x]$ have a root $\alpha \in M$. If $\alpha \in K$, then divide by $(x - \alpha)$ and use induction. So suppose $\alpha \not \in K$.

  • Then $\alpha$ is some element that is generated by the roots

  • Borcherds lecture

Separability of field extension as diagonalizability

  • Take $Q(\sqrt 2)$ over $Q$. Multiplication by $\sqrt 2$ corresponds to the linear transform $\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}$ in the basis $\{1, \sqrt 2\}$ (writing elements as $a + b \sqrt 2$).
  • The characteristic polynomial of the linear transform is $x^2 - 2$, which is indeed the minimal polynomial for $\sqrt 2$.
  • Asking for every element of $Q(\sqrt 2)$ to be separable is the same as asking for every element of $Q(\sqrt 2)$, interpreted as a linear operator, to have a separable minimal polynomial.
  • Recall that the minimal polynomial is the lowest degree polynomial that annihilates the linear operator. So $minpoly(I) = x - 1$, while $charpoly(I) = (x - 1)^n$.
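  • A quick check of this (sympy; the matrix is the one from the first bullet):
from sympy import Matrix, Symbol

x = Symbol('x')
M = Matrix([[0, 1], [2, 0]])        # multiplication by sqrt(2) in the basis {1, sqrt 2}
print(M.charpoly(x).as_expr())      # x**2 - 2, the minimal polynomial of sqrt(2)
print(M.is_diagonalizable())        # True: distinct eigenvalues, ie the polynomial is separable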

Motivation for the compact-open topology

  • If $X$ is a compact space and $Y$ is a metric space, consider two functions $f, g: X \to Y$.
  • We can define a distance $d(f, g) \equiv \max_{x \in X} d(f(x), g(x))$.
  • The $\max_{x \in X}$ is attained because $X$ is compact.
  • Thus this is a real metric on the function space $Map(X, Y)$.
  • Now suppose $Y$ is no longer a metric space, but is Hausdorff. Can we still define a topology on $Map(X, Y)$?
  • Let $K \subseteq X$ be compact, and let $U \subseteq Y$ be open such that $f(K) \subseteq U$.
  • Since $Y$ is Hausdorff, $K \subseteq X$

Example of covariance zero, and yet "correlated"

  • $x$ and $y$ coordinates of points on a disk.
  • $E[X], E[Y]$ is zero because symmetric about origin.
  • $E[XY] = 0$ because of symmetry along quadrants.
  • Thus, $E[XY] - E[X] E[Y]$, the covariance, is zero.
  • However, they are clearly correlated. Eg. if $x = 1$, then $y$ must be zero.
  • If $Y = aX+b$ then $corr(X, Y) = sgn(a)$.
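  • A quick simulation of the disk example (numpy; sample size and seed are arbitrary): the covariance is essentially zero, yet $X$ and $Y$ are clearly dependent (eg. $X^2$ and $Y^2$ are negatively correlated).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
pts = rng.uniform(-1, 1, size=(2 * n, 2))          # rejection-sample the unit disk
pts = pts[(pts ** 2).sum(axis=1) <= 1.0][:n]
x, y = pts[:, 0], pts[:, 1]

print(np.cov(x, y)[0, 1])                  # ~ 0: covariance vanishes by symmetry
print(np.corrcoef(x ** 2, y ** 2)[0, 1])   # visibly negative: the coordinates are dependent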

Hypothesis Testing

Mnemonic for type I versus type II errors

  • Once something becomes "truth", challenging the status quo and making it "false" is very hard. (see: disinformation).
  • Thus, Science must have high barriers for accepting a hypothesis as true.
  • That is, we must have high barriers against incorrectly rejecting the null (that nothing happened).
  • This error is called a type I error, and is denoted by $\alpha$ (the more important error).
  • The other type of error, where something is true, but we conclude it is false is less important. Some grad student can run the experiment again with better experimental design and prove it's true later if need be.
  • Our goal is to protect science from entrenching/enshrining "wrong" facts as true. Thus, we control type I errors.
  • Our goal is to "reject" current theories (the null) and create "new theories" (the alternative). Thus, in statistics, we setup our tests with the goal of enabling us to "reject the null".

Mnemonic for remembering the procedure

  • $H_0$ is the null hypothesis (null for zero). They are presumed innocent until proven guilty.
  • If $H_0$ is judged guilty, we reject them (from society) and send them to the gulag.
  • If $H_0$ is judged not guilty, we retain them (in society).
  • We are the prosecution, who are trying to reject $H_0$ (from society) to send them to the gulag.
  • The scientific /statistical process is the Judiciary which is attempting to keep the structure of "innocent until proven guilty" for $H_0$.
  • We run experiments, and we find out how likely it is that $H_0$ is guilty based on our experiments.
  • We calculate an error $\alpha$, which is the probability that we screw up the fundamental truth of the court: we must not send an innocent man to the gulag. Thus, $\alpha$ is the probability that $H_0$ is innocent (ie, true) but we reject it (to the gulag).

P value, Neyman interpretation

  • Now, suppose we wish to send $H_0$ to the gulag, because we're soviet union like that. What's the probability we're wrong in doing so? (That is, what is the probability that $H_0$ is innocent and we are condemning them incorrectly to a life in the gulag?) That's the $p$ value. We estimate this based on our experiment, of course.
  • Remember, we can never speak of the "probability of $H_0$ being true/false", because $H_0$ is true or is false [frequentist]. There is no probability.

P value, Fisher interpretation

  • The critical region of the test corresponds to those values of the test statistic that would lead us to reject null hypothesis (and send it to the gulag).
  • Thus, the critical region is also sometimes called the "rejection region", since we reject $H_0$ from society if the test statistic lies in this region.
  • The rejection region usually corresponds to the tails of the sampling distribution.
  • The reason for that is that a good critical region almost always corresponds to those values of the test statistic that are least likely to be observed if the null hypothesis is true. This will be the "tails" / "non central tendency" if a test is good.
  • In this situation, we define the $p$ value to be the probability we would have observed a test statistic that is at least as extreme as the one we did get. P(new test stat >= cur test stat).
  • ??? I don't get it.
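  • A minimal numerical sketch of this definition (scipy; a one-sided z-test with a made-up observed statistic, my own example, not from the text):
from scipy.stats import norm

t_obs = 2.1                 # observed test statistic, distributed ~ N(0, 1) under H0
p_value = norm.sf(t_obs)    # P(new test stat >= cur test stat) under the null
print(p_value)              # ~0.018: reject H0 at alpha = 0.05, retain at alpha = 0.01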

P value, completely wrong edition

  • "Probability that the null hypothesis is true" --- WRONG
  • compare to "probability us rejecting the null hypothesis is wrong" -- CORRECT. The probability is in US being wrong, and has NOTHING to do with the truth or falsity of the null hypothesis itself.

Power of the test

  • The value $\beta$ is the probability that $H_0$ was guilty, but we chose to retain them into society instead.
  • The less we do this (ie, the larger is $1 - \beta$), the more "power" our test has.

Dumb mnemonic for remembering adjunction turnstile

  • The left side of the adjunction F wants to "push the piston" on the right side, so it must be F -| G where -| allows F to "crush" G with the flat surface |.

Delta debugging

  • Delta debugging from the fuzzing book
  • Start with a program that crashes.
  • Run reduce on it:
from typing import Callable

def reduce(inp: str, test: Callable[[str], bool]) -> str:
  assert test(inp) == False  # the starting input must reproduce the failure
  n = 2  # initial granularity: try removing 1/n = 1/2 of the input at a time
  while len(inp) >= 2:
    ix = 0
    found_failure = False
    skiplen = len(inp) // n  # size of the chunk to delete

    while ix < len(inp):
      inp_noix = inp[:ix] + inp[ix + skiplen:]  # delete the chunk [ix, ix + skiplen)
      if not test(inp_noix):
        inp = inp_noix              # smaller input still fails: keep it
        n = max(n - 1, 2)           # decrease granularity by 1
        found_failure = True; break
      else:
        ix += skiplen               # this chunk is needed for the failure; try the next one

    if not found_failure:
      if n == len(inp): break       # already deleting single characters; give up
      n = min(n * 2, len(inp))      # double the granularity (delete smaller chunks)

  return inp
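  • Hypothetical usage (my own toy example, where "failure" means the input still contains the substring "bug"):
def test(s: str) -> bool:
  return "bug" not in s              # False = the failure still reproduces

print(reduce("aaaa bug bbbb", test))  # shrinks down to a small string still containing "bug"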

Tidy Data

  • The paper

  • Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types. In tidy data:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

While the order of variables and observations does not affect analysis, a good ordering makes it easier to scan the raw values. One way of organising variables is by their role in the analysis: are values fixed by the design of the data collection, or are they measured during the course of the experiment? Fixed variables describe the experimental design and are known in advance. Computer scientists often call fixed variables dimensions, and statisticians usually denote them with subscripts on random variables. Measured variables are what we actually measure in the study. Fixed variables should come first, followed by measured variables, each ordered so that related variables are contiguous. Rows can then be ordered by the first variable, breaking ties with the second and subsequent (fixed) variables. This is the convention adopted by all tabular displays in this paper.

Messy 1: Column headers are values, not variable names

  • eg. columns are religion |<$10k |$10-20k |$20-30k |$30-40k |$40-50k |$50-75k.
  • melt dataset to get molten stacked data.

Messy 2: Multiple variables stored in one column

  • This often manifests after melting.
  • eg. columns are country | year | m014 | m1524 | .. | f014 | f1524...
  • columns represent both sex and age ranges. After melting, we get a single column sexage with entries like m014 or f1524.
  • The data is still molten, so we should reshape it before it sets into tidy columnar data. We do this by splitting the column into two, one for age and one for sex.

Messy 3: Variables are stored in both rows and columns

  • Original data:
id      year month element d1 d2 d3 d4 d5 ...
MX17004 2010 1     tmax    — — — — — — — —
MX17004 2010 1     tmin    — — — — — — — —
MX17004 2010 2     tmax    — 27.3 24.1 — — — — —
MX17004 2010 2     tmin    — 14.4 14.4 — — — — —
MX17004 2010 3     tmax    — — — — 32.1 — — —
MX17004 2010 3     tmin    — — — — 14.2 — — —
MX17004 2010 4     tmax    — — — — — — — —
MX17004 2010 4     tmin    — — — — — — — —
MX17004 2010 5     tmax    — — — — — — — —
MX17004 2010 5     tmin    — — — — — — — —
  • Some variables are in individual columns (id, year, month)
  • Some variables are spread across columns (day is spread as d1–d31)
  • Some variables are smeared across rows (eg. tmax/tmin). TODO: what does this mean, really?
  • First, tidy by collating into date:
id      date       element value
MX17004 2010-01-30 tmax 27.8
MX17004 2010-01-30 tmin 14.5
MX17004 2010-02-02 tmax 27.3
MX17004 2010-02-02 tmin 14.4
MX17004 2010-02-03 tmax 24.1
MX17004 2010-02-03 tmin 14.4
MX17004 2010-02-11 tmax 29.7
MX17004 2010-02-11 tmin 13.4
MX17004 2010-02-23 tmax 29.9
MX17004 2010-02-23 tmin 10.7
  • Dataset above is still molten. Must reshape along element to get two columns for max and min. This gives:
id      date       tmax tmin
MX17004 2010-01-30 27.8 14.5
MX17004 2010-02-02 27.3 14.4
MX17004 2010-02-03 24.1 14.4
MX17004 2010-02-11 29.7 13.4
MX17004 2010-02-23 29.9 10.7
MX17004 2010-03-05 32.1 14.2
MX17004 2010-03-10 34.5 16.8
MX17004 2010-03-16 31.1 17.6
MX17004 2010-04-27 36.3 16.7
MX17004 2010-05-27 33.2 18.2
  • Months with less than 31 days have structural missing values for the last day(s) of the month.
  • The element column is not a variable; it stores the names of variables.

Multiple types in one table:

data manipulation, relationship to dplyr:

  • Data transformation in R for data science
  • mutate() adds new variables that are functions of existing variables
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

Visualization

  • Most of R's visualization ecosystem is tidy by default.
  • base plot, lattice, ggplot are all tidy.

Modelling

  • Most modelling tools work best with tidy datasets.

Questions about performance benching in terms of tidy

  • Is runs of a program at different performance levels like O1, O2, O3 to be stored as separate columns? Or as a categorical column called "optimization level" with entries stored in separate rows of O1, O2, O3?
  • If we go by the tidy rule "Each variable forms a column", then this suggests that optimization level is a variable.
  • Then the tidy rule Each observation forms a row. makes us use rows like [foo.test | opt-level=O1 | <runtime>] and [foo.test | opt-level=O2 | <runtime>].
  • Broader question: what is the tidy rule for categorical column?
  • However, in the tidy data paper, Table 12, it is advocated to have two columns for tmin and tmax instead of having a column called element with choices tmin, tmax. So it seems to be preferred that if one has a categorical variable, we make its observations into columns.
  • This suggests that I order my bench data as [foo.test | O1-runtime=_ | O2-runtime=_ | O3-runtime=_ ].
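  • A sketch of the two layouts (pandas; the column names and numbers are made up):
import pandas as pd

wide = pd.DataFrame({
    "test":       ["foo.test", "bar.test"],
    "O1-runtime": [1.92, 3.10],
    "O2-runtime": [1.40, 2.75],
    "O3-runtime": [1.31, 2.60],
})

# "opt-level as a variable": melt into one categorical column plus one value column
long = wide.melt(id_vars="test", var_name="opt_level", value_name="runtime")
print(long)

# and back to the Table-12-style layout, one column per optimization level
print(long.pivot(index="test", columns="opt_level", values="runtime"))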

Normal subgroups through the lens of actions

  • finite group is permutation subgroup

  • ghg' is relabelling by g

  • if gHg' = H, then H does not care about labelling

  • thus H treats everyone uniformly

  • prove that if H is normal, then if s in fix(H) then orb(s) in fix(H)

  • when is stab(S) normal? when Stab(gx) equals g Stab(x) g'?

  • topology on S: closed sets are the common fixpoints of a set of group elements.

Writing rebuttals, Tobias style

  • Writing rebuttals, key take-aways:
  • Make your headings for all reviewers a sentence
  • Don't write in Q?A style
  • Write as a paragraph, where we write the strong answer first, and then point back to the question.
  • Use a subclause to indicate that the sentence is unfinished. Eg: "the bug in our compiler has been fixed" (bad!). The reader may see "the bug in our compiler..." and conclude something crazy. Rather, we should write "While there was a bug in our compiler, we fixed it ...". The "While" makes it clear that the sentence is not yet complete.

LCS DP: The speedup is from filtration

  • I feel like I finally see where the power of dynamic programming lies.
  • Consider the longest common subsequence problem over arrays $A$, $B$ of size $n$, $m$.
  • Naively, we have $2^n \times 2^m$ pairs of subsequences and we need to process each of them.
  • How does the LCS DP solution manage to solve this in $O(nm)$?
  • Key idea 1: create a "filtration" of the problem, $F_{i, j} \subseteq 2^n\times2^m$. At step $(i, j)$, consider the "filter" $F_{i, j}$ containing all pairs of subsequences $(s \in 2^n, t \in 2^m)$ where $maxix(s) \leq i$ and $maxix(t) \leq j$ (ie, $s$ only uses indices of $A$ up to $i$, and $t$ only uses indices of $B$ up to $j$).
  • These filters of the filtration nest into one another, so $F_{i, j} \subseteq F_{i', j'}$ iff $i \leq i'$ and $j \leq j'$.
  • Key idea 2: The value of max LCS(filter) is (a) monotonic, and (b) can be computed efficiently from the values of lower filtration. So we have a monotone map from the space of filters to the solution space, and this monotone map is efficiently computable, given the values of filters below this in the filtration.
  • This gives us a recurrence, where we start from the bottom filter and proceed to build upward.
  • See that this really has nothing to do with recursion. It has to do with problem decomposition. We decompose the space $2^n \times 2^m$ cleverly via filtration $F_{i, j}$ such that max LCS(F[i, j]) was efficiently computable.
  • To find a DP, think of the entire state space, then think of filtrations, such that the solution function becomes a monotone map, and the solution function is efficiently computable given the values of filters below it.
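  • A direct transcription of this picture (plain Python): $F[i][j]$ is the best LCS length over all pairs of subsequences confined to $A[1..i]$ and $B[1..j]$.
def lcs(A: str, B: str) -> int:
    n, m = len(A), len(B)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # F[0][j] = F[i][0] = 0: empty filters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                F[i][j] = F[i - 1][j - 1] + 1   # extend a best pair from a smaller filter
            else:
                F[i][j] = max(F[i - 1][j], F[i][j - 1])   # monotonicity along the filtration
    return F[n][m]

assert lcs("ABCBDAB", "BDCABA") == 4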

Poisson distribution

  • Think about flipping a biased coin with some bias $p$ to associate a coin flip to each real number. Call this $b: \mathbb R \to {0, 1}$.
  • Define the count of an interval $I$ as $\#I \equiv |\{ r \in I : b(r) = 1 \}|$.
  • Suppose that this value $#I$ is finite for any bounded interval.
  • Then the process we have is a poisson process.
  • Since the coin flips are independent, all 'hits' of the event must be independent.
  • Since there is either a coin flip or there is not, at most one 'hit' of the event can happen at any moment in time.
  • Since the bias of the coin is fixed, the rate at which we see $1$s is overall constant.
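  • A simulation sketch of this picture (numpy; the rate, grid size, and trial count are arbitrary choices of mine): chop $[0, 1)$ into $n$ tiny intervals, flip a coin of bias $\lambda/n$ in each, and the number of heads approaches a Poisson($\lambda$) distribution as $n$ grows.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
lam, n, trials = 3.0, 10_000, 50_000
counts = rng.binomial(n=n, p=lam / n, size=trials)      # heads per trial

k = np.arange(10)
empirical = np.array([(counts == i).mean() for i in k])
poisson = np.exp(-lam) * lam ** k / np.array([factorial(int(i)) for i in k])
print(np.abs(empirical - poisson).max())                # small: the distributions agree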

F1 or Fun : The field with one element

  • Many combinatorial phenomena can be recovered as the "limit" of geometric phenomena over the "field with one element", a mathematical mirage.

Cardinality ~ Lines

  • Consider projective space of dimension $n$ over $F_p$. How many lines are there?
  • Note that for each non-zero vector, we get a 'direction'. So there are $p^n - 1$ potential directions.
  • See that for any choice of direction $d \in F_p^n \setminus \{\vec 0\}$, there are $(p - 1)$ "linearly equivalent" directions, given by $1 \cdot d$, $2 \cdot d$, $\dots$, $(p - 1) \cdot d$, which are all distinct since the non-zero field elements form a group under multiplication.
  • Thus, we have $(p^n - 1)/(p - 1)$ lines. This is equal to $1 + p + p^2 + \dots + p^{n-1}$, which is $p^0 + p^1 + \dots + p^{n-1}$
  • If we plug in $p = 1$ (studying the "field with one element"), we recover $\sum_{i=0}^{n-1} p^i = n$.
  • Thus, the "cardinality of a set of size $n$" is the "number of lines of $n$-dimensional projective space over $F_1$"!
  • Since $[n] \equiv {1, 2, \dots, n}$ is the set of size $n$, it is only natural that $[n]_p$ is defined to be the lines in $F_p^n$. We will abuse notation and conflate $[n]_p$ with the cardinality, $[n]_p \equiv (p^n - 1)/(p - 1)$.

Permutation ~ Maximal flags

  • Recall that a maximal flag is a sequence of subspaces $V_1 \subseteq V_2 \subseteq \dots \subseteq V$. At each step, the dimension increases by $1$, and we start with dimension $1$. So we pick a line $l_1$ through the origin for $V_1$. Then we pick a plane through the origin that contains the line $l_1$ through the origin. Said differently, we pick a plane $p_2$ spanned by $l_1, l_2$. And so on.
  • How many ways can we pick a line? That's $[n]_p$. Next we need to pick a second, distinct line; working in the quotient space $F_p^n/L$, which is $F_p^{n-1}$, picking it contributes $[n-1]_p$. On multiplying all of these, we get $[n]_p [n-1]_p \dots [1]_p$.
  • In the case of finite sets, this gives us $1 \cdot 2 \cdot \dots n = n!$.
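  • A small check of the two counts above (sympy; evaluating the $q$-analogs at $p = 1$):
from functools import reduce
from sympy import Symbol

p = Symbol('p')
n = 4

def q_int(k):        # [k]_p = 1 + p + ... + p^(k-1) = (p^k - 1)/(p - 1)
    return sum(p**i for i in range(k))

q_factorial = reduce(lambda a, b: a * b, (q_int(k) for k in range(1, n + 1)))
print(q_int(n).subs(p, 1))        # 4  = cardinality of the 4-element set
print(q_factorial.subs(p, 1))     # 24 = 4! = number of maximal flags "over F_1"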

Combinations ~ Grassmanian

McKay's proof of Cauchy's theorem for groups [TODO]

  • In a group, if $gh = 1$ then $hg = 1$. Prove this by writing $hg = hg (h h^{-1}) = h(gh)h^{-1} = h \cdot 1 \cdot h^{-1} = 1$.
  • We can interpret this as follows: in the multiplication table of a group, firstly, each row contains exactly one $1$.
  • Also, when $g \neq h$ (ie, we are off the main diagonal of the multiplication table), each $gh = 1$ has a "cyclic permutation solution" $hg = 1$.
  • If the group has even order, then there is an even number of $1$s on the main diagonal (the total number of $1$s is $|G|$, one per row, and the off-diagonal ones pair up).
  • Thus, the number of solutions to $x^2 = 1$ for $x \in G$ is even. Since $x = 1$ is one solution, there must be another, ie an element of order 2.
  • Let's generalize from pairs to $p$-tuples: count tuples $(x_1, \dots, x_p)$ with $x_1 x_2 \cdots x_p = 1$, and count the fixed points of the cyclic shift action on them; this is McKay's proof.
  • Reference

Odds versus probability [TODO]

  • https://www.youtube.com/watch?v=lG4VkPoG3ko
  • https://www.youtube.com/watch?v=HZGCoVF3YvM

ncdu for disk space measurement

  • I've started to use ncdu to get a quick look at disk space instead of baobab. It's quite handy since it's an ncurses based TUI.

nmon versus htop

  • I've switched to using nmon instead of htop for viewing system load. Its TUI looks much nicer than htop, and I find its process list much easier to parse.

Schreier-Sims --- why purify generators times coset

  • Let p = (0 3 4)(1 2). Let G = <p>. What is the stabilizer of k=0?
  • purify(p) = e so we would imagine we would have H = e.
  • But actually, consider orbit(k). We have 0 <-> id, 3 <-> p, 4 <-> p^2.
  • If I now consider p * orbit(k) then I get p, p^2, p^3, where purify(p) = id, purify(p^2) = id, purify(p^3) = p^3.
  • Thus we find the nontrivial generator p^3.
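
Here is a small python sanity check of this example, with permutations written as dicts; the compose helper is mine:

# p = (0 3 4)(1 2), written as a dict of images.
p = {0: 3, 3: 4, 4: 0, 1: 2, 2: 1}

def compose(f, g):            # (f*g)(x) = f(g(x))
    return {x: f[g[x]] for x in g}

p2 = compose(p, p)            # p^2 = (0 4 3)
p3 = compose(p2, p)           # p^3 = (1 2)
print(p3)                     # {0: 0, 3: 3, 4: 4, 1: 2, 2: 1}: p^3 is nontrivial and stabilizes 0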

Vyn's feeling about symmetry

  • They are of the opinion that the correct definition of a symmetry of an object $S$ in space is that a transformation $T$ is a symmetry of $S$ iff $T(S) = S$ (as a set).
  • The above rules out things like translations of a cube.
  • Indeed, one can only recover translations by considering a line on the space and then considering the orbit of the line under a specific translation $T$.

Convergence in distribution is very weak

  • consider $X \sim N(0, 1)$. Also consider $-X$ which will be identically distributed (by symmetry of $-$ and $N$).
  • So we have that $-X \sim N(0, 1)$.
  • But this tells us nothing about $X$ and $-X$! So this type of "convergence of distribution" is very weak.
  • Strongest notion of convergence (#1): Almost surely. $T_n \xrightarrow{a.s} T$ iff $P({ \omega : T_n(\omega) \to T(\omega) }) = 1$. Consider a snowball left out in the sun. In a couple hours, it'll have a random shape, random volume, and so on. But the ball itself is a definite thing --- the $\omega$. Almost sure says that for almost all of the balls, $T_n$ converges to $T$.
  • #2 notion of convergence: Convergence in probability. $T_n \xrightarrow{P} T$ iff $P(|T_n - T| \geq \epsilon) \xrightarrow{n \to \infty} 0$ for all $\epsilon > 0$. This allows us to squeeze $\epsilon$ probability under the rug.
  • Convergence in $L^p$: $T_n \xrightarrow{L^p} T$ iff $E[|T_n - T|^p] \xrightarrow{n \to \infty} 0$. Eg. think of convergence in variance of a gaussian.
  • Convergence in distribution (weakest): $T_n \xrightarrow{d} T$ iff $P[T_n \leq x] \xrightarrow{n \to \infty} P[T \leq x]$ for all $x$.

Characterization of convergence in distribution

  • (1) $T_n \xrightarrow{d} T$
  • (2) For all $f$ continuous and bounded, we have $E[f(T_n)] \xrightarrow{n \to \infty} E[f(T)]$.
  • (3) we have $E[e^{ixT_n}] \xrightarrow{n \to \infty} E[e^{ixT}]$. [characteristic function converges].

Strength of different types of convergence

  • Almost surely convergence implies convergence in probability. Also, the two limits (which are RVs) are almost surely equal.
  • Convergence in $L^p$ implies convergence in probability and convergence in $L^q$ for all $q \leq p$. Also, the limits (which are RVs) are almost surely equal.
  • If $T_n$ converges in probability, it also converges in distribution (meaning the limits will have the same DISTRIBUTION, not be the same RV).
  • All of almost surely, probabilistic convergence, convergence in distribution (not $L^p$) map properly by continuous fns. $T_n \to T$ implies $f(T_n) \to f(T)$.
  • almost surely implies P implies distribution convergence.

Slutsky's Theorem

  • If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ (that is, the sequence of $Y_n$ is eventually deterministic), we then have that $(X_n, Y_n) \xrightarrow{d} (X, c)$. In particular, we get that $X_n + Y_n \xrightarrow{d} X + c$ and $X_n Y_n \xrightarrow{d} X c$.
  • This is important, because in general, convergence in distribution says nothing about the RV! but in this special case, it's possible.


Class equation, P-group structure

Centralizer

  • The centralizer of a subset $S$ of a group $G$ is the largest subgroup of $G$ whose elements commute with every element of $S$. It's defined as $C_G(S) \equiv { g \in G : \forall s \in S, gs = sg }$. This can be written as $C_G(S) \equiv { g \in G : \forall s \in S, gsg^{-1} = s }$.

Conjugacy classes and the class equation

  • Define $g \sim g'$ if there exists a $k$ such that $g' = kgk^{-1}$. This is an equivalence relation on the group, and it partitions the group into conjugacy classes.
  • Suppose an element $z \in G$ is in the center (Zentrum). Now, the product $kzk^{-1} = z$ for all $k \in G$. Thus, elements in the center all sit in conjugacy classes of size $1$.
  • Let $Z$ be the center of the group, and let ${ J_i \subset G } $ (J for conJugacy) be conjugacy classes of elements other than the center. Let $j_i \in J_i$ be representatives of the conjugacy classes, which also generate the conjugacy class as orbits under the action of conjugation.
  • By orbit stabilizer, we have that $|J_i| = |Orb(j_i)| = |G|/|Stab(j_i)|$.
  • The stabilizer under the action of conjugation is the centralizer! So we have $|Orb(j_i)| = |G|/|C(j_i)|$.
  • Thus, we get the class equation: $|G| = |Z| + \sum_{j_i} |G|/|C(j_i)|$.
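
As a sanity check, here is a small python computation of the conjugacy classes of $S_3$; their sizes $1 + 2 + 3 = 6$ realize the class equation (the compose/inverse/conj_class helpers are mine):

from itertools import permutations

G = list(permutations(range(3)))                         # S_3, elements as tuples of images
compose = lambda f, g: tuple(f[g[i]] for i in range(3))  # (f*g)(i) = f(g(i))
inverse = lambda f: tuple(sorted(range(3), key=lambda i: f[i]))

conj_class = lambda g: frozenset(compose(compose(k, g), inverse(k)) for k in G)
classes = {conj_class(g) for g in G}
print(sorted(len(c) for c in classes))                   # [1, 2, 3]: 6 = 1 (center) + 2 + 3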

$p$-group

  • A $p$-group is a group where every element has order a power of $p$.
  • Claim: a finite group is a $p$-group iff it has cardinality $p^N$ for some $N$.
  • Forward - $|G| = p^N$ implies $G$ is a $p$-group: Let $g \in G$. Then $|\langle g \rangle|$ divides $|G| = p^N$ by Lagrange, so the order of $g$ is a power of $p$. Hence proved.
  • Backward - every $g \in G$ has order a power of $p$ implies $|G| = p^N$ for some $N$: Suppose instead that some other prime $q \neq p$ divides $|G|$. Then by Cauchy's theorem (proved below), $G$ has an element of order $q$, which is not a power of $p$. This is a contradiction, so $|G| = p^N$.

Center of $p$ group

  • Let $G$ be a $p$-group. We know that $|G| = |Z(G)| + \sum_{g_i} |Orb(g_i)|$, where we are considering orbits under group conjugation.
  • See that $|Orb(g_i)| = |G|/|Stab(g_i)|$. The quantity on the right must be a power of $p$ (since the numerator is $p^N$). The quantity must be more than $1$, since the element $g_i$ is not in the center (and thus is conjugated non-trivially by some element of the group).
  • Thus, $|Orb(g_i)|$ is divisible by $p$.
  • Take the equation $|G| = |Z(G)| + \sum_{g_i} |Orb(g_i)|$ modulo $p$. This gives $0 =_p |Z(G)|$. Hence, $Z(G) \neq { e }$ (since that would give $|Z(G)| =_p 1 \neq 0$). So, the center is non-trivial.

Cauchy's theorem: order of group is divisible by $p$ implies group has element of order $p$.

  • Abelian case, order $p$: immediate, must be the group $Z/pZ$ which has generator of order $p$. Now induction on group cardinality.

  • Abelian case, order divisible by $p$: Pick an element $g \in G$, let the cyclic subgroup generated by it be $C_g$, and let the order of $g$ be $o$ (thus, $|C_g| = o$).

  • Case 1: If $p$ divides $o$, then there is a power of $g$ with order $p$ (Let $o' \equiv o/p$. Consider $g^{o'}$; this has order $p$).

  • Case 2: If $p$ does not divide $o$. Then $p$ divides the order of the quotient $G' \equiv G / C_g$. Thus by induction, we have an element $h C_g \in G / C_g$ of order $p$.

  • Let $o$ be the order of $h$ in $G$. Then we have that $(h C_g)^o = h^o C_g = e C_g$, where the last equality follows from the assumption that $o$ is the order of $h$. Thus raising $h C_g$ to $o$ gives the identity in $G/C_g$. This implies $p$ (the order of $h C_g$ in $G/C_g$) must divide $o$ (the order of $h$).

  • Thus, by an argument similar to the previous, there is some power of $h$ with order $p$. (Let $o' \equiv o/p$. Consider $h^{o'}$; this has order $p$.)

  • General case: consider the center $Z$. If $p$ divides $|Z|$, then use the abelian case to find an element of order $p$ and we are done.

  • Otherwise, use the class equation: $|G| = |Z| + \sum_{j_i} |Orb(j_i)|$.

  • The LHS vanishes modulo $p$, while the RHS has the term $|Z|$ which does not vanish (we assumed $p$ does not divide $|Z|$). Thus there is some $j_i$ whose orbit size $|Orb(j_i)|$ is not divisible by $p$.

  • We know that $|Orb(j_i)| = |G|/|Stab(j_i)|$ where the action is conjugacy. Since the LHS is not divisible by $p$, while $|G|$ is divisible by $p$, this means that $Stab(j_i)$ has order divisible by $p$ and is a subgroup of $G$.

  • Further, $Stab(j_i)$ is a proper subgroup as $Orb(j_i)$ is a proper orbit, and is thus not stabilized by every element of the group.

  • Use induction on $Stab(j_i)$ to find element of order $p$.

Subgroups of p-group

  • Let $G$ be a finite $p$ group. So $|G| = p^N$. Then $G$ has a normal subgroup of size $p^l$ for all $l \leq N$.
  • Proof by induction on $l$.
  • For $l = 0$, we have the normal subgroup ${ e }$.
  • Assume this holds for $k$. We need to show it's true for $l \equiv k + 1$.
  • So we have a normal subgroup $N_k$ of size $p^k$. We need to establish a subgroup $N_l$ of size $p^{k+1}$.
  • Consider $G/N_k$. This is a $p$-group and has cardinality $p^{N-k}$. As it is a $p$-group, it has non-trivial center. So, $Z(G/N_k)$ is non-trivial and has cardinality at least $p$.
  • Recall that every subgroup of the center is normal. This is because the center is fixed under conjugation, thus subgroups of the center are fixed under conjugation and are therefore normal.
  • Next, by Cauchy's theorem, there exists an element $z$ of order $p$ in $Z(G/N_k)$. Thus, there is a normal subgroup $\langle z \rangle \subset G/N_k$
  • We want to pull this back to a normal subgroup of $G$ of order $|\langle z \rangle \cdot N_k| = p^{k+1}$.
  • By the correspondence theorem, the group $\langle z \rangle \cdot N_k$ is normal in $G$ and has order $p^{k+1}$. Thus we are done.

Sylow Theorem 1

I've always wanted a proof I can remember, and I think I've found one.

  • Let $G$ be a group such that $|G| = p^n m $ where $p$ does not divide $m$.
  • We start by considering the set of all subsets of $G$ of size $p^n$. Call this set $\Omega$.
  • We will prove the existence of a special subset $S \subseteq G$ such that $S \in \Omega$, and $|Stab(S)| = p^n$. That is, $|S| = p^n$ and $|Stab(S)| = p^n$. This is somewhat natural, since the only way to get subgroups out of actions is to consider stabilizers.
  • We need to show the existence of an $S \in \Omega$ such that $Stab(S)$ has maximal cardinality.

Lemma: $\binom{pa}{pb} \equiv_p \binom{a}{b}$:

  • this is the coefficient of $x^{pb}$ in $(x + 1)^{pa}$. But modulo $p$, this is the same as the coefficient of $x^{pb}$ in $(x^p + 1^p)^a$. The latter is $\binom{a}{b}$. Thus, $\binom{ap}{bp} \equiv_p \binom{a}{b}$ (modulo $p$).
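
A quick brute force check of the lemma in python, over a small range of primes and arguments:

from math import comb

for p in (2, 3, 5, 7):
    for a in range(1, 8):
        for b in range(a + 1):
            assert comb(p * a, p * b) % p == comb(a, b) % p
print("C(pa, pb) = C(a, b) mod p on the tested range")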

Continuing: Size of $\Omega$ modulo $p$:

  • Let us begin by considering $|\Omega|$. This is $\binom{p^n m}{p^n}$ since we pick all subsets of size $p^n$ from the $p^n m$ elements of $G$. Iterating the lemma above shows us that $\binom{p^n m}{p^n} \equiv_p m$. Thus, $p$ does not divide $|\Omega|$, since $m$ was the $p$-free part of $|G|$.
  • This implies that there is some orbit $O \subset \Omega$ whose size is not divisible by $p$. --- Break $\Omega$ into orbits. Since the left hand side $|\Omega|$ is not divisible by $p$, there is some term in the orbits size that is not divisible by $p$.
  • Let the orbit $O$ be generated by a set $S \in \Omega$. So $O = Orb(S)$. Now orbit stabilizer tells us that $|Orb(S)| \cdot |Stab(S)| = |G|$. Since $|O| = |Orb(S)|$ is not divisible by $p$, this means that $Stab(S)$ must be of size at least $p^n$. It could also have some divisors of $m$ inside it.
  • Next, we will show that $Stab(S)$ can be at most $p^n$.

Lemma: size of stabilizer of subset when action is free:

  • Let a group $G$ act freely on a set $S$. This means that for all group elements $g$, if for any $s$ we have $g(s) = s$, then we must have $g = id$. In logic, this is: $\forall g, (\exists s, g(s) = s) \implies g = id$.
  • See that an implication of this is that for any two elements $s, t \in S$, there is at most one $g$ such that $g(s) = t$. Suppose that we have two elements, $g, h$ such that $g(s) = t$ and $h(s) = t$. This means that $g^{-1}h(s) = s$. But we know that in such a case, $g^{-1}h = id$, ie $g = h$.
  • What does this mean? it means that $Stab(s) = { e }$ for all $s$.
  • Now let's upgrade this to subsets of $S$. Let $P$ (for part) be a subset of $S$. What is $|Stab(P)|$? We want to show that it is at most $|P|$. Let's pick a basepoint $p_0 \in P$ [thus $p_0 \in S$ since $P \subseteq S$].
  • Let's suppose that $g \in Stab(P)$. This means that $g(p_0) \in P$. Say it sends $p_0$ to $p_g \in P$. Now no other element of $Stab(P)$ can send $p_0$ to $p_g$ since the action is free!
  • Thus, there are at most $|P|$ choices for $p_0$ to be sent to, one for each element of $Stab(P)$.
  • Thus, $|Stab(P)| \leq |P|$.

Continuing: Showing that $|Stab(S)| = p^n$.

  • Since the action of $G$ on $G$ is free, and since we are considering the stabilizer of some subset $S \subseteq G$, we must have that $|Stab(S)| \leq |S| = p^n$. Thus, since $|Stab(S)| \geq p^n$ (from the orbit argument above) and $|Stab(S)| \leq p^n$ (from the free action argument), we have $|Stab(S)| = p^n$. Thus we are done.

  • More explicitly perhaps, let us analyze $|Stab(S)|$. We know that $Stab(S) \cdot S = S$. Thus, for any $t \in S$, we know that $Stab(S) \cdot t \subseteq S$. Thus, $|Stab(S) \cdot t| \leq |S|$.

  • Also notice that $Stab(S) \cdot t$ is a coset of $Stab(S)$. Thus, $|Stab(S) \cdot t| = |Stab(S)|$.

Combining the above, we find that $|Stab(S)| \leq |S|$. So the stabilizer, of size $|S| = p^n$, is in some sense "maximal": it has the largest size a stabilizer could have!

Fuzzing book

  • Statement coverage is different from branch coverage, since an if (cond) { s1; } s2 will say that s1 and s2 were executed when cond=True, so we have full statement coverage. On the other hand, this does not guarantee full branch coverage, since we have not executed the branch where cond=False. We can't tell that we haven't covered this branch since there is no statement to record that we have taken the else branch!

  • Branch distances: for conditions a == b, a != b, a < b, a <= b, define the "distance true/distance false" to be the number that is to be added/subtracted to a to make the condition true/false (for a fixed b). So, for example, the "distance true" for a == b is abs(b - a), while "distance false" is 1 - int(a == b).

  • What are we missing in coverage? The problem here is that coverage is unable to evaluate the quality of our assertions. Indeed, coverage does not care about assertions at all. However, as we saw above, assertions are an extremely important part of test suite effectiveness. Hence, what we need is a way to evaluate the quality of assertions.

  • Competent Programmer Hypothesis / Finite Nbhd Hypothesis: Mutation Analysis provides an alternative to a curated set of faults. The key insight is that, if one assumes that the programmer understands the program in question, the majority of errors made are very likely small transcription errors (a small number of tokens). A compiler will likely catch most of these errors. Hence, the majority of residual faults in a program is likely to be due to small (single token) variations at certain points in the structure of the program from the correct program (This particular assumption is called the Competent Programmer Hypothesis or the Finite Neighborhood Hypothesis).

  • Equivalent mutants: However, if the number of mutants are sufficiently large (say > 1000), one may choose a smaller number of mutants from the alive mutants randomly and manually evaluate them to see whether they represent faults. We choose the sample size by sampling theory of binomial distributions.

  • Chao's estimator: way to estimate the number of true mutants (and hence the number of equivalent mutants) is by means of Chao's estimator:

$$\hat M \equiv \begin{cases} M(n) + k_1^2 / (2 k_2) & \text{if } k_2 > 0 \\ M(n) + k_1(k_1 - 1)/2 & \text{otherwise} \end{cases}$$

  • $k_1$ is the number of mutants that were killed exactly once, $k_2$ is the number of mutants that were killed exactly twice. $\hat M$ estimates the true number of mutants.
  • If $T$ is the total mutants generated, then $T - M(n)$ represents immortal mutants.
  • $\hat M$ is the number of mutants that the test set can detect given an infinite amount of time.
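
Here is a direct transcription of Chao's estimator above as a python sketch (the function name is mine):

def chao_estimate(M_n, k1, k2):
    # M_n: mutants killed so far; k1/k2: mutants killed exactly once/twice.
    if k2 > 0:
        return M_n + k1**2 / (2 * k2)
    return M_n + k1 * (k1 - 1) / 2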

Filtered Colimits [TODO]

Fisher Yates

  • We wish to generate a random permutation.
  • Assume we can generate a random permutation of $[a, b, c]$.
  • How do we extend this to a random permutation of $[w, x, y, z]$?
  • Idea: (0) Consider ${w, x, y, z}$ in a line. Our random permutation is $[_, _, _, _]$.
  • (1) Decide who goes at the rightmost blank. Suppose $y$. Then our random permutation state is $[_, _, _, y]$.
  • (2) What do we have left? We have ${w, x, z }$. Use recursion to produce a random permutation of length 3 with $[w, x, z]$.
  • (3) Stick the two together to get a full random permutation.
  • To save space, we can write this on a "single line", keeping a stick to tell us which part is the "set", and which part is the "array":
0. {w,  x,  y, z}[]
1. {w, x, y, [z}] (growing array)
1. {w,  x,  z, [y}] (swapping z<->y)
1. {w,  x,  z}, [y] (shrinking set)

Similarly, for the next round, we choose to swap w with z as follows:

1. {w,  x,  z}, [y]
2. {w, x,  [z}, y] (grow array)
2. {x,  z, [w}, y]  (swap w <-> z)
2. {x, z}, [w, y] (shrinking set)

For the next round, we swap z with z (ie, no change!)

2. {x,  z}, [w, y]
3. {x, [z}, w, y] (grow array)
3. {x, [z}, w, y] (swap z<->z)
3. {x},[z, w, y] (shrink set)

Finally, we swap x with x:

3. {x},  [z, w, y]
4. {[x}, z, w, y] (grow array)
4. {[x}, z, w, y] (swap x<->x)
4. {}[x, z, w, y] (shrink set)
  • This way, we generate a random permutation in place, by treating the left portion of the sequence as a set, and the right portion of the sequence as a sorted permutation. At each stage, we grow the array, and choose a random element from the set to "enter" into the array at the location of intersection between set and array.

  • In code, the index i tracks the location of the array border, where we must fix the value of the permutation (ie, the ordering of elements) at the ith location. The index r is a random index chosen in [0, i] which selects the element placed at the ith location.

from hypothesis.strategies import composite, integers

@composite
def permutation(draw, n):
    # Fisher-Yates: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
    xs = { i : i for i in range(n) } # start from the identity permutation

    i = n-1 # from (n-1), down to zero.
    while i >= 0:
        r = draw(integers(0, i)) # r ∈ [0, i]
        xs[i], xs[r] = xs[r], xs[i] # swap
        i -= 1
    return xs

Buchberger algorithm

  • multidegree: term of maximum degree, where maximum is defined via lex ordering.
  • Alternatively, multidegree is the degree of the leading term.
  • If multideg(f) = a and multideg(g) = b, define c[i] = max(a[i], b[i]). Then $\vec x^c$ is the LCM of the leading monomial of $f$ and the leading monomial of $g$.
  • The S-polynomial of $f$ and $g$ is the combination $(\vec x^c/LT(f)) \cdot f - (\vec x^c/LT(g)) \cdot g$
  • The S-polynomial is designed to create cancellations of leading terms.
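
Here is a small sketch of the S-polynomial in python using sympy (the s_poly helper is mine; sympy's default lex ordering is assumed):

from sympy import symbols, LM, LT, lcm, expand, groebner

x, y = symbols('x y')

def s_poly(f, g):
    # x^c = lcm of the leading monomials; divide by the leading terms so they cancel.
    c = lcm(LM(f), LM(g))
    return expand(c / LT(f) * f - c / LT(g) * g)

f = x**3 - 2*x*y
g = x**2*y - 2*y**2 + x
print(s_poly(f, g))            # -x**2: the leading x^3*y terms have cancelled
print(groebner([f, g], x, y))  # sympy's built-in Groebner basis, for comparison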

Buchberger's criterion

  • Let $I$ be an ideal. Then a basis $\langle g_1, \dots, g_N \rangle$ is a Groebner basis iff for all pairs $i \neq j$, the remainder of $S(g_i, g_j)$ on division by ${ g_1, \dots, g_N }$ is $0$.

  • Recall that a basis is a Groebner basis iff $LT(I) = \langle LT(g_1), \dots, LT(g_N) \rangle$. That is, the ideal of leading terms of $I$ is generated by the leading terms of the generators.

  • for a basis $F$, we should consider $r(i, j) \equiv rem_F(S(f_i, f_j))$. If $r(i, j) \neq 0$, then make $F' \equiv F \cup { r(i, j) }$.

  • Repeat till all the remainders $r(i, j)$ vanish; the resulting $F$ is a Groebner basis.

GAP permutation syntax

  • The action of permutation on an element is given by $i^p$. This is the "exponential notation" for group actions.
  • See that we only ever write permutations multiplicatively, eg (1) (23) is the composition of permutations [written multiplicatively].
  • Thus the identity permutation must be 1, and it's true that any number n^1 = n, so the identity permutation 1 fixes everything.

Why division algorithm with multiple variables go bad

  • In C[x, y], defining division is complicated, and needs grobner bases to work.
  • It's because they don't obey the GCD property. Just because gcd(a, b) = g does not mean that there exist k, l such that ak + bl = g
  • For example, in C[x, y], we have gcd(x, y) = 1 but we don't have polynomials k, l such that kx + ly = 1.
  • Proof: suppose for contradiction that there do exist k, l such that kx + ly = 1. Modulo x, this means that ly = 1 which is absurd, and similarly modulo y it means kx = 1 which is also absurd.
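
We can also see this with sympy: dividing $1$ by the basis ${ x, y }$ (which is already a Groebner basis) leaves remainder $1$, so $1$ is not in the ideal $(x, y)$:

from sympy import symbols, reduced

x, y = symbols('x y')
quotients, remainder = reduced(1, [x, y], x, y)
print(quotients, remainder)   # [0, 0] 1 : the remainder is nonzero, so no such k, l exist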

Integral elements of a ring form a ring [TODO]

  • An integral element of a field $L$ (imagine $\mathbb C$) relative to an integral domain $A$ (imagine $\mathbb Z$) is a root of a monic polynomial in $A[x]$.

  • So for example, in the case of $\mathbb C$ over $\mathbb Z$, the element $i$ is integral as it is a root of $p(x) = x^2 + 1$.

  • On the other hand, the element $1/2$ is not integral. Intuitively, if we had a polynomial of which it is a root, such a polynomial would be divisible by $2x - 1$ (which is the minimal polynomial for $1/2$). But $2x - 1$ is not monic.

  • Key idea: take two elements $a, b$ which are roots of monic polynomials $p(x), q(x) \in A[x]$.

  • Create the polynomial $c(x)$ (for construction) given by $c(x) \equiv p(x)q(x) \in A[x]$. See that $c(x)$ has both $a$ and $b$ as roots, and lies in $A[x]$.

"Cheap" proof of euler characteristic

  • If we punch a hole in a sphere, we create an edge with no vertex or face. This causes $V - E + F$ to go down by 1.
  • If we punch two holes, that causes $V - E + F$ to go down by two. But we can glue the two edges together. This gluing gives us a handle, so each hole/genus reduces the euler characteristic by two!

Seifert Algorithm [TODO]

  • Algorithm to find surface that a knot bounds.
  • If we find a surface, then the genus of the boundary is one minus the genus of the surface.
  • Compute genus via classification of surfaces.

Cap product [TODO]

  • https://www.youtube.com/watch?v=oxthuLI8PQk

  • We need an ordered simplex, so there is a total ordering on the vertices. This is to split a chain apart at number $k$.

  • Takes an $i$-cochain and a $k$-chain to spit out a $(k - i)$-chain given by $\xi \frown \gamma \equiv \sum_a \gamma_a \xi (a_{\leq i}) a_{\geq i}$.

  • The action of the boundary on a cap product will be $\partial (\xi \frown \gamma) \equiv (-1)^i [(\xi \frown \partial \gamma) - (\partial \xi \frown \gamma)]$, where $\partial \xi$ denotes the coboundary of the cochain $\xi$.

  • Consequence: cocycle cap cycle is cycle.

  • coboundary cap cycle is boundary.

  • cocycle cap boundary is boundary.

  • Cap product will be zero if the chain misses the cochain.

  • Cap product will be nonzero if the chain must always intersect the cochain.

  • This is why it's also called the intersection product, since it somehow counts intersections.

Cup product [TODO]

  • We need an ordered simplex, so there is a total ordering on the vertices. This is to split a chain apart at number $k$.
  • Can always multiply functions together. This takes a $k$-cochain $\xi$ and an $l$-cochain $\eta$ and produces $\xi \cup \eta$ which is a $(k + l)$-cochain. The action on a $(k+l)$ chain $\gamma$ acts by $(\xi \cup \eta)(\gamma) \equiv \xi (\gamma_{\leq k}) \cdot \eta (\gamma_{> k})$.
  • No way this can work for chains, can only ever work for cochains.
  • This cup product "works well" with coboundary. We have $\partial (\xi \cup \eta) \equiv (\partial \xi \cup \eta) + (-1)^k (\xi \cup \partial \eta)$.
  • We get cocycle cup cocycle is cocycle.
  • Similarly, coboundary cup cocycle is coboundary.
  • Similarly, cocycle cup coboundary is coboundary.
  • The three above propositions imply that the cup product descends to cohomology groups.
  • The algebra of cohomology (cohomology plus the cup product) sees the difference between spaces of identical homology!
  • The space $S^1 \times S^1$ has the same homology as $S^2 \vee S^1 \vee S^1$. Both have equal homology/cohomology.
  • However, we will find that the cup product of the two degree-one generators is non-zero on the torus and zero on the wedge.
  • The cup product measures how the two generators are locally product-like. So if we pick two generators on the torus, we can find a triangle on which the cup product evaluates to something non-zero.

Colimits examples with small diagram categories

  • Given a colimit, compute the value as taking the union of all objects, and imposing the relation $x \sim f(x)$ for all arrows $f \in Hom(X, Y)$ and all $x \in X$.

  • A colimit of the form $A \xrightarrow{f} B$ is computed by taking $A \sqcup B$ and then imposing the relation $a \sim f(a)$. This is entirely useless: we just recover $B$.

  • A colimit of the form $A \xrightarrow{f, g} B$ is computed by taking $A \sqcup B$ and then imposing the relation $a \sim f(a)$ as well as $a \sim g(a)$. Thus, this effectively imposes $f(a) \sim g(a)$. If we choose $f = id$, then we get $a \sim g(a)$. So we can create quotients by taking the colimit of an arrow with the identity.

  • A colimit of the form $A \xleftarrow{f} B \xrightarrow{g} C$ will construct $A \sqcup B \sqcup C$ and impose the relations $b \sim f(b) \in A$ and $b \sim g(b) \in C$. Thus, we take $A, B, C$ and we glue $A$ and $C$ along $B$ via $f, g$. Imagine gluing the upper and lower hemispheres of a sphere by a great circle.
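
As a small sanity check of the quotient-by-colimit idea above, here is a python sketch that computes the coequalizer of two maps between finite sets by gluing with union-find (all the helper names are mine):

def coequalizer(A, B, f, g):
    # glue f(a) ~ g(a) for every a in A, using union-find on B.
    parent = {b: b for b in B}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a in A:
        parent[find(f[a])] = find(g[a])
    classes = {}
    for b in B:
        classes.setdefault(find(b), set()).add(b)
    return list(classes.values())

# quotient Z/6 by "x ~ x + 2": take f = id and g = (+2 mod 6).
A = B = list(range(6))
f = {a: a for a in A}
g = {a: (a + 2) % 6 for a in A}
print(coequalizer(A, B, f, g))   # two classes: {0, 2, 4} and {1, 3, 5}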

Limits examples with small diagram categories

  • Given a limit, compute the value as taking the product of all objects, and taking only those tuples which obey the relation $f(x_X) = x_Y$ for all arrows $f \in Hom(X, Y)$.

Classification of compact 2-manifolds [TODO]

  • Oriented compact 2-surfaces: sphere, torus, 2 holed torus, etc.
  • These have Euler characteristic $V - E + F$ equal to $2 - 2g$.
  • Strategy: cut surface into polygonal pieces. Use oriented edges to know cutting. Lay them down on the surface such that the "top part" or "painted surface" will be up [so we retain orientation].
  • Attach all the polygons into one big polygon on the plane.
  • For each edge on the boundary of the big polygon, it must attach to some other boundary of the big polygon [since the manifold is compact]. Furthermore, this edge must occur in the opposite direction to make the surface orientable. Otherwise we could pass through the side and flip orientation. Consider:
>>>>
|  |
>>>>
  • When I appear from the "other side", my direction will have flipped. [TODO]

  • So far, we know the edges. What about identifying vertices?

  • Next, we need to group vertices together on the big polygon. We can find this by going around the edges incident at the vertex on the manifold surface.

  • The next step is to reduce the number of vertices to exactly one. We can cut the current polygon and re-paste it as long as we preserve all cutting/pasting relations.

  • Suppose I glue all the B vertices to a single vertex. Then, the edges emanating from this B vertex must necessarily be the same. If not, then the edge emanating would need a complementary edge somewhere else, which would give me another "copy" of the B vertex.

  • I can imagine such a B vertex as being "pushed inside the polygon" and then "glued over itself", thereby making it part of the interior of the polygon.

  • We can repeat this till there is only one type of vertex (possibly multiple copies).

  • If we only had two adjacent edges [edges incident against the same vertices], then we are done, since we get a sphere.

  • We can always remove adjacent pairs of edges. What about non-adjacent pairs?

  • Take a non adjacent pair. Think of these as "left" and "right". We claim that for each edge at the "top", there is a corresponding edge at the "bottom". So we have left and right identified, and top identified with a contiguous segment in the bottom. If there wasn't, then we would need another vertex!

  • This lets me create a commutator on the boundary, of the form $cdc^{-1}d^{-1}x$. Topologically, this is a handle, since if it were "full" [without the extra $x$], then we would have a torus. Since we do have the $x$, we have a "hole on the torus" which is a handle.

  • We keep removing handles till we are done.

Why does euler characteristic become $2-2g$?

  • If we add a vertex on an edge, we add a vertex and subtract the (new) edge we have created. Thus $\xi$ is unchanged on adding a vertex on an edge.
  • Joining two vertices on a face also does not change $\xi$, since we add an edge and a face.
  • Given any two subdivisions, we find a common finer subdivision by these steps. Since the steps we use retain the euler characteristic, finally my original subdiv = common subdiv = friend subdiv.
  • Key idea: make a new vertex at each crossing between our subdivision and the other subdivision. Then "trace over" the other subdivision to make our subdivision agree with the other subdivision on the inside.

https://www.youtube.com/watch?v=dUOmU-0t2Nc&list=PLIljB45xT85DWUiFYYGqJVtfnkUFWkKtP&index=27

Gauss, normals, fundamental forms [TODO]

  • consider a parametrization $r: U \subseteq \mathbb R^2 \to \mathbb R^3$, $(u, v) \mapsto r(u, v)$
  • at a point $p = r(u, v)$ on the surface, the tangent vectors are $r_u \equiv \partial_u r$ and similarly $r_v \equiv \partial_v r$.
  • Let $k = xr_u + y r_v$. Then $k \cdot k$ is the first fundamental form. Computed as $k \cdot k = (x r_u + y r_v) \cdot (x r_u + y r_v)$. Write this as $E x^2 + 2F x y + G y^2$. These numbers depend on the point $(u, v)$, or equally, depend on the point $p = r(u, v)$.
  • Further, we also have a normal vector to the tangent plane. $N(p)$ is the unit normal pointing outwards. We can describe it in terms of a parametrization as $n \equiv r_u \times r_v / ||r_u \times r_v||$.
  • Gauss map / Gauss Rodrigues map ($N$): map from the surface to $S^2$. $N$ sends a point $p$ to the unit normal at $p$.
  • The tangent plane to $N(p)$ on the sphere is parallel to the tangent plane on the surface at $p$, since the normals are the same, as that is the action of $N$ which sends the normal at the surface $p \in S$ to a point of the sphere / normal to the sphere.
  • Thus, the derivative intuitively "preserves" tangent planes! [as normal directions are determined].
  • If we now think of $dN$, it's a map from $T_p S$ to $T_{N(p)} S^2 = T_p S$. Thus it is a map from the tangent space to itself.
  • In terms of this, gauss realized that the gaussian curvature $K = k_1 k_2$ is the determinant of the map $dN_p$ [ie, the jacobian]. Curvature is the distortion of areas by the normal map. So we can think of it as the ratio of areas: area of image/area of preimage.

https://www.youtube.com/watch?v=drOldszOT7I&list=PLIljB45xT85DWUiFYYGqJVtfnkUFWkKtP&index=34

Second fundamental form

  • Let $z = f(x, y)$ be a (local) parametrization of the surface. Taylor expand $f$. we get:
  • $f(x + dx, y + dy) = f(x, y) + dx^T a + dy^T b + dx^T L dx + 2 dx^T M dy + dy^T N dy$.
  • We must get such a taylor expansion since our output is 1D (a real number), inputs are $dx, dy$ which are 3D vectors, and the infinitesimals must be linear/tensorial. These are the only possible contractions we can make.
  • So, the second degree part can be written as:

$$ \begin{bmatrix} x & y\end{bmatrix} \begin{bmatrix} L & M \\ M & N\end{bmatrix} \begin{bmatrix} x \\ y\end{bmatrix} $$

  • the matrix in the middle, or the quadratic form $II \equiv dx^T L dx + 2 dx^T M dy + dy^T N dy$ is the second fundamental form.

Classical geometry

  • Let $z = f(x, y)$ be a (local) parametrization of the surface.
  • At each point $p ≡ (u, v)$ on the surface within the local parametrization, we get tangent vectors $r_u(p) ≡ (\partial_x f(x, y))_p$, $r_v(p) ≡ (\partial_y f(x, y))_p$, which span the tangent space at $p$.
  • These define a unique normal vector $n(p) ≡ r_u(p) × r_v(p)$ at each point on the surface. This gives us a normal field.
  • The coefficients of the second fundamental form project the second derivatives of the function $f$ onto the normal. So they tell us how much the function is escaping the surface (ie, is moving along the normal to the surface) at second order.
  • Recall that this is pointless to do for first order, since on a circle, tangent is perpendicular to normal, so any dot product of first order information with normal will be zero.
  • Alternatively, first order information lies on tangent plane, and the normal is explicitly constructed as perpendicular to tangent plane, so any dot product of first order info with normal is zero.
  • We can only really get meaningful info by dotting with normal at second order.
  • So we get that $L(p) = (\partial_x \partial_x f(x, y))(p) \cdot n(p)$, $M(p) = (\partial_x \partial_y f(x, y))(p) \cdot n(p)$, and $N(p) = (\partial_y \partial_y f(x, y))(p) \cdot n(p)$, where we define $L, M, N$ via the second fundamental form.

Proof of equivalence between 2nd fundamental form and geometry

Shape operator [TODO]

Principal curvature

  • take a point $p$. Consider the normal to the surface at the point, $N(p)$.
  • Take any normal plane: a plane $Q_p$ which contains $N(p)$. This plane (which is normal to the surface, since it contains the normal) intersects the surface $S$ at a curve (intuitively, since a plane in 3D is defined by 1 eqn, intersection with the plane imposes 1 equation on the surface, cutting it down to 1D).
  • The curvature of this curve (normal plane $Q_p$ intersection surface $S$) at point $p$ is the normal curvature of the normal plane $Q_p$.
  • The maximum and minimum such normal curvatures at a point (max, min taken across all possible normal planes $Q_p$) are the principal curvatures.

Shape operator has principal curvatures as eigenvalues

  • https://math.stackexchange.com/questions/36517/shape-operator-and-principal-curvature
  • https://math.stackexchange.com/questions/3665865/why-are-the-eigenvalues-of-the-shape-operator-the-principle-curvatures

Shape operator in index notation

  • Let $\mathbf X_j$ be tangent (basis) vectors at the point $p$, and $\mathbf N$ the normal to the surface at $p$. The shape operator $S_{ij}$ is determined by the equation:
  • $\partial_i \mathbf N = -S_{ji} \mathbf X_j$ (summing over $j$)

Theorem Egregium / Gauss's theorem (Integrating curvature in 2D) [TODO]

  • Let $S$ be a 2 dimensional surface.

  • Gauss-Rodrigues map: $N: S \to S^2$. The derivative of this map goes from $dN: T_p S \to T_{N(p)} S^2$.

  • Since surfaces are parametric, we can think of it as a map from $U \subset \mathbb R^2 \to S \to S^2$.

  • For gauss, the curvature of the surface at $p$ is $det(dN|_p)$. This tells us how small areas (on the tangent plane of $S$) are distorted (onto the tangent plane of $S^2$), because it's the determinant / jacobian of the map. Thus, heuristically, it is the ratio of the area around $N(p)$ at $S^2$ to the area around $p$ at $S$.

  • To show that this normal curvature view really is curvature, let's compute $dN_p$ for a normal paraboloid. Wildberger says that all surfaces are like normal paraboloids upto second order.

  • This fits with one of our views of curvature of a curve: one way was one over the radius of the osculating circle, the other was $k \cdot ds = d \theta$

  • We had a formula like $\int k ds$ was a change in angle. Similarly, in our case, we see that if we consider $\int \int k(s) \, darea(s)$, we get the area of the image of $N$, because infinitesimally it is the ratio of areas.

  • In particular, if the surface is homeomorphic to a sphere, then we get the total area of the sphere, $4 \pi$. This is the 2D analogue of the fact that if we integrate the curvature of a closed curve, we get $2 \pi$ [the circumference of the unit circle]. This is by Green's theorem.

Integrating Curvature in 1D [TODO]

  • All curves are parametrized by arc length to avoid weird artefacts by time parametrization.
  • So $r(s)$ is a function from length of the curve to $\mathbb R^3$.
  • The (unit?) tangent to a curve is given by $T(s) \equiv dr/ds = r'(s)$.
  • The curvature is given by $\kappa(s) \equiv |d^2r/ds^2|$.
  • The unit normal is given by $\hat N(s) \equiv r''(s) / \kappa(s)$.
  • We wish to consider the total curvature, given by $\int_0^L \kappa(s) ds$ where $L$ is the total length of a closed curve on the plane.
  • TODO: how to prove that this will be a multiple of $2 \pi$?

Fundamental theorem of symmetric polynomials

  • Every symmetric polynomial of variables $x, y, z$ can be written in terms of the elementary symmetric polynomials $\sigma_0 \equiv 1$, $\sigma_1 = x + y + z$, $\sigma_2 = xy + yz + xz$, $\sigma_3 = xyz$. Generalize appropriately.

Two variable case

  • For even further simplicity, consider the two variable case: every symmetric polynomial of variables $x, y$ can be written in terms of the elementary symmetric polynomials $\sigma_0 = 1$, $\sigma_1 = x + y$, $\sigma_2 = xy$.
  • Consider some symmetric polynomial $p(x, y)$. Define an ordering on its monomials: $x^ay^b > x^c y^d$ iff either $a + b > c + d$, or ($a + b = c + d$ and either $a > c$, or $a = c \land b > d$). So we compare first by degree, then by $(a, b) > (c, d)$ lexicographically. Thus, call this order the lex order.
  • Define bigmon(p) to be the largest monomial in $p(x)$. Define bigcoeff(p) to be the coefficient of bigmon(p). Finally, define bigterm(p) = bigmon(p) . bigcoeff(p) as the leading term of the polynomial $p(x, y)$.
  • Prove an easy theorem that bigterm(pq) = bigterm(p)bigterm(q).
  • Now, suppose we have a leading monomial $x^5y^9$. Actually, this is incorrect! If we have a monomial $x^5y^9$, then we will also have a monomial $x^9y^5$, which is lex larger than $x^5y^9$. Thus, in any leading monomial, we will have the powers in non-increasing order.
  • OK, we have leading monomial $x^9 y ^5$. We wish to write this in terms of elementary symmetric polynomials. We could try and write this by using the leading term $x$ in $s_1$ and leading term $xy$ in $s_2$.
  • This means we need to solve $x^9y^5 = x^k (xy)^l$, or $9 = k + l$ and $5 = l$. This tells us that we should choose $l = 5$ and $k = 9 - 5 = 4$. If we do this, then our combination of symmetric polynomials will kill the leading term of $p(x)$. Any new terms we introduce will be smaller than the leading term, which we can write as elementary symmetric polynomials by induction!

Three variable case

  • Now consider three variables. Once again, suppose $p(x)$ has leading monomial $x^ay^bz^c$. We saw earlier that we must have $a \geq b \geq c$ for it to be a leading monomial.
  • Let's write it in terms of $s_1, s_2, s_3$. So we want to write it as product of $(x)^p$, $(xy)^q$, and $(xyz)^r$. Their product is $x^{p+q+r}y^{q+r}z^r$.
  • This gives us the system of equations $x^{p+q+r}y^{q+r}z^r = x^ay^bz^c$. This means (1) $r = c$, (2) $q+r = b$ or $q = b - c$, and (3) $p + q + r = a$ or $p = a - q - r = a - (b - c) - c = a - b$.

General situation

  • think of the monomial $x^ay^bz^c$ as a vector $[a, b, c]$. Then the leading terms of the symmetric polynomials correspond to $[1, 0, 0]$, $[1, 1, 0]$, and $[1, 1, 1]$.
  • When we take powers of symmetric polynomials, we scale their exponent vector by that power. So for example, the leading term of $s_2^9$ is $x^9y^9$ which is $[9, 9] = 9[1, 1]$.
  • When we multiply symmetric polynomials, we add their exponent vectors. For example, the leading term of $s_1 s_2$ is $x \cdot xy = x^2 y$, ie $[2, 1]$. This is equal to $[1, 0] + [1, 1]$
  • Thus, we wish to write the vector $[a, b, c]$ as a linear combination of vectors $[1, 0, 0]$, $[1, 1, 0]$, and $[1, 1, 1]$. This is solving the equation:
[a]   [1 1 1][p]
[b] = [0 1 1][q]
[c]   [0 0 1][r]
  • subject to the conditions that the unknowns $p, q, r, \geq 0$, given knowns $a, b, c$ such that $a \geq b \geq c$.
  • Let's check that the equation makes sense: $a = p + q + r$ as all of $[1, 0, 0]$, $[1, 1, 0]$ and $[1, 1, 1]$ have a $1$ at the $a$ position. Similarly for $b, c$.
  • Solve by the usual back-substitution.
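
The back-substitution is simple enough to write as a python sketch (the function name is mine): given the non-increasing exponent vector of a leading monomial, it returns the powers of the elementary symmetric polynomials whose leading terms multiply to it.

def elementary_powers(exps):
    # exps = [a, b, c, ...] with a >= b >= c >= ...; solve the triangular system above.
    assert all(exps[i] >= exps[i + 1] for i in range(len(exps) - 1))
    return [exps[i] - exps[i + 1] for i in range(len(exps) - 1)] + [exps[-1]]

print(elementary_powers([9, 5]))     # [4, 5]:    x^9 y^5   = (x)^4 (xy)^5
print(elementary_powers([3, 2, 1]))  # [1, 1, 1]: x^3 y^2 z = (x)(xy)(xyz)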

DP over submasks

  • https://codeforces.com/contest/1554/problem/B
  • 5e8 operations is roughly 1 second (the usual competitive programming rule of thumb).
  • Take every pair (mask, submask). At each bit position, the pair of bits can be 1 1, 1 0, or 0 0. So the number of (mask, submask) pairs over all masks is 3^n
// enumerate all submasks s of every mask m; the total work over all m is O(3^n).
for(int m = N; m >= 0; m--) { // assuming N = (1 << n) - 1, the full mask
  for(int s = m; s > 0; s = (s - 1) & m) {
    // visit submask s of m. (the empty submask s = 0 is skipped here.)
  }
}
  • Let s be a submask of m. Write s = abc100...0, where abc are arbitrary bits, the 1 shown is the rightmost 1 of s, and it is followed by zeroes.
  • Consider the largest submask of m smaller than s, call it t. We wish to show that t = (s-1)&m, where s-1 = abc011...1.
  • (s-1)&m is clearly a submask of m, since we AND with m; it is also smaller than s, since s-1 < s.
  • We wish to show that t=(s-1)&m is the greatest submask of m smaller than s.
  • For contradiction, suppose s > c > t and c is a submask of m.
  • Let x be the index of the rightmost 1 in s (x marks the spot). So s is of the form s[n]s[n-1]...s[x]s[x-1]...s[0] where s[x]=1 and s[x-1]=s[x-2]=...=s[0]=0. So we can write s = s[n]s[n-1]...;x:1;00..0
  • Now t = (s-1)&m agrees with s above index x, has t[x]=0, and below index x it has every bit of m set (t[j]=m[j] for j<x).
  • Since c is a submask of m with c < s, look at the highest index i where c and s differ: we must have s[i]=1 and c[i]=0 (otherwise c > s).
  • If this index i were > x, then c < t, since t agrees with s (and hence with c) above i and t[i]=s[i]=1. This contradicts c > t.
  • Thus c agrees with s at all indices above x, and the differing index is x itself, so c[x]=0. But below x, t has every bit of m set, so c <= t. This contradicts c > t, and we are done.

Dual of Planar Euler graph is bipartite

Proof by contradiction

  • An Euler graph is a graph with an eulerian circuit, so we pass through every edge exactly once and return to the node we started from.
  • Alternatively, every node has even degree. Consider the cycle as starting at some vertex v. Passing through all edges adjacent to v means that every time we leave v on a previously unused edge v->_ we must return back to v via a unique edge _->v. This allows us to pair the edges at v together uniquely, giving us an even number of edges at v.
  • This argument works at any generic v, since we can think of a cycle as starting from any vertex.
  • Thus, every vertex of G has even degree.
  • Consider the dual graph H := G*.
  • Suppose H is not bipartite, so H has an odd length cycle O.
  • Let K := H* be the dual of H. Consider the face of H that is bounded by the odd length cycle O, call it F(O). This face F(O) has an odd number of neighbours, one for each edge of O. So an odd number of edges in K connect F(O) to its neighbours. Thus, K has a vertex with an odd number of edges incident on it.
  • However, K = H* = G** = G. This implies that G has a vertex of odd degree, contradicting its eulerian nature.

Constructive proof

  • Consider a graph embedding. Since the graph is eulerian, we get a path/closed curve p: S^1 -> R^2 that traverses the graph along its euler tour.

  • If the closed curve p has self-intersections, remove them by gently "spreading" p.

  • This gives us two regions on the sphere, one inside the curve and one outside the curve (by jordan curve theorem).

  • Key takeaway: Euler graphs are graphs you can draw with a pen.

How this is different from hamiltonian circuit

Consider:

a----------b
|          |
f----------c
|          |
e----------d
  • The cycle a->b->c->d->e->f->a is hamiltonian.
  • There is no eulerian cycle since f has odd degree. So if we start from f, it is impossible to return to f.
  • So hamiltonian circuits do not correspond (at least in this way) to geometry.

Yoneda preserves limits

  • Let $J$ be small, $C$ locally small.
  • Let $F: J \to C$ be a diagram. Let $y : C \to [C^{op}, Set]$ be the contravariant yoneda defined by $y(c) \equiv Hom(-, c)$.
  • Consider $y(\lim F) : C^{op} \to Set$. Is this equal to $\lim (y \circ F : J \to [C^{op}, Set]) : C^{op} \to Set$?
  • We know that limits in functor categories are computed pointwise. So let's start with $\lim (y \circ F) : C^{op} \to Set$. Let's Evaluate at some $e \in C^{op}$.
  • That gives us $(\lim (y \circ F))(e) = \lim (ev_e \circ y \circ F : J \to Set) : Set$.
  • Writing the above out, we get $\lim (ev_e \circ y \circ F) = \lim(\lambda j. (y(F(j))(e))$.
  • Plugging in the definition of $y$, we get $\lim( \lambda j. Hom(-, F(j))(e))$.
  • Simplifying, we get $\lim (\lambda j. Hom(e, F(j)))$.
  • We know from a previous theorem that $\lim Hom(e, F(-)) = Hom(e, \lim F)$
  • Thus, we get $\lim(\lambda j. Hom(e, F(j))) = Hom(e, \lim F)$.
  • So we get $(\lim (y \circ F))(e) = Hom(e, \lim F)$.
  • In general, we get $\lim (y \circ F) = Hom(-, \lim F)$, which is the same as $y \circ \lim F$.
  • So we find that $\lim (y \circ F) = y \circ \lim F$, thereby proving that yoneda preserves limits.

Separable Polynomials and extensions

Separable polynomial

  • I didn't really study galois theory over char. $p$ all that well the first time I studied it, so let's review.
  • Let $J \subseteq K$ be an inclusion of fields, so $K$ is a field extension of $J$.
  • An irreducible polynomial $f \in J[x]$ is separable iff it has distinct roots in the algebraic closure of $J$, $\overline{J}$.
  • Said differently, the polynomial $f$ has no repeated roots in any extension of $J$.
  • Said differently, the polynomial $f$ has distinct roots in its splitting field over $J$. The roots are separable since we can separate all the roots from each other --- they are all distinct.
  • Said differently, the polynomial derivative $f'$ of $f$ is not the zero polynomial.

Proof that $p$ is not separable iff $p, p'$ share a root

Forward: $p$ is not separable implies $p, p'$ share a root.
  • Let $p$ have a repeated root $\alpha \in \overline K$. Thus $p(x) \equiv (x -\alpha)^2 g(x)$ in $\overline K$.
  • Computing $p'$ by product rule, we see that it is $p'(x) = 2(x- \alpha)g(x) + (x - \alpha)^2 g'(x)$ which can be written as $p'(x) = (x - \alpha)(2g(x) + (x - \alpha)g'(x))$.
  • This shows that $p'(x)$ has $\alpha$ as a root, and thus $p, p'$ share the root $\alpha$.
Backward:$p, p'$ share a root implies $p$ is not separable
  • Let $\alpha \in \overline K$ be such that $p(\alpha) = p'(\alpha) = 0$.
  • Write $p(x) \equiv \prod_i (x - r_i)$ for roots $r_i \in \overline K$.
  • Let $\alpha = r_1$ [WLOG].
  • We know by the product rule of calculus that $p'(x) \equiv \sum_i \prod_{j \neq i} (x - r_j)$.
  • Computing $p'(\alpha) = p'(r_1)$, only the first term survives, which is $\prod_{j \neq 1}(r_1 - r_j)$ [all other terms have an $(x - r_1)$ factor which vanishes].
  • For this to vanish, we must have some $j \neq 1$ such that $r_1 = r_j$.
  • This implies that $p$ has a repeated root $r_1 = r_j$ and is thus not separable.

Proof that $p$ is separable iff $gcd(p, p') = 1$

Forward: $p$ is separable implies $gcd(p, p') = 1$
  • Let $d(x) = gcd(p, p')$.
  • Suppose that it is not a unit, ie it is a polynomial of positive degree.
  • Then $d(x)$ has a root $\alpha \in \overline K$ (by previous)
  • Since $d$ divides $p, p'$, this means that in $\overline K$, $p(\alpha) = p'(\alpha) = 0$ since $d(\alpha) = 0$ and $d | p, p'$.
  • This implies that $\alpha$ is a repeated root of $p(x)$, since both $p$ and its derivative vanish at $\alpha$ (by the previous section)!
  • This contradicts $p$ is separable.
  • Thus $d(x)$ must be a unit, and $p$, $p'$ are relatively prime.
Backward: $gcd(p, p') = 1$ implies $p$ is separable
  • Since $gcd(p, p') = 1$, there are polynomials $k, l$ such that $pk + p'l = 1$.
  • Suppose $p$ is not separable. Thus it has a repeated root $\alpha$. This means that $p$ and $p'$ vanish at $\alpha$. Thus $(x - \alpha)$ divides $p, p'$.
  • This means that $(x - \alpha)$ divides $1$, by the eqn $pk + p'l = 1$. This is absurd, and thus $p$ is separable.

Separable extension

  • An extension $L/K$ is separable iff for every element $\alpha \in L$, there is a separable polynomial $f_\alpha \in K[x]$ such that $f_\alpha(\alpha) = 0$.
  • Thus, all elements have separable polynomials that they are roots of.

Separable extension is transitive

  • Claim: If $R/Q$ and $Q/P$ are separable field extensions, then $R/P$ is separable.
  • TODO!

All polynomials over characteristic 0 are separable

  • Let $L/K$ be an extension with $char(K) = 0$. Then we claim that $L/K$ is separable.
  • Let $f$ be an irreducible polynomial in $K[x]$. (the minimal polynomial of some element in $L$)
  • Recall that a polynomial $f \in K[x]$ is irreducible over a field $K$ iff it cannot be written as the product of two non-constant polynomials.
  • We wish to show that $f$ is separable (has no repeated roots).
  • For contradiction, suppose that $f$ has a repeated root $r$ in the algebraic closure. so $f(x) \equiv (x - r)^2 g(x)$ for $r \in \overline K[x]$, $g(x) \in \overline K[x]$.
  • Thus, $f(x)$ and $f'(x)$ share a common factor in $\overline K[x]$.
  • But the GCD algorithm works in $K[x]$, thus $f(x)$ and $f'(x)$ share a common factor in $K[x]$ [SID: I find this dubious!]
  • Hence, this means that $gcd(f, f') \in K[x]$ is not a constant polynomial.
  • If $gcd(f, f') \neq f$, then $f$ can be factored, which contradicts its irreducibility.
  • Thus, $gcd(f, f') = f$ [to prevent contradiction].
  • However, $f'$ has smaller degree than $f$. Thus the only way it can be divided by its GCD ($f$) which has larger degree than it is if $f'(x) = 0$.
  • This means (in characteristic zero) that $f(x)$ is constant: $f' = 0$ forces $f$ to have degree zero. But a constant polynomial is not irreducible, and in particular has no repeated roots, contradicting our assumption. Thus $f$ is separable.
  • To see that $f'(x) = 0$ forces $f$ to be constant in characteristic zero, compute the derivative. Suppose $f(x) \equiv \sum_{i=0}^n a_i x^i$ with $a_n \neq 0$ and $n \geq 1$. Then $f'(x) = \sum_{i=1}^n i a_i x^{i-1}$. Since $a_n \neq 0$ and we are in characteristic zero, $n a_n \neq 0$, and thus the derivative of an $n$th degree polynomial has degree $(n-1)$; in particular it is not the zero polynomial.

All polynomials over characteristic 0 are separable, alternative proof.

  • Let $f$ be an irreducible polynomial in $K[x]$ (the minimal polynomial of some element in $L$). We claim that $f$ is separable.
  • The key lemma is to show that if $g$ is ANY polynomial which shares a root $r$ with $f$, then $f | g$.
  • Idea: since $f(r) = g(r) = 0$, this means that $(x - r)$ divides $gcd(f, g)$. Thus, $gcd(f, g)$ is non-constant.
  • Further, $gcd(f, g) | f$ since the gcd divides both its factors.
  • But since $gcd(f, g)$ divides $f$ while $f$ is irreducible, we must have $gcd(f, g)$ equals $f$.
  • Since $gcd(f, g) = f$ divides $g$, we have $f | g$.
  • Now, going back to our claim, let $f$ be some irreducible in $K[x]$. Suppose for contradiction that $f$ is not separable. Then $f, f'$ share a common root. By the above lemma, this implies that $f$ divides $f'$. But this is absurd, since $f' \neq 0$ (we are in characteristic zero) and $deg(f) > deg(f')$.
  • Hence, no irreducible polynomial in $f$ can share a root with its derivative, which implies $f$ is always separable.
  • This breaks down for characteristic $p$ since $f'$ can simply "die out".

All finite field extensions over characteristic 0 are separable

  • Can write any finite field extension $L/K$ as $L = K(\alpha_1, \dots, \alpha_n)$. This is the same as $K(\alpha_1)(\alpha_2)\dots(\alpha_n)$.
  • Since separability of field extensions is transitive, and at each step, we add an element with separable minimal polynomial (all polynomials over char. 0 are separable), the full extension is separable.

All extensions of finite fields are separable

  • Consider $F_{p^m} \subseteq F_{p^n}$.
  • build the "Fermat's little theorem" polynomial $x^{p^n} - x = f(x)$.
  • All elements of $F_{p^n}$ satisfy this, thus $f(x)$ has $p^n$ roots, which means all of its roots are distinct.
  • Alternatively, see that $f'(x) = p^n x^{p^n - 1} - 1 = 0 - 1 = -1$ so $f(x)$ and $f'(x)$ don't share a common root.

Purely inseparable extensions

  • $K \subseteq L$ of char. p. Then the field $L$ is purely inseparable iff any $\alpha \in L$ is a root of $x^{p^n} - k$ for some $k \in K$, with $n \geq 1$.
  • Over the algebraic closure, this can be written as $x^{p^n} - (k^{1/p^n})^{p^n}$, which factorizes as $(x - k^{1/p^n})^{p^n}$ by freshman's dream.
  • Thus, over the algebraic closure, this has many copies of one root, which is "as far as possible" from being separable.

Breaking down extension into separable + purely inseparable

  • Given any $K \subseteq L$ algebraic, can break it down into $K \subseteq K^{sep} \subseteq L$.
  • The extension $K^{sep}/K$ contains all elements of $L$ which have separable polynomials. Can show that this is a field.
  • We can show that $L/K^{sep}$ will be purely inseparable.

Example of inseparable extension

  • Let $L \equiv F_p(t)$, which are rational functions in $t$ over $F_p$, and $K \equiv F_p(t^p)$ which are rational functions in $t^p$.
  • Clearly, $K \subseteq L$, and the extension $L/K$ is of degree $p$, because $t \in L$ is a root of $X^p - t^p \in F_p(t^p) = K$.
  • See that $X^p - t^p$ is irreducible over $K = F_p(t^p)$.
  • Over $L = F_p(t)$, $X^p - t^p$ factorizes as $(X - t)^p$ by freshman's dream, with all roots the same.
  • Thus, we have an element whose minimal polynomial is not separable. Here, the minimal polynomial $g(X) = X^p - t^p \in K[X]$ has derivative zero, and is thus inseparable.
  • In some sense, the failure is because of freshman's dream, where $X^p - t^p \equiv (X - t)^p$.
  • Reference: Borcherds, separable extension
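
The freshman's dream used above boils down to the middle binomial coefficients vanishing mod $p$, which is easy to spot check in python:

from math import comb

p = 7
# C(p, k) = 0 mod p for 0 < k < p, so (X - t)^p = X^p - t^p over F_p.
print(all(comb(p, k) % p == 0 for k in range(1, p)))   # True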

Primitive element theorem / Theorem of the primitive element

  • Let $J \subseteq K$ be a field extension. We say $\alpha \in K$ is primitive for the extension $K/J$ if $K = J(\alpha)$.
  • If $K/J$ is a finite separable extension, then for some $\alpha \in K$, we have $K = J(\alpha)$.
  • Recall that $J(\alpha) \simeq J[x]/minpoly(x)$.
  • TODO!

Tensor product of field extensions

  • Let $K$ be a finite separable extension of $J$ and $\Omega$ be an arbitrary extension of $J$. (usually, $\Omega$ is the p-adics, $J$ is $\mathbb Q$, $K$ is a number field).

  • Then, $K \otimes_J \Omega$ is a product of finite separable extensions of $\Omega$. So $K \otimes_J \Omega \equiv \prod_i \Omega_i$, where each $\Omega_i$ is a finite extension of $\Omega$.

  • If $\alpha$ is a primitive element for the extension $K$ (so $K = J(\alpha)$), then the image of $\alpha \otimes 1$ in $\Omega_i$ is a primitive element for $\Omega_i$ over $\Omega$.

  • If $f$ is the minimal poly. for $\alpha \in K$ over $J$ and $f_i$ is the minimal polynomial for $\alpha_i \in \Omega_i$ over $\Omega$ then $f(x) = \prod_i f_i(x)$.

  • Proof: Start with a primitive element $\alpha$ for $K/J$. Then $K \simeq_\phi J[x]/(f(x))$ for $f(x)$ the minimal polynomial of $\alpha$ over $J$. So $\phi$ witnesses this isomorphism.

  • Consider $K \otimes_J \Omega$. This is isomorphic to $(J[x]/(f(x))) \otimes_J \Omega$. Call the map that sends the LHS to the RHS $\phi \otimes id$

  • We claim that the ring $(J[x]/(f(x))) \otimes_J \Omega$ is isomorphic to $\Omega[x]/(f(x))$. Intuition: tensoring by $J$ doesn't do anything useful, and we can re-interpret $f(x)$ as living in $\Omega[x]$. The isomorphism is $\psi((g(x) + J[x]f(x)) \otimes \omega) \equiv \omega \cdot g(x) + \Omega[x]f(x)$.

  • Now suppose $f$ factors as $\prod_i f_i$ over $\Omega[x]$. Since $\alpha$ is separable over $J$ and $\Omega$ is an extension of $J$, all the $f_i$ are distinct (otherwise it contradicts separability). Thus the family of ideals ${ (f_i) }$ is pairwise coprime.

Limits of a functor category are computed pointwise.

Reduction to discrete category

  • Let's take a functor category $[X, Y]$.
  • Take a diagram $D: J \to [X, Y]$. What is the limit $\lim D: [X, Y]$?
  • First, let's assume that $X$ has no arrows, or that we forget all the arrows of $X$ except the identity arrows. denote this forgotten/discrete category by $ob(X)$, whose objects are those of $X$, and morphisms are only identity morphisms.
  • We can define the diagram $ob(D): J \to [ob(X), Y]$. Can we compute $\lim ob(D)$?
  • A functor $ob(X) \to Y$ is the same as a tuple $Y^{ob(X)}$. See that $Y^{ob(X)}$ lives in CAT, since it is a category that is the $ob(X)$ copies of $Y$.

Formal proof by limits of product categories

  • Now, the limit of $ob(D)$ can be interpreted as a limit of $ob(D): J \to Y \times Y \times \cdots \times Y$.
  • By the universal property of the product, limits over product categories can be computed pointwise. So if we have a diagram $E: K \to X \times Y$, then $l \equiv \lim E$ can be calculated by calculating $l_x \equiv \lim (\pi_1 \circ E : K \to X)$, then $l_y \equiv \lim (\pi_2 \circ E : K \to Y)$, and then setting $l \equiv (l_x, l_y) \in X \times Y$.
  • Thus, we split the functor $ob(D): J \to Y \times Y \times \cdots \times Y$ into the individual tuple components, which correspond to the images of $x \in ob(X)$ under $D$, and we compute their limits. So we can compute this pointwise.

Draw the right diagram.

  • Suppose we had J = (f -a-> h <-b- g), and we had ob(X) = (p q). We only have objects, no morphisms.
  • Now, what is a diagram ob(D): J -> [ob(X), Y]? For each of f, g, h in J, we must get a functor from ob(X) to Y.
  • Denote F = ob(D)(f), G = ob(D)(g), and H = ob(D)(h). Each of F, G, H are functors ob(X) -> Y.
  • I'll write the functors by identifying them by their image. The image of F is going to be [Fp Fq] with no interesting morphisms between Fp and Fq.
  • Now, that we've considered the action of ob(D) on objects of J, what about the arrows?
  • The images of the arrows f -a-> h and h <-b- g are natural transformations from F to H and G to H respectively. Denote these by F =α=> H and H <=β= G. So we have ob(D)(a) = α, ob(D)(b) = β.
  • In total, the image of ob(D) in [ob(X), Y] looks like this:
F =α=> H <=β= G
  • If we expand out the functors by identifying them with the image, and write the natural transformations in terms of components, it looks like so:
[Fp     Fq]
 |       |
 αp     αq
 v       v
[Hp      Hq]
 ^       ^
 βp      βq
 |       |
[Gp      Gq]
  • Really, the diagram consists of two parts which don't interact: the part about p and the part about q. So computing limits should be possible separately!

This extends to [X, Y]

  • We now believe that given $D: J \to [X, Y]$, we can compute the limit of $ob(D): J \to [ob(X), Y]$ pointwise.
  • Formally, we define $(\lim ob(D))(x)$ to be equal to $\lim (ev_x \circ D : J \to Y)$.
  • We define the action of $\lim D$ (which is a functor from $X$ to $Y$) on objects of $X$ to be equal to the action of $\lim ob(D)$ on objects of $X$, which is given by the above equation.
  • So what about the action of $\lim D$ on the morphisms of $X$? it's a functor from $X$ to $Y$, so it should send morphisms to morphisms!
  • Now, let's suppose we have a morphism $x \xrightarrow{a} x'$ in $X$. How do we compute the action of $\lim D$ on the morphism $a$?
  • Well, first off, what's $(\lim D)(a)$ a morphism between? It must be between $(\lim D)(x)$ and $(\lim D)(x')$.
  • What is $(\lim D)(x)$? We know that $(\lim D)(x) \equiv \lim (ev_x \circ D: J \to Y)$. Similarly, we know that $(\lim D)(x') \equiv \lim (ev_{x'} \circ D: J \to Y)$.

a + b = (a or b) + (a and b)

  • 0 + 0 = or(0, 0) + and(0, 0)
  • 0 + 1 = or(0, 1) + and(0, 1)
  • 1 + 1 = or(1, 1) + and(1, 1)
  • Extend by linearity?
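  • A quick brute-force check on 0/1 values, plus the max/min reading of "or" and "and" (a throwaway sketch):

for a in (0, 1):
    for b in (0, 1):
        assert a + b == (a | b) + (a & b)   # or = bitwise |, and = bitwise &

# reading "or" as max and "and" as min, the identity holds for arbitrary numbers:
for a, b in [(2.5, 7.0), (3.0, 3.0), (-1.0, 4.0)]:
    assert a + b == max(a, b) + min(a, b)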

Intuition for why choosing closed-closed intervals of [1..n] is $(n+1)C2$

  • $nC2$ counts all intervals $\{ [i, j]: i < j \}$.
  • To count intervals $[i, i]$, there are $n$ of them, so it's $nC2 + n$ which is $n(n-1)/2 + n $, which is $n(n+1)/2$ or $(n+1)C2$.
  • Combinatorially, add a "special point *" to [1..n]. If we pick a pair (i, *) from the $(n+1)C2$, take this to mean that we are picking the interval [i, i].
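  • A tiny brute-force check of the count (a throwaway sketch):

n = 6
intervals = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]  # closed intervals [i, j], i <= j
assert len(intervals) == (n + 1) * n // 2   # (n+1) choose 2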

Thoughtful discussion on the limits of safe spaces

(2) You cannot make all valuable, positive, motivated people feel safe. It's really sad, but there are fundamental incompatibilities in the kind of safety that different people need (even before we get to what makes them productive--you can't even make everyone feel comfortable!). I think this discussion has demonstrated amazing attempts by people at understanding and incorporating different perspectives, but at the end of it all, some people are going to have to be triaged out, or will have to accept some lack of safety. Two examples: (a) people with low self-esteem tend to find confrontational environments unsafe emotionally, but many neurodivergent people tend to find environments that require high social awareness unsafe emotionally. You can ameliorate this contradiction somewhat with careful guidelines, but fundamentally the problem cannot be solved: the neurodivergent simply cannot do what the emotionally fragile require of them, so one or the other or both is going to have a bad time. There is nothing wicked about either of these people! But they're not compatible. (b) people of a category that has faced systematic discrimination often do not feel safe with "free speech" that is allowed to get anywhere near sounding like discrimination against them (for very good reason!), but people who have exposure to thought-policing with severe consequences for disobedience often do not feel safe with anything less than very broad construal of "free speech". This one's even harder, because both sides can have really deep emotionally salient reasons for their perspective, and yet they are incompatible. There is nothing wicked about either of these people! But different types of wickedness have been done to them or are reasonably feared by them, rendering them incompatible with each other.

you are not literally able to make a community welcoming to everyone who, one-on-one, you would consider a good person. Sometimes you can get a few extra valuable people by special-casing things. (E.g. a reasonable response to "I don't understand respect" might be "we are still going to call it respect, but we will maintain an additional note approximating what 'act with respect' means in terms of other concepts that might be easier to actualize for some people".)

I agree that it is not possible to resolve fundamental incompatibilities through policy. However, it often is resolvable through mediation, a third party who can deal with the needs of both sides and is willing and able to translate, clarify, provide private feedback, and otherwise help smooth over the situation.

Semidirect product: Panning and Zooming

  • I think I finally have an example of a semidirect product that I understand well enough I'd dare to teach a friend.
  • Take the real line. We can move points on it by adding them (panning). Viewed differently, we can pan the real line left and right, by the action of the real line on itself. This is a group $P \simeq (\mathbb R, 0, +)$ ($P$ for pan). I'll draw the line as follows:
  ^
  |
  |
  0
  |
  |
  v
  • Next, we can zoom the real line by multiplication: given a nonzero number, I can scale the entire real line by this number. This group of zoom operations is $Z \simeq (\mathbb R \setminus \{0\}, \times, 1)$. I'll show this by stacking copies of the real line next to each other:
    ^
    |       ^
    |       |        ^
Z---[z=1]---[z=1/2]--[z=1/4]----...
    |       |        V
    |       V        P
    v       P
    P
  • So we show the group Z on the horizontal axis, which zooms the real line. We "attach" a copy of P to each element z of Z, appropriately scaled.

  • How should I write the pan-and-zoom operation as a single unit? I'll denote by (z, p) the operation of panning by p and then zooming by z. Why not the other order? Well, if I zoom first by z and then pan by p, the pan p gets "disturbed" by the zoom, since the pan would like to talk about the initial state of the world, but we now need to pan with respect to the world after zooming. So we prefer the order where we can pan first (with no zoom interfering with our affairs), and then zoom.

  • How do these combine? If we have (1, p) . (1, p') we get (1, p + p') since combining pans at zoom level 1x is like us not having zooming. Similarly, combining (z, 0) . (z', 0) is (zz', 0), since zooming by z with no pan followed by z' is the same as zooming in one shot by zz'.

  • What about (z, p). (z', p')? What does it mean? It means we should (a) pan by p, (b) zoom z, (c) pan by p', (d) zoom z'. See that the total zoom will be zz' at the end of this operation. What about the total pan? the second pan by p' happens after we have already zoomed by z. So relative to no zoom, this is a pan by zp'. So in total, we can replace by an operation which (1) pans by p + zp', and then (2) zooms by zz'. So we have that (z, p).(z', p') = (zz', p + zp'). This is a semidirect product.

  • If we stare at the picture above, we see that we have many copies of p, one for each z. So the full group is like Z x P.

  • It's hopefully clear that if we "squish" the Ps, (ie, quotient by P) down towards the Z, we'll still have a fully functioning Z group.

  • On the other hand, if we attempt to "squish" the Zs(ie, quotient by Z) down towards a single P, we'll be left with incompatible copies of P, each at different scales! This tells us that we can quotient by P (so P is normal), but not by Z (so Z is not normal).

  • So, this is sort of like a vector bundle P -> Z |x P -> Z where the fibers are P and the base space is Z. We can remove the fibers to recover the base space. You can't delete the base space, since there's no way to make the fibers "compatible".
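  • A small numeric sanity check of the law derived above (a throwaway sketch, not a proof): it checks associativity of (z, p).(z', p') = (zz', p + zp'), that conjugating a pure pan gives a pure pan (P is normal), and that conjugating a pure zoom does not give a pure zoom (Z is not normal).

import random

def mul(a, b):
    (z1, p1), (z2, p2) = a, b
    return (z1 * z2, p1 + z1 * p2)     # (z, p) = "pan by p, then zoom by z"

def inv(a):
    z, p = a
    return (1 / z, -p / z)             # mul(a, inv(a)) == (1, 0)

rnd = lambda: (random.uniform(0.5, 2.0), random.uniform(-5, 5))
for _ in range(1000):
    a, b, c = rnd(), rnd(), rnd()
    l, r = mul(mul(a, b), c), mul(a, mul(b, c))
    assert abs(l[0] - r[0]) < 1e-9 and abs(l[1] - r[1]) < 1e-9   # associative

g = (2.0, 3.0)
print(mul(mul(g, (1.0, 7.0)), inv(g)))   # (1.0, 14.0): still a pure pan, so P is normal
print(mul(mul(g, (5.0, 0.0)), inv(g)))   # (5.0, -12.0): picks up a pan, so Z is not normal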

Longest Convex Subsequence DP

  • This was an enlightening problem to solve due to the presence of many degenerate cases.

  • The question: given an array xs[], find the length of the longest subsequence ys[] such that every three consecutive elements satisfy 2*ys[m] < ys[m-1] + ys[m+1], ie, the second differences are strictly positive.

  • Key DP idea: dp[m][r] is the length of the longest convex subsequence whose penultimate element is at index m and whose final element is at index r.

  • The bare recurrence (without thinking about base cases) can be implemented as follows:

int f(vector<int> &xs) {
    const int n = xs.size();
    vector<vector<int>> dp(n, vector<int>(n, 0));

    for (int r = 0; r < n; ++r) {
        for (int m = 0; m < r; ++m) {
            for (int l = 0; l < m; ++l) {
                if (2 * xs[m] < xs[l] + xs[r]) {
                    dp[m][r] = max<int>(dp[m][r], 1 + dp[l][m]);
                }
            }
        }
    }

    return ???;
}
  • The "problem" is to deal with the degenrate cases where the array has only length 0, length 1, or length 2 when the DP conditions don't kick in. How does one implement these neatly?

  • The insight is to see that at each location in the program, we have a lower bound on the best DP value we can achieve. For example, at the beginning, we know that best >= 0. When we enter into the loop of r, we know that we have at least one element, so best >= 1, and so on. If we insert these lower bounds systematically into the code, we arrive at:

int f(vector<int> &xs) {
    const int n = xs.size();
    vector<vector<int>> dp(n, vector<int>(n, 0));
    int best = 0;

    for (int r = 0; r < n; ++r) {
        ATLEAST1: best = max<int>(best, 1);
        for (int m = 0; m < r; ++m) {
            ATLEAST2: best = max<int>(best, 2);
            dp[m][r] = 2;
            for (int l = 0; l < m; ++l) {
                if (2 * xs[m] < xs[l] + xs[r]) {
                    dp[m][r] = max<int>(dp[m][r], 1 + dp[l][m]);
                }
                best = max<int>(best, dp[m][r]);
            }
        }
    }

    return best;
}
  • We see that whenever we "learn" more information, we increase best, possibly without being able to even initialize dp[.][.]! This happens for example at ATLEAST1:, where we have one element so we know that best >= 1, but we don't have two elements to initialize the DP array.

Representation theory of $SU(2)$ [TODO]

  • 2x2 unitary matrices with determinant 1, so $AA^\dagger = I$ and $\det A = 1$.
  • Lie algebra is $su(2)$: matrices with $A^\dagger = -A$ and $Tr(A) = 0$.
  • We write $M_v \equiv \begin{bmatrix} ix & y + iz \\ -y + iz & -ix \end{bmatrix}$.
  • The group elements are matrices, so this is the standard representation, which goes from $SU(2)$ to $GL(2, \mathbb C)$. Turns out this is an irreducible, 2D complex representation.
  • We have a transformation which for a $g \in SU(2)$ creates a map which sends a matrix $M_v$ to $g M_v g^{-1}$. so the representation is $g \mapsto \lambda M_v. g M_v g^{-1}$, which has type signature $SU(2) \to GL(su(2))$. This is a 3D, real representation: the vectors $M_v$ have 3 degrees of freedom.
  • We like complex representations, so we're going to build $SU(2) \to GL(su(2) \otimes \mathbb C)$.
  • There is the trivial representation $\lambda g. (1)$.
  • There is a zero dimensional representation $\lambda g. ()$ which maps $\star \in \mathbb C^0$ to $\star$. So it's the identity transformation on $\mathbb C^0$.

Theorem

For any positive integer $n$ there is an irrep $R_n: SU(2) \to GL(n, \mathbb C)$. Also, any irrep $R: SU(2) \to GL(V)$ is isomorphic to one of these.

New representations from old

  • If we have $R: G \to GL(V)$ and $S: G \to GL(W)$, what are new representations?

  • For one, we can build the direct sum $R \oplus S$. But this is useless, since we don't get irreps.

  • We shall choose to take tensor product of representations.

  • The $n$th tensor power of $R: G \to GL(V)$ is $R^{\otimes n}: G \to GL(V^{\otimes n})$. This is not irreducible, because it contains the subrep of symmetric tensors (the symmetric power).

  • Example: in $\mathbb C^2 \otimes \mathbb C^2$, we can consider $e_1 \otimes e_1$, $e_2 \otimes e_2$, and $e_1 \otimes e_2 + e_2 \otimes e_1$.

  • Define $Av$ (for averaging) of $v_1 \otimes v_2 \dots v_n$ to be $1/n! \sum_{\sigma \in S_n} v_{\sigma(1)} \otimes v_{\sigma(2)} \dots v_{\sigma(n)}$. In other words, it symmetrizes an input tensor.

  • Define $Sym^n (V) = Im(Av: V^{\otimes n} \to V^{\otimes n})$. We claim that $Sym^n(V)$ is a subrep of $V^{\otimes n}$. We do this by first showing that $Av$ is a morphism of representations, and then by showing that the image of a morphism is a sub-representation.

Weight space decomposition

  • $SU(2)$ contains a subgroup isomorphic to $U(1)$. Call this subgroup $T$, which is of the form $\begin{bmatrix} e^{i \theta} & 0 \\ 0 & e^{-i \theta} \end{bmatrix}$.

  • Reference

Why quaternions work better

  • We want to manipulate $SO(3)$. Imagine it like $SO(2)$.
  • Unfortunately, $\pi_1(SO(3)) = \mathbb Z/2\mathbb Z$. This is a pain, much like how rotations of a circle need to be composed modulo $2\pi$, which is a pain.
  • Idea for why $\pi_1(SO(3))$ is $\mathbb Z/2\mathbb Z$: $SO(3)$ is the 3-sphere $S^3$ with antipodal points identified. So a path from the north pole to the south pole on the sphere is a "loop" in $SO(3)$. Concatenate this loop with itself (make another trip from the south pole back to the north pole) to get a full loop around the sphere, which can be shrunk to nothing since $\pi_1(S^3)$ is trivial. So $ns^2 = e$, where $ns$ is the north-south path in $S^3$, which is a loop in $SO(3)$.
  • Key idea: deloop the space! How? Find the universal cover. Luckily, the universal cover of $SO(3)$ is $SU(2)$ / the unit quaternions, just as the universal cover of $SO(2)$ is $\mathbb R$.
  • The universal cover also explains why $SU(2)$ is a double cover. Since $\pi_1(SO(3))$ is $\mathbb Z/2\mathbb Z$, we need to deloop "once" to get the delooped space.
  • No more redundancy now! Just store a Bloch sphere representation, or a quaternion (store $SU(2)$). Just like we can just store a real number for an angle and add it.
  • How to go back to $SO(3)$ or $SO(2)$? Move down the universal cover map $SU(2) \to SO(3)$ or $\mathbb R \to SO(2)$.
  • This is strange though. Why is $\mathbb R$ both the Lie algebra and the covering space of $SO(2)$? What about in general?
  • In general, the original Lie group $SO(3)$ and the universal cover $SU(2)$ have the same Lie algebra; they differ only in their fundamental groups.
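  • A small numeric demo of the covering map (a throwaway sketch using numpy and the usual Hamilton quaternion conventions, which are my assumptions and not anything from the notes above): quaternion multiplication followed by the map down to $SO(3)$ agrees with composing rotation matrices, and $q$, $-q$ land on the same rotation (double cover).

import numpy as np

def qmul(a, b):   # Hamilton product of quaternions (w, x, y, z)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qrot(q):      # covering map SU(2) -> SO(3): unit quaternion to rotation matrix
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                     [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def randq():
    q = np.random.randn(4)
    return q / np.linalg.norm(q)

q1, q2 = randq(), randq()
assert np.allclose(qrot(qmul(q1, q2)), qrot(q1) @ qrot(q2))   # the covering map is a homomorphism
assert np.allclose(qrot(-q1), qrot(q1))                       # q and -q: the double cover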

DFA to CFG via colimits?

  • Can convert a CFG to DFA by keeping an arbitrary limit on the depth of the stack, counting how many elements are in the stack, and going to a failure state when we exceed the depth.
  • If we do so, can get a DFA for each natural number --- this is the max stack depth we keep track of.
  • Can we now define a colimit of these DFAs? does this recover the CFG?
  • If so, what is the correct category? And does the colimit completion of DFAs correspond to DPDAs/CFGs?

Why pointless topology is powerful

  • Key idea of pointless topology: topology manipulates open sets and their lattice. Forget the set, simply manipulate lattices!
  • When can a lattice be written in terms of sets?
  • Birkhoff representation theorem: a finite lattice is distributive iff it is isomorphic to the lattice of downward-closed subsets of its poset of join-irreducible elements.
  • Hence, if we take non-distributive lattices, we have geometry (a locale) which has no incarnation as subsets!
  • Yay, extra power.

Denotational semantics in a few sentences

  • We want to find a math object that reflects lambda calculus.
  • Such an object must contain its own space of functions; $L \simeq [L \to L]$.
  • This is impossible due to cardinality constraints (no set with more than one element is in bijection with its full function space).
  • Key idea: restrict to continuous functions! $L \simeq [L \xrightarrow{\texttt{cont}} L]$.
  • Solutions exist! Eg. space of continuous $[\mathbb N \to \mathbb N]$ with appropriate topology is like space of "eventually stabilizing sequences", which is equinumerous to $\mathbb N$, since sequences that eventually become stable have information $\cup_{i=0}^\infty \mathbb N^i$. This has the same cardinality as $\mathbb N$.
  • For continuity in general, we need a topology.
  • OK, now that we know this is what we need, how do we exhibit a space $L \simeq [L \to L]$? One invokes the hammer of domain theory.
  • Now that we have the space $L$, what's the right topology on it? That's worth a Turing award! The Scott topology.

Monge Matrix

  • Suppose we have two line segments AB, CD:
A   C
@   @
@   @
@   @
B   D
  • What is the relationship between the lengths |AC| + |BD| (straight lines) versus |AD| + |BC| (diagonals)?
  • Draw the diagonals, and label their point of intersection I.
A---C
@\ /@
@ I @
@/ \@
B---D
  • By triangle inequality, we have that AI + IC > AC and BI + ID > BD. Adding these up, we get (AI + IC) + (BI + ID) > AC + BD.
  • Rearranging we get (AI + ID) + (BI + IC) > AC + BD, which is equal to AD + BC > AC + BD.
  • So, the sum of lengths between opposite points is greater than sum of lengths between non-opposite points.
  • If we think of this as a matrix dist[A/B][C/D], we have that dist[a][d] + dist[b][c] > dist[a][c] + dist[b][d].
  • If we replace A=0, B=1, C=0, D=1 (since those are the indexes of the points on the two line segments), we get dist[0][1] + dist[1][0] > dist[0][0] + dist[1][1]
  • If we generalize to sets of points on the two lines, index the points on the first segment by i < i' and on the second by j < j'. Then the condition reads dist[i][j'] + dist[i'][j] > dist[i][j] + dist[i'][j'].
  • A matrix dist[.][.] which obeys this condition (usually stated as dist[i][j] + dist[i'][j'] <= dist[i][j'] + dist[i'][j]) is said to be a Monge Matrix.
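  • A quick numeric check of the condition for points on two parallel segments (a throwaway sketch; the coordinates are arbitrary):

from math import hypot
from itertools import combinations

left  = [(0.0, y) for y in (5.0, 3.0, 1.5, 0.0)]    # points on the first segment, top to bottom
right = [(4.0, y) for y in (6.0, 2.5, 1.0, -1.0)]   # points on the second segment, top to bottom

dist = [[hypot(a[0] - b[0], a[1] - b[1]) for b in right] for a in left]

for i, i2 in combinations(range(4), 2):
    for j, j2 in combinations(range(4), 2):
        # parallel connections are no longer than crossing connections
        assert dist[i][j] + dist[i2][j2] <= dist[i][j2] + dist[i2][j]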

Theorem: Monge matrices are totally monotone

Theorem: totally monotone matrices have ascending row minima

1D 1D DP [TODO]

  • https://robert1003.github.io/2020/02/29/dp-opt-knuth.html
  • Suppose we have a dp $dp[r] = \min_{0 \leq l \leq r} f(l, r)$. That is, we need to find row minima for each row $r$ in a 2D matrix.
  • Now assume that $f(l, r)$ has ascending row minima.

Fixpoint as decorator

#!/usr/bin/env python3
class Thunk:
    # a delayed call: force() runs func(*args) only when the value is needed.
    def __init__(self, func, *args):
        self.func = func
        self.args = args
    def force(self):
        return self.func(*self.args)

def fix(f):
    # hand f a thunk that, when forced, recomputes fix(f): a lazy fixpoint.
    return f(Thunk(fix, f))

@fix
def fact(f):
    # f is a Thunk; forcing it yields the already-fixed factorial function.
    def fact_n(n):
        if n == 0: return 1
        else: return n * (f.force())(n-1)
    return fact_n

print(fact(5))

Combinatorial generation algorithms

Perform DP on measures, not indexes.

  • In the problem of longest common subsequence (or any string problem in general), we should conceptually think of the DP state as the length. This gives us a natural base case (length = 0), as well as makes it much clearer to implement. Compare (1) LCS using indexes as DP state:
int lcs_len(const vector<int> &xs, const vector<int> &ys) {
    // dp[i][j]: LCS between xs[0..i] and ys[0..j], both endpoints included.
    vector<vector<int>> dp(xs.size(), vector<int>(ys.size(), 0));
    for(int i = 0; i < xs.size(); ++i) {
        for(int j = 0; j < ys.size(); ++j) {
            if (i > 0 && j > 0) { dp[i][j] = max(dp[i][j], dp[i-1][j-1]); }
            if (i > 0) { dp[i][j] = max(dp[i][j], dp[i-1][j]); }
            if (j > 0) { dp[i][j] = max(dp[i][j], dp[i][j-1]); }
            if (xs[i] == ys[j]) {
                const int prev = i > 0 && j > 0 ? dp[i-1][j-1] : 0;
                dp[i][j] = max(dp[i][j], 1 + prev);
            }
        }
    }
    return dp[xs.size()-1][ys.size()-1];
}
  • Versus using length as DP state:
int lcs_len(const vector<int> &xs, const vector<int> &ys) {
    // dp[lx][ly]: LCS between xs[0:lx) and ys[0:ly) [closed-open].
    // lx, ly for ``length of xs, length of ys''
    vector<vector<int>> dp(1+xs.size(), vector<int>(1+ys.size(), 0));
    for(int lx = 1; lx <= xs.size(); ++lx) {
        for(int ly = 1; ly <= ys.size(); ++ly) {
            dp[lx][ly] = max({dp[lx-1][ly-1], dp[lx][ly-1], dp[lx-1][ly]});
            if (xs[lx-1] == ys[ly-1]) {
                dp[lx][ly] = max(dp[lx][ly], 1 + dp[lx-1][ly-1]);
            }
        }
    }
    return dp[xs.size()][ys.size()];
}
  • Length is more natural, because length=0 actually corresponds to a degenerate case.
  • In general, whenever performing DP, you will likely always need extra states for the degenerate case. It's a good thing to have this, since it simplifies a TON of the implementation work!

Alternative version of Myhill-Nerode

  • In one version of Myhill-Nerode I know, the states correspond to equivalence classes of strings under the equivalence relation $x \sim y$ iff for all strings $s$, $x + s \in L \iff y + s \in L$.
  • In another version (V2), we define the right context of a string $w$ to be the set of all suffixes $s$ such that $w + s \in L$. That is, $R(w) \equiv \{ s \in A^* : w + s \in L \}$.
  • This induces an equivalence relation where $x \sim y$ iff $R(x) = R(y)$.
  • In this version (V2), the states are the right contexts of all strings in the language.
  • The transition on a character $c$ takes a right context $R(w)$ to $R(wc) = \{ s : cs \in R(w) \}$, ie, keep the suffixes that start with $c$ and strip that $c$.
  • The initial state corresponds to the right context of the empty word.
  • The accepting states are those right contexts that contain the empty word (equivalently, the right contexts of words in the language).
  • This version is much more explicit for computational purposes! We can use it to think about what the automata looks like for small languages, in particular for the suffix automata.
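  • A brute-force sketch of this construction for a small finite language (the word "abcbc" is an arbitrary running example of mine): enumerate the right contexts directly and count the distinct ones.

from itertools import product

word = "abcbc"
alphabet = sorted(set(word))
L = {word[i:] for i in range(len(word) + 1)}   # the suffix language, includes ""

def right_context(w):
    # every member of L is a suffix of word, so only suffixes of word can extend w into L
    return frozenset(s for s in L if w + s in L)

contexts = set()
for n in range(len(word) + 1):                 # longer w only repeat the empty (dead) context
    for w in product(alphabet, repeat=n):
        contexts.add(right_context("".join(w)))

print(len(contexts))                           # number of states (including the dead state)
for ctx in sorted(map(sorted, contexts)):
    print(ctx)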

Polya Enumeration

  • Let $X$ be a set of objects with the action of a group $G$. For example, $X$ is the configurations of a square, represented as 4-tuples by reading the vertices off in clockwise order, and let $G$ be the group of symmetries of a square.
  • Let $C$ be the set of colorings $c_1, c_2, \dots c_n$.
  • Let the objects of $X$ be colored. That is, we have functions $f: X \to C$ which assign a color from $C$ to each element of $X$. Let $Y$ be the set of colorings $X \to C$.
  • We extend the action of the group to the colorings, given by $g(f) \equiv \lambda x. f(g^{-1}(x))$. This makes the action of the group consistent.
  • What's this funny inverse? Well, the idea is this. We really have that $(gf)(gx) \equiv g(f(x))$, as the group must act in an invariant way on the function space and the domain space. So to define the expression of $(gf)(x')$, we think of it as $(gf)(x') = (gf)(g(g^{-1}x')) = f(g^{-1}x')$.
  • Define the weight of a coloring $h \in Y$, (ie $h: X \to C$) to be the monomial given by the product of colors; $wt(h) \equiv \prod_{x \in X} h(x)$. The weight $wt(h)$ is a monomial in the commutative ring $R[c_1, c_2, \dots, c_n]$.
  • Statement of Polya Enumeration: the weight enumerator for the action of $G$ on $Y \equiv (X \to C)$ is equal to the cycle index polynomial $Z(G, X)$ with $z_i$ replaced by the power sum symmetric polynomial $P[i](\vec c) \equiv c_1^i + c_2^i + \dots + c_n^i$.

Proof via Weighted Burnside Lemma

  • Let $g \in G$. By weighted version of Burnside Lemma, we have that

$$ \sum_{O \in Orb(Y)} wt(O) \equiv 1/|G| \sum_{g \in G} \sum_{y \in Fix(g)} wt(y) $$

  • Suppose that $g \in G$ has cycle monomial $z_1^{k_1} \dots z_m^{k_m}$. That is, $g$ has $k_1$ cycles of length $1$, $k_2$ cycles of length $2$, and so on upto $k_m$ cycles of length $m$. So $g$ has $k_i$ cycles of length $i$.
  • We want to know which colorings $y \in Y$ are fixed by $g$.
  • Suppose a coloring $y: X \to C$ is fixed by $g$, so $g(y) = y$. Since $g$ pushes things around in its cycles, for each cycle in $g$, we must use a constant color.
  • Said differently, all elements in the same cycle of $g$ have the same color.
  • Thus for each cycle of length $l$, We must color all of the $l$ elements (of $X$) with the same color.
  • We can color distinct cycles independent of each other.
  • Thus the weight of a cycle of length $1$ is given by $(c_1 + c_2 + \dots + c_n)$, since we can color the single element with either $c_1$ or $c_2$ and so on upto $c_n$.
  • The weight of a cycle of length $2$ is given by $(c_1^2 + c_2^2 + \dots + c_n^2)$, since we can color the two elements in the cycle with either $c_1$ or $c_2$ and so on upto $c_n$.
  • The weight of a cycle of length $l$ is given by $(c_1^l + c_2^l + \dots + c_n^l)$, since we must color the $l$ elements in the cycle with a single common color.
  • Since the element $g$ has $k_1$ cycles of length $1$, the weight of all cycles of length $1$ is $(c_1 + c_2 + \dots + c_n)^{k_1}$.
  • Since the element $g$ has $k_2$ cycles of length $2$, the weight of all cycles of length $2$ is $(c_1^2 + c_2^2 + \dots + c_n^2)^{k_2}$.
  • Since the element $g$ has $k_l$ cycles of length $l$, the weight of all cycles of length $l$ is $(c_1^l + c_2^l + \dots + c_n^l)^{k_l}$.
  • Thus, the total weight of the colorings fixed by $g$ is given by the polynomial $cyc(g)(p_1(\vec c), p_2(\vec c), \dots, p_l(\vec c))$, ie, the cycle monomial of $g$ with $z_i$ replaced by the power sum $p_i(\vec c)$.

Example: Weight enumerator for square with $D_4$ actions.

  • $G \equiv D_4$
  • $X$ is the configurations of a square.
  • $C$ are the colors $r, g, b$.
  • $Y$ is the set of colorings of $X$ by $C$.
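  • The notes stop at the setup; here is a brute-force check of the orbit count for this example, compared against the cycle index $Z(D_4)$ from the section below with every color given weight 1 (a throwaway sketch):

from itertools import product

# D4 on the corners (0 1 2 3), read clockwise: 4 rotations + 4 reflections,
# each written as a permutation (new corner i gets the color of corner perm[i]).
rotations   = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]
reflections = [(1, 0, 3, 2), (3, 2, 1, 0), (0, 3, 2, 1), (2, 1, 0, 3)]
D4 = rotations + reflections

def act(perm, coloring):
    return tuple(coloring[perm[i]] for i in range(4))

orbits = {frozenset(act(g, c) for g in D4) for c in product("rgb", repeat=4)}
print(len(orbits))  # 21 distinct colorings up to symmetry

# Cycle index Z(D4) = 1/8 (p1^4 + 2 p2 p1^2 + 3 p2^2 + 2 p4), with p_k = 3 (three colors):
n = 3
print((n**4 + 2 * n * n**2 + 3 * n**2 + 2 * n) // 8)  # (81 + 54 + 27 + 6)/8 = 21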

Weighted Burnside Lemma

  • I'm learning the weighted burnside lemma as a preamble to polya enumeration.
  • For a set $X$ with an action of a group $G$, define a weight function that is constant on orbits. Said differently, we have a weight function $w: X \to W$ such that $w(x) = w(g(x))$ for all $x \in X$ and $g \in G$.
  • We wish to count the orbits of $X$ weighted by the weight function $w: X \to W$ (where $W$ is a commutative ring). So we wish to find $\sum_{o \in Orb(X)} w(o)$.
  • Recall that Burnside tells us that:

$$ |X/G| = \frac{1}{|G|} \sum_{g \in G} |Fix(g)| $$

  • We replace cardinality with weight, giving us the statement:

$$ \begin{aligned} w(X/G) &= \frac{1}{|G|} \sum_{g \in G} w(Fix(g)) \\ \text{ie,}\quad \sum_{[o] \in X/G} w(o) &= \frac{1}{|G|} \sum_{g \in G} \sum_{x \in Fix(g)} w(x) \end{aligned} $$

  • In English, this reads: for each orbit in $X/G$, pick an equivalence class representative $o$. The sum of weights of the representatives equals the average over $G$ of the fixed-point weights.

Proof

  • We begin by considering the sum on the right hand side (without the $1/|G|$ factor):

  • $y = \sum_{g \in G} \sum_{x \in Fix(g)} w(x)$.

  • We switch the order of summation to get $y = \sum_{x \in X} \sum_{g \in G} [gx = x] w(x)$ where $[gx = x]$ is the Iverson bracket --- it evaluates to 1 if $gx = x$ and $0$ if $gx \neq x$.

  • We pull the constant $w(x)$ out to get $y = \sum_{x \in X} w(x) (\sum_{g \in G} [gx = x])$.

  • We see that $\sum_{g \in G} [gx = x]$ is the cardinality of the stabilizer of $x$, written as $|Stab(G, x)|$. So we write this as $y = \sum_{x \in X} |Stab(G, x)| w(x)$.

  • By orbit stabilizer, we use $|Stab(G, x)| \cdot |Orb(G, x)| = |G|$. Thus, we get $y = |G| \sum_{x \in X} w(x) / |Orb(G, x)|$.

  • Since the set of orbits partitions $X$, we write the above as

  • $y = |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(x)/|Orb(G, x)|$.

  • Since $[o]$ is the orbit of $x$, we replace $Orb(G, x)$ with $[o]$, giving $y = |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(x)/|[o]|$.

  • Since the weight is constant on orbits, we replace $w(x)$ by $w(o)$ giving $y = |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(o)/|[o]|$.

  • We pull the inner terms out giving $y = |G| \sum_{[o] \in X/G} w(o)/|[o]| \sum_{x \in [o]} 1$.

  • Since $\sum_{x \in [o]} 1 = |[o]|$, we get $|G| \sum_{[o] \in X/G} w(o)/|[o]| \cdot |[o]|$ which simplifies to $y = |G| \sum_{[o] \in X/G} w(o)$.

  • We are done, since we have shown that $\sum_{g \in G} \sum_{x \in Fix(g)} w(x) = |G| \sum_{[o] \in X/G} w(o)$.

  • The full derivation is:

$$ \begin{aligned} y &= \sum_{g \in G} \sum_{x \in Fix(g)} w(x) \\ &= \sum_{g \in G} \sum_{x \in X} [gx = x] w(x) \\ &= \sum_{x \in X} \sum_{g \in G} [gx = x] w(x) \\ &= \sum_{x \in X} w(x) \sum_{g \in G} [gx = x] \\ &= \sum_{x \in X} w(x) |Stab(G, x)| \\ &= \sum_{x \in X} w(x) |G|/|Orb(G, x)| \\ &= |G| \sum_{x \in X} w(x)/|Orb(G, x)| \\ &= |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(x) / |Orb(G, x)| \\ &= |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(o) / |Orb(G, x)| \\ &= |G| \sum_{[o] \in X/G} \sum_{x \in [o]} w(o) / |[o]| \\ &= |G| \sum_{[o] \in X/G} w(o) / |[o]| \sum_{x \in [o]} 1 \\ &= |G| \sum_{[o] \in X/G} w(o) / |[o]| \cdot |[o]| \\ &= |G| \sum_{[o] \in X/G} w(o) \end{aligned} $$

Example, Unweighted

  • Suppose we have squares acted on by the rotations $e, r, r^2, r^3$ (take $r$ to be a quarter turn clockwise). The action takes the square:
a b
c d

to the squares:

e     r     r^2   r^3
----|-----|-----|-----
a b | c a | d c | b d
c d | d b | b a | a c
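  • The notes stop at the table; as a small sanity check of (unweighted) Burnside for this rotation group, here is the count of 2-colorings of the four corners up to rotation, directly and via $\frac{1}{4}(2^4 + 2 + 2^2 + 2) = 6$ (a throwaway sketch):

from itertools import product

rotations = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]  # e, r, r^2, r^3

def act(perm, coloring):
    return tuple(coloring[perm[i]] for i in range(4))

orbits = {frozenset(act(g, c) for g in rotations) for c in product("bw", repeat=4)}
print(len(orbits))  # 6

# Burnside: fixed colorings per rotation are 2^4, 2^1, 2^2, 2^1.
print((2**4 + 2 + 2**2 + 2) // 4)  # also 6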

Cycle index polynomial

  • If $\sigma \in S_n$ let $cyc(\sigma)$ be the integer partition of $n$ giving cycle lengths.
  • For example, if $\sigma = (1 2)(3)(4 5 6)(7 8)$, then $cyc(\sigma) = 3 + 2 + 2 + 1 \sim (3, 2, 2, 1)$.
  • Recall that an integer partition $\lambda$ of $n$ is a tuple $\lambda[:]$ such that $\sum_i \lambda[i] = n$ and $\lambda$ is non-increasing.
  • The cycle index polynomial for a group $G \subseteq S_n$ is $Z(G) \equiv 1/|G| \sum_{g \in G} P[cyc(g)]$ where $P[\cdot]$ is the power sum symmetric polynomial for the cycle-type partition $cyc(g)$.
  • Recall the definition of power sum symmetric polynomial. First, for a natural number $k \in \mathbb N$, we define $P[k](\vec x) \equiv x_1^k + x_2^k + \dots + x_n^k$.
  • Next, for a partition $\lambda$, we define $P[\lambda](\vec x)$ to be the product over the parts of the partition: $P[\lambda_1](\vec x) \cdot P[\lambda_2](\vec x) \cdot \dots \cdot P[\lambda_l](\vec x)$.

Cycle index polynomial of dihedral group

  • For example, consider the dihedral group $D_4$ acting on a square with vertices $a, b, c, d$:
  • More formally, the dihedral group $D_4$ acts on the set of labelled squares $X$.
a b
d c
    1. For the identity $e$, the cycle is $(a)(b)(c)(d)$. The cycle partition is $(1, 1, 1, 1)$.
    1. For the rotation $r$, the cycle is $(a b c d)$. The cycle partition is $(4)$.
    1. For the rotation $r^2$, the cycle is $(a c)(b d)$. The cycle partition is $(2, 2)$.
    1. For the rotation $r^3$, the cycle is $(a d c b)$. The cycle partition is $(4)$.
    1. For the horizontal swap $h$, the cycle is $(a d)(b c)$. The cycle partition is $(2, 2)$.
    1. For the vertical swap $v$, the cycle is $(a b)(c d)$. The cycle partition is $(2, 2)$.
    1. For the diagonal a-c swap $ac$, the cycle is $(b d)(a)(c)$. The cycle partition is $(2, 1, 1)$.
    1. For the diagonal b-d swap $bd$, the cycle is $(a c)(b)(d)$. The cycle partition is $(2, 1, 1)$.
  • The cycle index polynomial is $Z(D_4, X) \equiv \frac{1}{|D_4|}(P[(1, 1, 1, 1)] + 2P[2, 1, 1] + 3P[2, 2] + 2P[4])$.

  • $P[1, 1, 1, 1] = p_1^4 = (x_1 + x_2 + x_3 + \dots + x_n)^4$, where $p_1$ is a power sum symmetric polynomial.

  • $P[2, 1, 1] = p_2 p_1 p_1 = (x_1^2 + x_2^2 + \dots + x_n^2)\cdot (x_1 + x_2 + \dots + x_n)^2$.

  • ...and so on.

Mnemonics For Symmetric Polynomials

Some notation for partitions

  • Consider a partition $\lambda \equiv (\lambda_1, \lambda_2, \dots \lambda_l)$ of $N$.
  • The $L_0$ norm of the partition will be $1 + 1 + \dots 1$ ($l$ times), which is equal to $l$. Thus, $|\lambda|_0 = l$.
  • So the $L_0$ norm of a partition is the number of parts of the partition.
  • The $L_1$ norm of the partition will be $|\lambda_1| + |\lambda_2| + \dots + |\lambda_l|$ which equals $N$.
  • So the $L_1$ norm of a partition is the number it is partitioning. Thus, $|\lambda|_1 = N$.

Elementary Symmetric Polynomials (integer)

  • We need to define $e_k(\vec r)$ for $k \in \mathbb N$, $r \in X^d$ a sequence of variables ($r$ for "roots").
  • These were elementary for Newton/Galois, and so have to do with the structure of roots.
  • The values $e_k(\vec r)$ are the coefficients of the "root polynomial" $\prod_i (x + r_i)$, that is:

$$ \begin{aligned} &(x+r_1)(x+r_2)(x+r_3) = x^3 + (r_1 + r_2 + r_3) x^2 + (r_1r_2 + r_2r_3 + r_1r_3) x + r_1r_2r_3 \\ &e_0 = 1 \\ &e_1 = r_1 + r_2 + r_3 \\ &e_2 = r_1 r_2 + r_2 r_3 + r_1 r_3 \\ &e_3 = r_1 r_2 r_3 \end{aligned} $$

  • Formally, we define $e_k(\vec r)$ to be the sum of all products $r_a r_b \cdots r_k$ over $k$ distinct indices $a < b < \dots < k$ in $[1, n]$:

$$ \begin{aligned} e_k(\vec r) \equiv \sum_{1 \leq a < b < \dots k \leq n} r_a r_b \dots r_k \end{aligned} $$
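  • A quick numeric check of the coefficient description above (a throwaway sketch; the roots 2, 3, 5 are arbitrary):

from itertools import combinations
from math import prod

roots = [2, 3, 5]

# expand (x + 2)(x + 3)(x + 5); poly[i] = coefficient of x^i (lowest degree first)
poly = [1]
for r in roots:
    shifted = [0] + poly                     # x * poly
    scaled = [r * c for c in poly] + [0]     # r * poly
    poly = [a + b for a, b in zip(shifted, scaled)]

n = len(roots)
for k in range(n + 1):
    ek = sum(prod(c) for c in combinations(roots, k))  # e_k: sum over k-subsets of products
    assert ek == poly[n - k]                            # e_k is the coefficient of x^(n-k)
print(poly)  # [30, 31, 10, 1]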

Elementary Symmetric Polynomials (partition)

  • For a partition $\vec \lambda \equiv (\lambda_1, \lambda_2, \dots, \lambda_l)$, the elementary symmetric polynomial $e_\lambda$ is the product of the elementary symmetric polynomial $e_{\lambda_1} \cdot e_{\lambda_2} \dots e_{\lambda_l}$.

Monomial Symmetric Polynomials (partition)

  • We symmetrize the monomial dictated by the partition. To calculate $m_\lambda(\vec r)$, we compute $\vec r^\lambda \equiv r_1^{\lambda_1} r_2^{\lambda_2} \dots r_l^{\lambda_l}$, and then symmetrize the above monomial.
  • For example, $m_{(3, 1, 1)}(r_1, r_2, r_3)$ is given by symmetrizing $r_1^3 r_2^1 r_3^1$. So we must add the terms $r_1 r_2^3 r_3$ and $r_1 r_2 r_3^3$.
  • Thus, $m_{(3, 1, 1)}(r_1, r_2, r_3) \equiv r_1^3 r_2 r_3 + r_1 r_2^3 r_3 + r_1 r_2 r_3^3$.

Power Sum Symmetric Polynomials (number)

  • It's all in the name: take a sum of powers.
  • Alternatively, take a power and symmetrize it.
  • $P_k(\vec r) \equiv r_1^k + r_2^k + \dots + r_n^k$.

Power Sum Symmetric Polynomials (partition)

  • Extend to partitions by taking the product of the power sums of the parts.
  • $P_\lambda(\vec r) \equiv P_{\lambda_1}(\vec r) \cdot P_{\lambda_2}(\vec r) \cdots P_{\lambda_l}(\vec r)$.

Uses of minimal string rotation

  • This algorithm always struck me as useless. Now I know some uses.
    1. Fingerprint identification: we can encode a fingerprint as many detailed circular strings. To search for such a fingerprint among those in a very large database, we need circular comparison, eg, via Lyndon factorization / minimal rotation.
    1. Forest canonicalization. Write a tree out in terms of a dyck grammar / brackets. A forest will correspond to a sequence of such trees. When are two forests equivalent? Normalize them by minimal rotation.

Suffix Automata

  • We take for granted knowledge of the Myhill nerode theorem to build the minimal automata of the set of suffixes of a string $l$.

  • Let the alphabet be $A$, and let us build the suffix automata of $l \in A^\star$.

  • Define the language of suffixes of a string $l$ as $L \equiv \{ l[i:] : i \in \mathbb N \}$.

  • By Myhill Nerode, states of the minimal DFA correspond to strings that are indistinguishable under extensions by a membership oracle of $L$.

  • Suppose a state in the DFA corresponds to two strings $b, s \in A^\star$ ($b$ for big and $s$ for small) such that $|b| \geq |s|$. So we have that $b =_L s$.

  • Now, for all strings $z$ such that $bz \in L$ we also have $sz \in L$.

  • So the string must look as follows:

----bbbbzzzzzzzzz
------sszzzzzzzzz
  • This implies that $s$ is a suffix of $b$!
  • Strings in the same state correspond to suffixes of the largest string in the state.
  • Next, we claim that a state consists of all suffixes upto some length. TODO.
  • Therefore, it is helpful to imagine states as "funnels" or "triangles" or "narrowing trapeziums", which have at the top the longest string, and then shorter and shorter suffixes. The suffix link from a to a->link points from the base of a to the top of the trapezium a->link such that a and a->link can be "joined" into a larger trapezium.

Suffix Automata must be a DAG

  • A cycle in the automata implies that we have an infinite number of strings in the language, since we can traverse along the cycle as many times as we want before we reach a final state.
  • Thus, the suffix automata, which accepts a finite language must be a DAG.
  • This implies that we can perform dynamic programming on the DAG.

Suffix automata: relinking q to qsmol

  • Suppose we are inserting a character c. We are at a state p which points to a state q on transitition c.
  • Now since q contains p:c, we have that len(q) > len(p). If p:c is the longest string of q, then len(p) + 1 = len(q). Otherwise, the longest string of q contains p:c as a proper suffix, and thus len(p) + 1 < len(q).
  • Since q contains p:c, the suffix link at q must point to a state whose strings are proper suffixes of p:c.
  • If I therefore create a new state with longest string p:c, this new state has a longest string longer than that of q->link.
  • Thus, it is proper to attach q->link to the newly created state.

Simpson's Paradox

  • The example which made simpson's paradox click for me was the extreme case.
  • Suppose department E hires every woman but only half the men (E for every), while department N hires neither men nor women.
  • So in each department, women are either advantaged (as in E) or are on-par (as in N).
  • Suppose we have 100 men and 100 women.
  • Let 90 men apply for E and 10 men apply for N. In total, 45 men are accepted (90/2 + 0).
  • Let 10 women apply for E and 90 women apply for N. In total, 10+0 women are accepted.
  • Thus, it appears as if only 10 women are selected to 45 men, implying some kind of bias.
  • In reality, all departments are pro women hiring. The majority of women apply to the department N which is hard to get into, thereby making it appear as if the institute (E and N combined) is biased against women hires.
  • The information that is lost is that of the split up of men and women who apply to E and N.
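  • A tiny numeric version of the scenario above (a throwaway sketch):

men   = {"E": 90, "N": 10}
women = {"E": 10, "N": 90}
hired_men   = {"E": men["E"] // 2, "N": 0}   # E hires half the men, N hires nobody
hired_women = {"E": women["E"],    "N": 0}   # E hires every woman, N hires nobody

for d in ("E", "N"):   # per department, women do at least as well as men
    print(d, hired_men[d] / men[d], hired_women[d] / women[d])

print(sum(hired_men.values()) / 100)    # 0.45 overall acceptance for men
print(sum(hired_women.values()) / 100)  # 0.10 overall acceptance for women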

Myhill Nerode Theorem

  • Take a language $L$ over an alphabet $A$.
  • Define $x \in A^\star$ to have a distinguishing extension from $y \in A^\star$ iff there exists an $s \in A^\star$ such that either (a) $xs \in L \land ys \not \in L$, or (b) $xs \not \in L \land ys \in L$.
  • Said differently, given $x$ and $y$, there is a suffix $s$ which can distinguish $x$ and $y$ using the membership oracle for $L$.
  • Now define $x \sim_L y$ ($x$ is indistinguishable from $y$) iff there is no distinguishing extension between $x$ and $y$ with respect to $L$.
  • See that this is an equivalence relation:
  • (1) Reflexivity: $x$ cannot be distinguished from $x$ (using $L$), because any suffix $s$ cannot produce different outputs for $x$ and $x$.
  • (2) Symmetry: If $x$ cannot be disguished from $y$ (using $L$), then $y$ cannot be distinguished from $x$ (using $L$).
  • (3) Transitivity: If $x$ cannot be distinguished from $y$ (using $L$) and $y$ cannot be distinguished from $z$ (using $L$), then $x$ cannot be distinguished from $z$ (using $L$). Intuition: if $x$ and $z$ were distinguishable, then $y$ could not be indistinguishable from both of them.
  • Proof of Transitivity: Suppose for contradiction $x$ can be distinguished from $z$. Then there is a suffix $s$ such that $xs \in L$ while $zs \not \in L$ (WLOG). Now what about $ys$? If $ys \in L$ then we can distinguish $y$ and $z$, contradicting the assumption that $y$ is indistinguishable from $z$. If $ys \not \in L$ then we can distinguish $x$ from $y$, contradicting the assumption that $x$ is indistinguishable from $y$.
  • Hence, being indistinguishable is an equivalence relation, denoted by $x \sim y$.
  • Myhill nerode says that the minimal DFA for a language $L$ has as many states as there are equivalence classes for $\sim_L$.

Given DFA $D$ of language $L$ over $A$: $\sim_D$ implies $\sim_L$

  • Let $L$ be a regular language over alphabet $A$ and let $D$ be a DFA for it (with a finite number of states $|D|$).
  • Partition the set of all strings $A^\star$ via the relation $\sim_D$: $x \sim_D y$ iff $x$ and $y$ end at the same state when fed to the DFA.
  • Thus we will have at most $|D|$ equivalence classes for $\sim_D$, one for each reachable state of the DFA.
  • For any two strings such that $x \sim_D y$, given any suffix $s$, we have that $xs \sim_D ys$, since we end at the same state after reading $x$ (or $y$) and continue identically when we feed the suffix $s$.
  • Thus, strings such that $x \sim_D y$ are indistinguishable for the DFA.
  • So we have $x \sim_L y$, since on any extension, both $xs$ and $ys$ either belong or don't belong to $L$.

Given language $L$ over $A$: show that $\sim_L$ implies $\sim_D$.

  • Let $L$ be a language over alphabet $A$ such that $\sim_L$ has finitely many equivalence classes.
  • We will design a DFA (called $D$) for $L$ with as many states as equivalence classes.
  • The start state of $D$ is the equivalence class of the empty string with respect to $\sim_L$.
  • At an equivalence class/state $T$, given character $c \in A$, we move to $T \diamond c$ (extend $T$ by $c$).
  • Formally, $\delta(T, c) \equiv T \diamond c$, where $T \diamond c \equiv \{ tc : t \in T \}$.
  • This is well defined, as if $t \sim_L t'$ are in the equivalence class $T$, then we must have $tc \sim_L t'c$.
  • Suppose $tc$ is distinguishable from $t'c$ by a suffix $s$. Then we can distinguish between $t$ and $t'$ via the suffix $cs$. This contradicts $t \sim_L t'$. Thus we have that $t \sim_L t'$ implies $tc \sim_L t'c$.
  • Thus our transition function $\delta$ is well defined over equivalence classes $A/\sim_L$.
  • A state $T$ in $D$ is accepting if the state contains \emph{any} string $l \in L$.
  • That is, $T$ is accepting iff there exists a $l \in L$ such that $l \in T$.
  • In this case, we infer that $T \subseteq L$, or any string in $T$ is accepted by $L$.
  • Suppose for contradiction that there is an $l \in L$ with $l \in T$, while there is also a string $z \in T$ with $z \not \in L$.
  • The empty string would distinguish the two strings $l, z$, which contradicts $z \sim_L l$ (they are both in the equivalence class $T$).
  • Thus for a regular language $L$ there is a DFA $D$ which accepts strings from $L$ and has number of states $|D|$ equal to the number of equivalence classes $|A/\sim_L|$.

DFA needs at least $|A/\sim_L|$ states

  • Let $n \equiv |A/\sim_L|$ be the number of equivalence classes of $\sim_L$.
  • Suppose a DFA $D$ recognizes $L$ and has fewer than $|A/\sim_L|$ states.
  • Let $x_1, x_2, \dots x_n$ be strings from different equivalence classes of $L$.
  • Push these strings through $D$. Some two strings $x_i, x_j$ must land on the same state $d \in D$ (by pigeonhole).
  • We must have $x_i$ and $x_j$ distinguishable, since they come from different equivalence classes. So the DFA must accept one and reject the other.
  • But the DFA can't tell the difference between $x_i$ and $x_j$ since they landed on the same state! So the DFA will accept or reject both.
  • Thus, we have a contradiction from the assumptions (a) $D$ has fewer states than $n$ and (b) $D$ recognizes $L$.
  • Thus the DFA needs at least $|A/\sim_L|$ states.

The two imply DFA minimization

  • We have seen that every DFA for $L$ needs at least $|A/\sim_L|$ states.
  • Now starting from $L$, we can build an automata $D^\star$ such that $|D^\star|$ is exactly $|A/\sim_L|$.
  • Thus the automata $D^\star$ is a (the) minimal automata for $L$.

Linearity of expectation for sampling

# process 1
import random

def val(c): return 1 + ord(c) - ord('a')

def process(addx, addy):
    s = 0
    ndraws = 10
    for _ in range(ndraws):
        x = random.choice("abcde")  # draw a random chit
        y = random.choice(x*5+"abcde") # draw a random chit, dependent on first random chit.
        if addx: s += val(x)
        if addy: s += val(y)
    return s
  • Linearity of expectation says that, on average, process(True, True) equals process(True, False) + process(False, True).
  • Intuitively, if we run the code for process infinitely many times, then each execution of process(True, True) can be split into an execution of process(True, False) and an execution of process(False, True).
  • The above assumes that we get the same draws from running process(True, True) as we do when we run process(True, False) and process(False, True) in succession. Of course, this will never happen.
  • However, since we are taking an average over many trials, we can imagine that for a run of process(True, True), we will have corresponding runs of process(True, False) and process(False, True).
  • The key thing to remember is that the random variable does not care about the process, only about the value that is spit out by the process.
  • Thus we can read linearity of expectation as saying either (1) Simulate the full process in one go and accumulate the results as you go along process(True, True) or E[a+b], or (2) Simulate the process in parts, and add up the accumulated results from the partial simulations (process(True, False) + process(False, True) or E[a] + E[b]).
  • In both cases, we are allowed to simulate the process fully! The only thing that differs is when we accumulate the answers.
  • This is in contrast to computing conditional probability, where the situation/process in which P(A|B) occurs is wildly different from P(A).
  • Linearity of expectation asks us to run the same process, just tally results differently.
  • It tells us that randomness allows the tallies to line up, whether we tally in two separate phases or in a single phase, which makes intuitive sense!

Linearity of expectation is purity

Suppose we write:

x = random.choice("abcde")
y = random.choice("abcde")
s =  val(x) + val(y)
  • If we ask for the expected value of s. It's going to be:
E[s] = E[val(x) + val(y)]
= E[val(x)] + E[val(y)]
= 2 E[val(x)]
  • The last equality follows because x and y are two copies of the same random variable random.choice("abcde"), thus have the same expected value for val(x), val(y).
  • So, expectation 'purifies' random computations.

"Deriving" equivalence for two processses using purity

  • First write down what process(True, False) + process(False, True) as:
def rhsI():
    sx = 0; sy = 0
    ndraws = 10
    for _ in range(ndraws):
        x = random.choice("abcde"),  # draw a random chit
        y = random.choice(x1*5+"abcde") # draw a random chit, dependent on first random chit.
        sx += val(x)

    for _ in range(ndraws):
        x = random.choice("abcde"),  # draw a random chit
        y = random.choice(x2*5+"abcde") # draw a random chit, dependent on first random chit.
        sy += val(y)
    return sx + sy
  • Next, we use the purity of random.choice (within expectation) to fuse the two loops:
def rhsII():
    sx = 0; sy = 0
    ndraws = 10

    # loop fusion is safe, because even though random.choice has a side effect, the order
    # of calling random.choice does not matter. It commutes with other random ops.
    for _ in range(ndraws):
        x1 = random.choice("abcde"),  # draw a random chit
        y1 = random.choice(x1*5+"abcde") # draw a random chit, dependent on first random chit.
        sx += val(x1)
        # loop fusion
        x2 = random.choice("abcde"),  # draw a random chit
        y2 = random.choice(x2*5+"abcde") # draw a random chit, dependent on first random chit.
        sy += val(y2)
    return sx + sy
  • Next, we use purity to set x1 = x2 and y1 = y2, since on expectation, their values are the same.
def rhsIII():
    sx = 0; sy = 0
    ndraws = 10

    # once again, expectation purifies randomness. So within the context of expectation, we can
    # replace `x2` with `x1` (and `y2` with `y1`).
    for _ in range(ndraws):
        x1 = random.choice("abcde"),  # draw a random chit
        y1 = random.choice(x1*5+"abcde") # draw a random chit, dependent on first random chit.
        sx += val(x1)
        # loop fusion
        x2 = x1
        y2 = y1
        sy += val(y2)
    return sx + sy
  • Finally, we cleanup the code to arrive at process(True, True):
def rhsIV():
    sx = 0; sy = 0
    ndraws = 10

    # once again, expectation purifies randomness. So within the context of expectation, we can
    # replace `x2` with `x1` (and `y2` with `y1`).
    for _ in range(ndraws):
        x1 = random.choice("abcde"),  # draw a random chit
        y1 = random.choice(x1*5+"abcde") # draw a random chit, dependent on first random chit.
        sx += val(x1)
        sy += val(y1)
    return sx + sy

Dedekind-MacNeille

Good and bad combinatorics: intro to counting

Elements of sets are the only objects that we are allowed to count.

Expected number of turns to generate all numbers 1..N (TODO)

  • Supposedly, asymptotically N log N (this is the coupon collector problem; the exact expectation is N H_N).

For $N=1$, the expected number of turns is $1$.
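A quick Monte Carlo sanity check of the claim (a throwaway sketch; $N = 50$ and the number of trials are arbitrary choices):

import random

def turns_to_collect(N):
    seen, turns = set(), 0
    while len(seen) < N:
        seen.add(random.randrange(N))
        turns += 1
    return turns

N, trials = 50, 2000
estimate = sum(turns_to_collect(N) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, N + 1))
print(estimate, N * harmonic)   # the two should be close (coupon collector: N * H_N)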

Diameter in single DFS (TODO)

Min cost flow (TODO)

  • Problem statement: Find a maximal flow with minimum cost.
  1. Find max flow.
  2. Find negative cost cycle in residual graph of max flow. Push flow around the negative cost cycle.

Relation between max flow and min cost circulation

  • Recall that min cost circulation asks to compute a circulation with minimum cost [no maximality constraint].

  • Given a flow network $(V, E, s, t, C)$ ($C$ is the capacity function), create a new cost function $c: E \to \mathbb R$ which assigns cost zero to all edges in the flow network. Also add a new edge $t \to s$ which has infinite capacity and cost $-1$.

  • A circulation with cost lower than zero will have to use the $t \to s$ edge. To get minimum cost, it must send as much flow through this edge as possible. For it to be a circulation, the net flow at every vertex must be zero. So suppose we send $f$ units of flow back from $t$ to $s$. Then we must send $f$ units of flow from $s$ to $t$ for it to be a circulation. Increasing $f$ (the max flow) decreases the cost of the circulation! Thus, max flow is reduced to min cost circulation.

Min Cost Flow in general

  • First find max flow using whatever.
  • Next, we need to find negative cost cycle in the residual graph.
  • Use bellman ford, or SPFA to find negative cost cycles in $O(VE)$ time [run edge relaxation $|V|$ times].

Minimum mean cycle

  • Which is the best cycle to push flow around to reduce cost? The min cost cycle may not be best, since it may have very little capacity.
  • A negative cycle with max capacity may not have good cost.
  • Correct: total cost/number of edges --- that is, the mean cost.

shortest path as circulation.

  • Need to find single source shortest path in a graph (with possibly negative edges, no negative cycles).
  • We have a balance at each vertex $v$, which tells us how much extra flow it can have coming in versus going out. So, $\sum_u f(u \to v) - \sum_w f(v \to w) = b(v)$. Intuitively, the balance is stored in a tank at the vertex.
  • We need the total balance to be zero.
  • We set the source $s$ to have balance $1 - |V|$ (supply) and all the other nodes to have balance $1$ (demand).
  • Let the cost of each edge be the distance, let the capacity of each edge be infinite.
  • Now, what is a min cost flow which obeys the demands?
  • Consider the shortest path tree. Imagine it as carrying a flow. Then the shortest path tree indeed obeys the flow constraints.
  • To convert this into circulation, add back edges from each node back to the source, with a capacity of 1, cost of zero.
  • This converts shortest path trees into flows/circulations.

Min cost circulation algorithms

  • Old algorithm: start with a circulation that obeys balance, then push more around (by using negative cycles)
  • New algorithm (successive shortest path): remove all negative cycles, then restore balance constraints.
  • how to remove negative cycles? We can just saturate all negative-cost edges (send flow down them). The residual graph will then contain no negative cycles. (NOTE: we don't have a valid flow at this point!) This leaves us with residual balances at each vertex, telling us how much more flow we need to send.

References

Clojure: minimal makefile for REPL driven dev with Neovim

Create the deps.edn file:

{:deps
 {org.clojure/clojure {:mvn/version "1.10.1"}
   nrepl {:mvn/version "0.7.0"}
     cider/cider-nrepl {:mvn/version "0.25.2"}}}

and write the Makefile:

# https://clojure.org/guides/deps_and_cli
.PHONY: run

repl:
    clj -m nrepl.cmdline \
        --middleware "[cider.nrepl/cider-middleware]" \
        --interactive

run:
    clj -X dg/run

test:
    clj -Atest

Delimited continuations

  • reset: add a marker to delimit the capture of the continuation by shift. So called because we add a reset mark onto the stack.
  • shift: .. So called because to start executing a shift, we move stack frames upto the closest reset from the stack into the heap. When the continuation of shift is called, move back the stack frames from the heap onto the stack.

Direct Implementation of Shift and Reset in the MinCaml Compiler

Never forget monic again

  • Remember monic ~ injective.
  • Remember that injective is $f(x) = f(y) \implies x = y$.
  • Since we're doing category theory, replace $x$ and $y$ by functions $h(p)$ and $k(p)$.
  • This means that the rule of monic is $\forall p, f(h(p)) = f(k(p)) \implies h = k$.
  • Thus, monic is left cancellative!

Weird canonical example of monic and epic: left/right shift

  • Consider the function right over an infinite sequence a_n which is defined as right(a[:])[i] = 0 if i == 0 else a[i-1]. That is, it shifts a sequence to the right.
  • See that this is injective, and not surjective: for example, there is no pre-image to any sequence that starts with a non-zero value, such as 1, 0, 0, ....
  • Its dual, left(a[:])[i] = a[i+1] is surjective, but not injective.
  • This makes it ideal as an "extreme case" to test the monic/epic conditions as left/right cancellable.

Monic

  • We know that right is monic. Is it cancellable if we run it before or after?
  • We should run it after --- that way, if right(f(a[:])) equals right(g(a[:])) for all a[:], we know that the sequences are (0, f1, f2, ...) which equals (0, g1, g2, ...) which implies (f1, f2, ...) equals (g1, g2, ...). So we can conclude that f = g from right . f = right . g.
  • On the other hand, if we consider f(right(a[:])) and g(right(a[:])), we will only test the equality of f and g at sequences of the form (0, a1, a2, ...), which is insufficient; thus we cannot conclude f = g. So we cannot conclude f = g from f . right = g . right.

Epic

  • We know that left is epic. Is it cancellable if we run it before or after?
  • Suppose f . left = g . left. Since left is epic, this tests f and g on every possible input. Thus f = g.
  • On the other hand, suppose left . f = left . g. This is insufficient, since we will only test the equality of (f2, f3, ...) with (g2, g3, ...) leaving f1 =? g1 untested. Thus, we cannot conclude f = g from left . f = left . g.

Playing guitar: being okay with incorrect chords

  • I find it very hard to switch chords, since I feel "afraid" of playing the wrong chord.
  • I feel like this manifests in different ways: I am reluctant to write documents which I fear may be incorrect, and yet would be valuable to write up. I feel reluctant to compete in competitions for fear of not knowing the "right answer".
  • Regardless, it's very interesting how when playing the guitar, people (and you) literally don't notice!
  • As long as you keep the rhythm up, it "sounds fine".
  • So, if there's a hard chord change to be done, stagger it! play two beats with all strings open. Then hold down a single finger for a beat. Then another finger for the next beat. And so on, till perhaps at the final beat, we make the "complete/correct" chord.
  • It's interesting, since it adds a sort of design challenge: what is the best sequence of strings to play to "musically" approach a given chord starting from open strings? Different choices of fingers have surprisingly different sounds!
  • It's also very relieving to be able to simply.. play, experiment with leaving strings one-by-one, pressing strings one by one, without worrying about getting it right, as long as I allow the rhythm-beat to march forward :)
  • This lends itself particularly well to the style where we mute the guitar every even beat (1 MUTE 2 MUTE) to create a percussive effect. It allows one to hear the chord being "layered" up, finger by finger.
  • It also mutes the "open string" sound by the time we get the first finger on, so it helps create
  • TL;DR: strumming hand >>> chord hand. Focus on the strumming! It's okay to screw up on chords :)

Sparse table

  • Given an array as :: Semilattice a => [a], find semilattice join of any range [lft..rt] in O(1) time, given O(n log n) preprocessing.
  • Core idea: store the results of queries [l..l+2^k). So the code:
// mins[i][0] = min over [i, i+1) = arr[i]
for(int i = 0; i < n; ++i) { mins[i][0] = arr[i]; }
for(int len = 1; len < NBITS; ++len) {
  for(int i = 0; i < n; ++i) {
    const int half = 1 << (len-1);
    if (i + 2*half > n) { break; }
    // mins[i][len] = min over [i, i+2^len) = min of [i, i+2^(len-1)) and [i+2^(len-1), i+2^len)
    mins[i][len] = min(mins[i][len-1], mins[i + half][len-1]);
  }
}
  • Now given a query, the "naive" method is to consider the range [lft, l+len). We break len down into its powers of 2, and then query the indexes based on its binary representation. Eg. a query from [3, 3+7) is broken down into 7 = 4 + 2 + 1, so we query [3, 3+4) which is [3, 7), then [3+4, 3+4+2) which is [7, 9), and finally [3+4+2, 3+4+2+1) which is [9, 10). But this is O(log n) time. We want O(1) time.
    [--------------)
1 2 3 4 5 6 7 8 9 10
    |       |   |  |
    [-------)   |  |
            [---)  |
                [--)
  • The key is to notice that so far, we've only used associativity of the lattice operation, not idempotence! We can exploit idempotence by not caring about overlaps.

  • to find min in [3, 10) (the same query as above), we combine [3, 3+4) with [10-4, 10), which is [3, 7) combined with [6, 10); these overlap at 6.

    [-----------)
1 2 3 4 5 6 7 8 9
    |     | |   |
    [-----+-)   |
          [-----)

The actual expression is:

// [l, r)
int query_mins(int l, int r) {
  int len = r-l;
  if (len <= 0) { return INFTY; }
  int j = log2(len); // round down.
  // min of [l, l+2^j) and [r-2^j, r); the two ranges overlap, which is fine since min is idempotent.
  return min(mins[l][j], mins[r-(1<<j)][j]);
}

Duval's algorithm

  • https://stackoverflow.com/questions/55642656/how-does-duvals-algorithm-handle-odd-length-strings
  • https://ritukundu.wordpress.com/2016/10/07/algorithm-to-find-the-least-lexicographic-rotation-of-a-circular-string/

Amortized complexity from the verifier perspective

  • If we want an API that can verify amortized complexity, then each method returns two costs: (a) "number of cycles" spent on the operation, (b) "claimed cost" of the operation. For example, vector.push_back() may return "number of cycles" to be as large as O(n) when doubling, while always returning "claimed cost" as 1.
  • At the end of any sequence of operations, the verifier checks that sum of (claimed cost) ≥ sum of (#cycles).
  • This establishes that the claimed/amortized cost is an upper bound on the real cost!
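A minimal sketch of what such a verifying wrapper could look like. The names here (CostReport, DoublingVector, the claimed cost of 3) are hypothetical, invented for illustration; the point is only the final check.

#include <cassert>
#include <vector>

// Each operation reports (real "cycles" spent, claimed amortized cost).
struct CostReport { long long cycles; long long claimed; };

// A hypothetical doubling array, instrumented for the verifier.
struct DoublingVector {
    std::vector<int> buf;   // storage, grown manually so we control the copy cost
    size_t len = 0;
    CostReport push_back(int x) {
        long long cycles = 1;
        if (len == buf.size()) {            // full: double the capacity, pay for the copy
            std::vector<int> bigger(buf.size() == 0 ? 1 : 2 * buf.size());
            for (size_t i = 0; i < len; ++i) { bigger[i] = buf[i]; cycles++; }
            buf.swap(bigger);
        }
        buf[len++] = x;
        return {cycles, /*claimed amortized cost=*/3};
    }
};

int main() {
    DoublingVector v;
    long long total_cycles = 0, total_claimed = 0;
    for (int i = 0; i < 1000; ++i) {
        CostReport r = v.push_back(i);
        total_cycles += r.cycles;
        total_claimed += r.claimed;
    }
    // The verifier's check: total claimed (amortized) cost upper-bounds total real cost.
    assert(total_claimed >= total_cycles);
    return 0;
}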

Relationship between permutations and runs

  • Let the permutation be $\pi \equiv (3 9 2 5 6 7 10 11 13 15 14 16 12 1 4 8)$.
  • Split into runs: $r_1:(3 9)$, $r_2:(2 5 6 7 10 11 13 15)$, $r_3:(14 16)$, $r_4:(12)$, $r_5:(1 4 8)$.
  • The runs begin at indices $p[1] = 1$, $p[2] = 3$, $p[3] = 11$, $p[4] = 13$, $p[5] = 14$. Total number of runs is $R=5$.
  • Encode each number in $[1..16]$ by the run to which it belongs. This is us mapping the integer $k$ to $run(\pi^{-1}(k))$.
  • We get that:
1 -> run 5 | (1 4 8)
2 -> run 2 | (2 5 ... 15)
3 -> run 1 | (3 9)
4 -> run 5 | (1 4 8)
--
5 -> run 2 | (2 5 ... 15)
6 -> run 2 | (2 5 ... 15)
7 -> run 2 | (2 5 ... 15)
8 -> run 5 | (1 4 8)
--
9 -> run 1 | (3 9)
10 -> run 2| (2 5 ... 15)
11 -> run 2| (2 5 ... 15)
12 -> run 4| (12)
--
13 -> run 2| (2 5 ... 15)
14 -> run 3| (14 16)
15 -> run 2| (2 5 ... 15)
16 -> run 3| (14 16)
  • This gives us the array $S = [5, 2, 1, 5| 2, 2, 2, 5| 1, 2, 2, 4| 2, 3, 2, 3]$
  • The $k$th occurrence of symbol $s$ in $S$ corresponds to the row of the permutation $P[s] + k$. The occurrence will be at $\pi(P[s] + k)$.
  • Suppose we want to find $\pi(P[s] + k) = y$.

Relationship to burrows wheeler?

  • See that we do sort of the same thing, where we identify a string based on the ranks of its characters?!

Brouwer's fixed point theorem

General statement:

Given an nD simplex that has been subdivided, and a labelling that maps each vertex of the subdivision to one of the vertices of the original simplex, such that vertices lying on a face of the original simplex are only labelled by vertices of that face, there is always a small simplex of the subdivision whose vertices carry all the labels of the original simplex.

1D

  • Given a line with endpoints $a, b$ and points in between, we will always have an occurrence of $ab$ on the line.
  • Can prove something slightly stronger: there will always be an odd number of $ab$ on the line.

2D

  • Given a triangle labelled $abc$ and a subdivision of it, there will be a smaller triangle labelled $abc$.
  • Consider all smaller triangles.
  • Call a side with $bc$ a door.
  • How many doors can a triangle have? It can have 0 doors if it is labelled $aaa$, or $abb$, or some such.
  • It can have 1 door if it is:
 a
/ \
b==c
  • It can have two doors if it is:
  c
// \
b===c
  • We can't have three doors. So triangles can have 0, 1, or 2 doors.
  • If we find a triangle with one door, we are done, since it will have $abc$.
  • Now start from the bottom of the triangle where we have the side $bc$. Here we will find at least one edge $bc$.
  • Walk along the triangle, entering any triangle with a door.
  • If that's the only door of the triangle, we are done.
  • If not, then the triangle has two doors. Exit the current triangle through the other door (the door we did not enter from). This will take us to another triangle.
  • See that the walk can never exit through the sides $AB$ or $AC$ of the big triangle: edges on $AB$ only carry labels from {a, b}, and edges on $AC$ only carry labels from {a, c}, so neither side contains a door $bc$.
  • So if a walk ever escapes the big triangle, it must escape from the bottom side $BC$. A walk that enters and exits through $BC$ uses up two of the $bc$ edges on $BC$. But $BC$ contains an odd number of $bc$ edges (by the 1D case), so some walk must get stuck inside, at a triangle with exactly one door, which is a triangle labelled $abc$.

XOR on binary trie

If we XOR a number, then it flips the path that we're taking on the binary trie! This seems like a handy way to visualize numbers. In particular, to solve question 1554C.
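To make the picture concrete, here is my own sketch (not from the problem's editorial) of a binary trie supporting "insert x" and "maximize x xor y over inserted y": XOR-ing with x simply flips which child we prefer at each bit.

#include <array>
#include <cstdio>
#include <vector>

// Binary trie over 30-bit numbers.
struct BinTrie {
    std::vector<std::array<int,2>> nodes; // node = two child indices; -1 means absent
    BinTrie() { nodes.push_back({-1, -1}); }
    void insert(int x) {
        int cur = 0;
        for (int b = 29; b >= 0; --b) {
            int bit = (x >> b) & 1;
            if (nodes[cur][bit] == -1) {
                nodes[cur][bit] = (int)nodes.size();
                nodes.push_back({-1, -1});
            }
            cur = nodes[cur][bit];
        }
    }
    // max of (x xor y) over all inserted y.
    int max_xor(int x) {
        int cur = 0, best = 0;
        for (int b = 29; b >= 0; --b) {
            int want = ((x >> b) & 1) ^ 1;  // xor with x flips which branch we prefer
            if (nodes[cur][want] != -1) { best |= (1 << b); cur = nodes[cur][want]; }
            else { cur = nodes[cur][want ^ 1]; }
        }
        return best;
    }
};

int main() {
    BinTrie t;
    t.insert(5); t.insert(12); t.insert(7);
    std::printf("%d\n", t.max_xor(6)); // 6 xor 12 = 10
    return 0;
}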

Inconvergent: beautiful generative art

  • https://inconvergent.net/faq/

Prefix/Border function

Function that, for a string s, at index i, returns the length of the longest border of s[0..i] (inclusive). For example, consider the string s=abababcab.

  • at i=0, we have substring s[0..0]=a which has no border (border is proper prefix/suffix). So pr(0) = 0.
  • at i=1, we have substring s[0..1]=ab which has no border, so pr(1) = 0.
  • at i=2, we have substring s[0..2]=aba which has a..a as border. pr(2) = 1
  • at i=3, we have substring s[0..3]=abab which has ab..ab as border. pr(3) = 2
  • at i=4, we have substring s[0..4]=ababa which has ab[a]ba as border (that is, the prefix is aba and the suffix is aba which overlaps). pr(4) = 3.
  • at i=5, we have substring s[0..5]=ababab which has ab[ab]ab as border (that is, the prefix is abab and the suffix is abab which overlaps). pr(5) = 4.
  • at i=6, we have substring s[0..6]=abababc which has no border. pr(6) = 0.
  • at i=7, we have substring s[0..7]=abababca which has border a..a. pr(7) = 1.
  • at i=8, we have substring s[0..8]=abababcab which has border ab..ab. pr(8) = 2.

In toto, the prefix function is:

  abababcab
  001234012

s[0..i] has a border of length pr(i+1)-1

  • That is, given the substring s[0..i], we can predict that s[0..i] will have some border (perhaps not the longest border) of length pr(i+1)-1.
  • Suppose s[0..i+1] has longest border of length L=pr(i+1) (by definition of pr). Suppose pr(i+1) >= 1. Writing the border as p[0]p[1]...p[L-1], I can write s[0..i+1] as:
s[0..i+1] = p[0]p[1]...p[L-1]|s[L]...s[i-L+1]|p[0]p[1]...p[L-1]
            ^^^^^^^^^^^^^^^^^                 ^^^^^^^^^^^^^^^^^

If I now want to consider s[0..i], I need to drop the last letter p[L-1], which leaves me with:

s[0..i] = p[0]p[1]...p[L-2]p[L-1]|s[L]...s[i-L+1]|p[0]p[1]...p[L-2]
          ^^^^^^^^^^^^^^^^^                       ^^^^^^^^^^^^^^^^^
  • where we have a border p[0]...p[L-2], of length L-1 = pr(i+1)-1.
  • There may be a longer border, involving other letters of s[0..i]. But we know that the longest border of s[0..i] is at least this long: pr(i) >= pr(i+1)-1.
  • Upon re-arranging, we see that pr(i+1) <= pr(i) + 1.
  • This tells us that the border can increase by at most 1 (it can drop to zero, no lower bound!). So we have: 0 <= pr(i+1) <= pr(i) + 1.
  • So if we think of borders of s[0..i+1], we know that the longest border can have length at most pr(i) + 1. All other borders will be of length <= pr(i), so these other borders will be borders of s[0..i]!
  • Thus, to move from s[0..i] to s[0..(i+1)], we simply need to be able to find the longest border of s[0..(i+1)]. All other borders will come from s[0..i].

Lemma: Enumerating borders (what is a good name for this lemma?)

  • Think of the longest border 123456:
123456----123456
     ^         ^
     L         N
  • Now consider a shorter border ab:
ab3456----1234ab
     ^         ^
     L         N
  • But we must have a~1 and b~2 since it's the same string! So the border is really ab34ab, and the string is:
ab34ab----ab34ab
     ^         ^
     L         N
  • This shows that given the longest border 123456 of s[0:N], which has length L, any other border of s[0:N] (such as ab) is also a border of s[0:L].
  • Generalizing, given the longest border of s[0:N] of length L, any smaller border of s[0:N] is a border of s[0:L].

Algorithm to enumerate borders.

All borders of s[0:N] can be enumerated by:

  1. Taking the longest border of length L of s[0:N] (given by pr(N)).
  2. Viewing that border as the prefix s[0:L].
  3. Taking the longest border of s[0:L] (given by pr(L)).
  4. ... recursing till we hit the empty string.
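A small loop that carries this out, assuming pr is the (0-indexed) prefix/border array of s, computed as in the sketch after the next section:

#include <cstdio>
#include <vector>

// Print all nonempty border lengths of the whole string, longest first.
// pr[i] = length of the longest border of s[0..i] (inclusive, 0-indexed).
void enumerate_borders(const std::vector<int> &pr) {
    int n = pr.size();
    for (int len = pr[n-1]; len > 0; len = pr[len-1]) {
        std::printf("border of length %d\n", len);
    }
}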

Computing pr(i+1)

  1. We know that pr(0) = 0.
  2. At pr(i+1), we know that s[0:pr(i)] = s[-pr(i):-1]
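For reference, here is my sketch of the standard O(n) computation of pr (the classic prefix-function loop): to extend the border of s[0..i-1] by s[i], keep shrinking the candidate border through pr until the next character matches.

#include <string>
#include <vector>
using namespace std;

// pr[i] = length of the longest border of s[0..i] (inclusive).
vector<int> prefix_function(const string &s) {
    int n = s.size();
    vector<int> pr(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = pr[i-1];                                 // candidate border length to extend
        while (j > 0 && s[i] != s[j]) { j = pr[j-1]; }   // shrink through smaller borders
        if (s[i] == s[j]) { j++; }                       // extend the border by one character
        pr[i] = j;
    }
    return pr;
}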

Border function is a fractal

Consider a general string which looks like:

abcdef--abcdef

If we think of the second longest border, say 12, we will get that the string must be:

12cdef--abcd12
    ^^  &&

But this implies that the occurrence of ef (marked with ^^) and the occurrence of ab (marked with &&) must both be 12. This means that the string is:

12cd12--12cd12

So we started with a full string:

------------------

Then matched the left and right (very 1/3 cantor):

-------    ------
       ----

Then matched the left and right of these again and unified the leftovers, finding that it looks like:

--  --   --  --
  --       --
      ---

and so on. Isn't this so cool? Borders of a string are a fractal-like object!

Shortest walk versus shortest path

  • A walk is a sequence of vertices connected by edges; vertices and edges may repeat.
  • A path is a walk with no repeated vertices (a simple path).
  • Dijkstra's assumes nonnegative edge weights, where shortest walks and shortest paths coincide; it can't handle graphs with negative cycles.
  • Bellman-Ford handles negative edges, and reports when the question of "shortest walk" does not have a sensible answer (ie, a negative cycle is reachable, so the set of walks ordered by length is not well founded). A sketch is below.
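A minimal sketch of Bellman-Ford with that "no sensible answer" check: if any edge still relaxes after n-1 passes, a negative cycle is reachable from the source.

#include <limits>
#include <vector>
using namespace std;

struct Edge { int u, v; long long w; };
const long long INF = numeric_limits<long long>::max() / 4;

// Shortest walks from src; returns false if a reachable negative cycle makes
// the problem ill-posed (distances could be pushed down forever).
bool bellman_ford(int n, const vector<Edge> &es, int src, vector<long long> &dist) {
    dist.assign(n, INF);
    dist[src] = 0;
    for (int pass = 0; pass < n - 1; ++pass) {
        for (const Edge &e : es) {
            if (dist[e.u] < INF && dist[e.u] + e.w < dist[e.v]) { dist[e.v] = dist[e.u] + e.w; }
        }
    }
    // If anything still relaxes, there is no well-founded minimum.
    for (const Edge &e : es) {
        if (dist[e.u] < INF && dist[e.u] + e.w < dist[e.v]) { return false; }
    }
    return true;
}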

Minimal tech stack

  • st: suckless terminal.
  • mtm: minimal terminal multiplexer.
  • text editor? I need one.

FFT

  • Evaluating a polynomial p(x) at [a0, a1, ... am] in general is hard, even though we have the recurrence p(x) = pe(x^2) + x po(x^2). This makes the polynomials smaller (degree of pe, po is half of that of p). However, both halves still need to be evaluated at all the points [a0...am] (more precisely, at their squares), so the recurrence is T(n) = 2T(n/2) + m with T(1) = m. This solves to O(nm).
  • The special property of the DFT is that we can reconstruct p(x) at [w[n][0], ... w[n][n-1]] given the values of pe, po at [w[n/2][0], w[n/2][1], ... w[n/2][n/2-1]]. So the number of points we need to evaluate the polynomial at decreases with the size of the polynomial!
  • This makes the recurrence T(n) = 2T(n/2) + n with T(1) = 1, which is O(n log n).

Worked out example of FFT of 8 elements

$$ \begin{aligned} p(x) &\equiv a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 \\ p_e(x) &\equiv a_0 + a_2 x + a_4 x^2 + a_6 x^3 \\ p_o(x) &\equiv a_1 + a_3 x + a_5 x^2 + a_7 x^3 \\ p(x) &= p_e(x^2) + x \, p_o(x^2) \end{aligned} $$

Now suppose we know how to evaluate $p_e(x)$ and $p_o(x)$ at $[w_4^0, w_4^1, w_4^2, w_4^3]$. where $w_4$ is the 4th root of unity. We wish to evaluate $p(x)$ at $[w_8^0, w_8^1, w_8^2, \dots, w_8^7]$, where $w_8$ is the 8th root of unity. The only two properties of the roots of unity we will need are:

  • $w_8^2 = w_4$.
  • $w_8^4 = -1$.

Using the value of $w_8$, the above two relations, the values $p_o(w_4^k) = [p_o(1), p_o(w_4), p_o(w_4^2), p_o(w_4^3)]$ and $p_e(w_4^k) = [p_e(1), p_e(w_4), p_e(w_4^2), p_e(w_4^3)]$, we evaluate $p$ at powers of $w_8$ ( $[p(w_8^k)]$ ) as:

  • $p(w_8^k) = p_e((w_8^k)^2) + w_8^k p_o((w_8^k)^2) = p_e(w_4^k) + w_8^k p_o(w_4^k)$.
  • $p(w_8^0) = p_e((w_8^0)^2) + w_8^0 p_o((w_8^0)^2) = p_e(1) + p_o(1)$
  • $p(w_8^1) = p_e(w_8^2) + w_8^1 p_o(w_8^2) = p_e(w_4^1) + w_8 p_o(w_4^1)$
  • $p(w_8^2) = p_e(w_8^4) + w_8^2 p_o(w_8^4) = p_e(w_4^2) + w_8^2 p_o(w_4^2)$
  • $p(w_8^3) = p_e(w_8^6) + w_8^3 p_o(w_8^6) = p_e(w_4^3) + w_8^3 p_o(w_4^3)$
  • $p(w_8^4) = p_e(w_8^8) + w_8^4 p_o(w_8^8) = p_e(w_4^4) + w_8^4 p_o(w_4^4) = p_e(1) - p_o(1)$
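A minimal recursive sketch of this evaluation, assuming n is a power of two (not an optimized FFT, just the even/odd split above written out):

#include <cmath>
#include <complex>
#include <vector>
using cd = std::complex<double>;

// Evaluate the polynomial with coefficients a at the n-th roots of unity, n a power of 2.
std::vector<cd> fft(const std::vector<cd> &a) {
    const int n = a.size();
    if (n == 1) { return a; }
    std::vector<cd> even(n/2), odd(n/2);
    for (int i = 0; i < n/2; ++i) { even[i] = a[2*i]; odd[i] = a[2*i+1]; }
    std::vector<cd> pe = fft(even), po = fft(odd); // values at the (n/2)-th roots of unity
    const double PI = acos(-1.0);
    std::vector<cd> y(n);
    for (int k = 0; k < n/2; ++k) {
        cd w = std::polar(1.0, 2 * PI * k / n);   // w_n^k
        y[k]       = pe[k] + w * po[k];           // p(w_n^k) = p_e(w_{n/2}^k) + w_n^k p_o(w_{n/2}^k)
        y[k + n/2] = pe[k] - w * po[k];           // w_n^{k + n/2} = -w_n^k, since w_n^{n/2} = -1
    }
    return y;
}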

Solving the Recurrence: T(n) = n + 2T(n/2), with T(1) = 1

Proof 1:

Expand the recurrence:

= T(n)
= n + 2T(n/2)
= n + 2[n/2 + T(n/4)]
= n + n + 4T(n/4)
= n + n + 4[n/4 + 2T(n/8)]
= n + n + n + 8T(n/8)
= ...
= kn + ... 2^k T(n/2^k)
= (log n)n + 2^(log n) T(n/2^(log n))
= (log n)n + n T(n/n)
= (log n)n + n* 1
= (log n)n + n

Proof 2:

Consider the tree:

           8
          mrg:8
   4                 4
   mrg:4           mrg:4
  2     2          2       2
 mrg:2  mrg:2     mrg:2   mrg:2
 1 1    1  1      1   1    1  1
  • Number of leaves is n. Cost of each leaf is T(1) = 1. Total cost of leaf level is n.
  • At each level above, total cost is 8 = 4*2 = 2*4.
  • Number of levels is log n.
  • Total cost is the leaf cost n, plus the cost of the interior nodes, n log n. So T(n) = O(n log n).

codeforces rating of some GMs

Continuum TTRPG

Events don't conspire. People do. Events can't conspire, and people can. Causality is not a renewable resource. If a time machine could be constructed, it would be married to the trend of instant gratification.

Fragging

Sentient force must be applied to undo sentient damage --- time combat.

As/As not

At this moment, anything is possible.

Causality is only one principle and psychology essentially cannot be exhausted by causal methods only.

Blending in with levellers

Anthropologists in the field have noted that to be accepted as a native to a place, one has to be born there. No matter how well you behave, or how welcome a part of the community you become, it will be remembered that you came from outside. A curious exception to this was observed by anthropologist Charles Ward, working in the 1950s: If one leaves the community, and then returns after a distinct absence, one is then welcomed much as a returning native. This can be seen as a norm in most human cultures, as long as the person returning was looked upon favorably when they left.

Tipler Cylinder

A Tipler cylinder, also called a Tipler time machine, is a hypothetical object theorized to be a potential mode of time travel—although results have shown that a Tipler cylinder could only allow time travel if its length were infinite or with the existence of negative energy.

Words to know in target language

  • Animal: dog, cat, fish, bird, cow, pig, mouse, horse, wing, animalC

  • Transportation: train, plane, car, truck, bicycle, bus, boat, ship, tire, gasoline, engine, (train) ticket.

  • Location: city, house, apartment, street/road, airport, train station, bridge, hotel, restaurant, farm, court, school, office, room, town, university, club, bar, park, camp, store/shop, theater, library, hospital, church, market, country (USA, France, etc.), building, ground, space (outer space), bank.

  • Clothing: hat, dress, suit, skirt, shirt, T-shirt, pants, shoes, pocket, coat, stain, clothingC

  • Color: red, green, blue (light/dark), yellow, brown, pink, orange, black, white, gray, colorC

  • People: son, daughter, mother, father, parent (= mother/father), baby, man, woman, brother, sister, family, grandfather, grandmother, husband, wife, king, queen, president, neighbor, boy, girl, child (= boy/girl), adult (= man/woman), human (≠ animal), friend (Add a friend’s name), victim, player, fan, crowd, personC

  • Job: Teacher, student, lawyer, doctor, patient, waiter, secretary, priest, police, army, soldier, artist, author, manager, reporter, actor, jobC

  • Society: religion, heaven, hell, death, medicine, money, dollar, bill, marriage, wedding, team, race (ethnicity), sex (the act), sex (gender), murder, prison, technology, energy, war, peace, attack, election, magazine, newspaper, poison, gun, sport, race (sport), exercise, ball, game, price, contract, drug, sign, science, God

  • Art: band, song, instrument (musical), music, movie, art

  • Beverages: coffee, tea, wine, beer, juice, water, milk

  • Food: egg, cheese, bread, soup, cake, chicken, pork, beef, apple, banana, orange, lemon, corn, rice, oil, seed, knife, spoon, fork, plate, cup, breakfast, lunch, dinner, sugar, salt, bottle

  • Home: table, chair, bed, dream, window, door, bedroom, kitchen, bathroom, pencil, pen, photograph, soap, book, page, key, paint, letter, note, wall, paper, floor, ceiling, roof, pool, lock, telephone, garden, yard, needle, bag, box, gift, card, ring, tool

  • Electronics: clock, lamp, fan, cell phone, network, computer, program (computer), laptop, screen, camera, television, radio

  • Body: head, neck, face, beard, hair, eye, mouth, lip, nose, tooth, ear, tear (drop), tongue, back, toe, finger, foot, hand, leg, arm, shoulder, heart, blood, brain, knee, sweat, disease, bone, voice, skin, body

  • Nature: sea, ocean, river, mountain, rain, snow, tree, sun, moon, world, Earth, forest, sky, plant, wind, soil/earth, flower, valley, root, lake, star, grass, leaf, air, sand, beach, wave, fire, ice, island, hill, heat

  • Materials: glass, metal, plastic, wood, stone, diamond, clay, dust, gold, copper, silver

  • Math/Measurements: meter, centimeter, kilogram, inch, foot, pound, half, circle, square, temperature, date, weight, edge, corner

  • Misc Nouns: map, dot, consonant, vowel, light, sound, yes, no, piece, pain, injury, hole, image, pattern,

  • Parts of speech: noun, verb, adjective. Use as labels to help distinguish between very similar-looking words (i.e., to die (verb), death (noun), dead (adjective))

  • Directions: top, bottom, side, front, back, outside, inside, up, down, left, right, straight, north, south, east, west, directionC

  • Seasons: Summer, Spring, Winter, Fall, season

  • Numbers: 0 to 20, 30, 40, ... 100, 1st...5th

  • Months: January, February, March, April, May, June, July, August, September, October, November, December

  • Days of the week: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. Note: You’ll usually find pictures of people going to work on Mondays and partying on Fridays/Saturdays, etc.

  • Time: year, month, week, day, hour, minute, second , morning, afternoon, evening, night, time.

  • Verbs: work, play, walk, run, drive, fly, swim, goC, stop, follow, think, speak/say, eat, drink, kill, die, smile, laugh, cry, buy, pay, sell, shoot(a gun), learn, jump, smell, hear(a sound), listen(music), taste, touch, see (a bird), watch (TV), kiss, burn, melt, dig, explode, sit, stand, love, pass by, cut, fight, lie down, dance, sleep, wake up, sing, count, marry, pray, win, lose, mix/stir, bend, wash, cook, open, close, write, call, turn, build, teach, grow, draw, feed, catch, throw, clean, find, fall, push, pull, carry, break, wear, hang, shake, sign, beat, lift

  • Adjectives: long, short (long), tall, short (vs tall), wide, narrow, big/large, small/little, slow, fast, hot, cold, warm, cool, new, old (new), young, old (young), good, bad, wet, dry, sick, healthy, loud, quiet, happy, sad, beautiful, ugly, deaf, blind, nice, mean, rich, poor, thick, thin, expensive, cheap, flat, curved, male, female, tight, loose, high, low, soft, hard, deep, shallow, clean, dirty, strong, weak, dead, alive, heavy, light (heavy), dark, light (dark), nuclear, famous

  • Pronouns: I, you (singular), he, she, it, we, you (plural, as in “y’all”), they.

DP on subarrays

We can update subarrays with the rule dp[l][r] = merge(dp[l][r-1], dp[l+1][r], compute(l, r)) where merge merges the best results of all subarrays, and compute(l, r) computes the value for [l..r]. This guarantees that dp[l][r] will track the best value from all subarrays. For this DP to work, we iterate by length of the subarray.
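A toy, runnable instance of this loop structure (my own sketch), where compute is trivial and merge is just max, so dp[l][r] ends up being the maximum of a[l..r]:

#include <algorithm>
#include <cstdio>
#include <vector>
using namespace std;

int main() {
    vector<int> a = {3, 1, 4, 1, 5};
    int n = a.size();
    vector<vector<int>> dp(n, vector<int>(n, 0));
    for (int l = 0; l < n; ++l) { dp[l][l] = a[l]; }
    // iterate by length, so dp[l][r-1] and dp[l+1][r] are already filled when needed.
    for (int len = 2; len <= n; ++len) {
        for (int l = 0; l + len - 1 < n; ++l) {
            int r = l + len - 1;
            dp[l][r] = max(dp[l][r-1], dp[l+1][r]);
        }
    }
    printf("%d\n", dp[0][n-1]); // 5
    return 0;
}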

Vis editor cheat sheet

Insert

  • x/search: selects all things that match search
  • C-k/C-j: extend cursor above/down
  • C-n: select next match [This is sublime's C-d].
  • Select a block, hit I to create cursors at the beginning of each line
  • Select a block, hit A to create cursors on end of each line

Removal

  • C-p: remove the primary selection.
  • C-x: skip

Navigation

  • C-d/C-u: navigation
  • +/-: rotation
  • <Tab> and <S-Tab>: alignment
  • _: trim white space
  • o: orientation: move to beginning and ending of selection.

References

Mean, Median and Jensen's

The intuition for Jensen's is typically presented as:

|
| \       /
|  \  *  /
|   \   /
|    -@-
|
+--x----->
  • * is the average of the $f(x)$
  • @ is the $f$ of average of the x's.
  • I wish to reinterpret this: the @ is at the median of the $f(x)$s. So Jensen is maybe saying that the value at the median is lower than the mean of the values in this case due to the convexity of $f$.
  • In some sense, this tells us that the "data" ${ f(x): l \leq x \leq r }$ is skewed in such a way that median is lower than the mean.
  • I don't know if this perspective helps, or even if it is correct, but I wish to dwell on this perspective since it's one I don't use often. I've been thinking more along these lines due to competitive programming, and I quite enjoy the change!

The similarity between labellings and representations

  • One way to think about labellings is that we track the "entire history" of the object.
  • it's hard to count unlabelled objects. it's easier to count labelled objects.
  • for example, suppose we have graphs $g = (v, e)$ and $h = (v', e')$. an isomorphism of these as unlabelled graphs is a bijection $f: v \rightarrow v'$ such that $(s, t) \in e$ if and only if $(f(s), f(t)) \in e'$.
  • there could be many such $f$, or no such $f$. it's hard to find out!
  • Now let's suppose the graphs have labellings, so we have labels $l: V \rightarrow [|V|]$ and $l': V' \rightarrow [|V'|]$ where $[n] \equiv {1, 2, \dots, n}$.
  • An isomorphism of labelled graphs is an unlabelled isomorphism along with the constraint that $l'(f(v)) = l(v)$. That is, we must preserve labels. So, for example, the graphs:
a:1 -- b:2
c:2 -- d:1

are isomorphic since I can send a -> d and b -> c.

  • On the other hand, the graphs:
a:1-b:2-c:3
d:1-e:3-f:2

are not isomorphic (though they would be if we forget the labelling), since the center vertices b and e have different labels.

  • Let's think of the equation $l'(f(v)) = l(v)$. Since $f$ is a bijection, we have $|V| = |V'|$, so $l$ and $l'$ are both bijections to the same set $[|V|] = [|V'|]$. So we can invert the equation to write $f(v) = l'^{-1}(l(v))$. This tells us that $f$ is determined by the labellings!
  • The point of having a labelling is that it forces upon us a unique isomorphism (if it exists), given by the equation $f(v) \equiv l'^{-1}(l(v))$.
  • This collapses hom sets to either empty, or a unique isomorphism, which is far tamer than having many possible graph isomorphisms that we must search for/enumerate!
  • In analogy to representation theory, if we consider two irreducible representations of a group $G$, say $\alpha: G \rightarrow GL(V)$ and $\beta: G \rightarrow GL(W)$, Schur's lemma tells us that the Hom-set between the two representations (an intertwining map) is either the zero map (which is like having no isos) or a scaling of the identity map (which is like having a uniquely determined iso).
  • In this sense, we can think of an irrep as a "labelling" of group elements in a particularly nice way, since it constrains the potential isomorphisms of the "labelled objects"!

L1 norm is greater than or equal to L2 norm

Pick two points $A \equiv (x_1, y_1)$ and $B \equiv (x_2, y_2)$, and suppose $x_1 < x_2$ and $y_1 < y_2$. So we imagine this as two sides of a triangle:

     B
    /
   /
  /
 /
A
  • The L1 norm is $|x_2 - x_1| + |y_2 - y_1|$. This is the length of the path from $A$ to $B$ through the corner point $O$:
  δx
O----B
|   /
δy /
| /
|/
A
  • The L2 norm is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, which is the distance of the vector $AB$, or the hypotenuse of the right angled triangle $AOB$:
  δx
O----B
|   /
δy / L2
| /
|/
A
  • By triangle inequality, $OA + OB \geq AB$, hence $L_1 = \delta_x + \delta_y \geq L_2$

Z algorithm

  • The Z algorithm, for a given string $s$, computes a function $Z: [len(s)] \rightarrow [len(s)]$.

  • $Z[i]$ is the length of the longest common prefix between $S$ and $S[i:]$.

  • So, $S[0] = S[i]$, $S[1] = S[i+1]$, $S[2] = S[i+2]$, and so on till $S[Z[i]-1] = S[i + Z[i] - 1]$, and then $S[Z[i]] \neq S[i + Z[i]]$.

  • If we can compute the Z function for a string, we can then check if pattern P is a substring of text T by constructing the string P#T$. Then, if we have an index such that Z[i] = len(P), we know that at that index, we have the string P as a substring.

  • Note that the Z-algorithm computes the Z function in linear time.

  • The key idea of the Z algorithm is to see that if we are at an index i, but we have an index l < i such that i < l + z[l], then we have that s[0:z[l]] = s[l:l+z[l]]. Thus, we are "in the shade" of the l.

  • In this situation, we can reuse z[i-l] as a seed for z[i]. There are two cases: i + z[i-l] > l + z[l] and the converse.

  • If i + z[i-l] < l + z[l], then we are still "in the shade" of l, so we can safely set z[i] = z[i-l].

  • If not, we can only be sure of a match up to the end of the shade, so we start with z[i] = l + z[l] - i and then extend z[i] by explicit comparisons against the beginning of the string.

Specification

vector<int> calcz(std::string s) {
 const int n = s.size();
 vector<int> z(n);
 z[0] = 0;
 for(int i = 1; i < s.size(); ++i) {
   z[i] = 0;
   while(i + z[i] < n && s[i+z[i]] == s[z[i]]) {
     z[i]++;
   }
 }

 return z;
}

Implementation

vector<int> myz(std::string s) {
    const int n = s.size();
    vector<int> z(n);
    for(int i = 0; i < n; ++i) { z[i] = 0; }

    // shade that was last computed.
    int l = 0;
    for(int i = 1; i < n; ++i) {
        // shade: (l + z[l]) - i
        // guess from start: z[i-l]
        z[i] = max(0, min(l + z[l] - i, z[i-l]));

        // compare with initial portion of string.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) { z[i]++; }

        // we exceed the current shade. Begin ruling.
        if (i + z[i] >= l + z[l]) { l = i; }
    }

    return z;
}
  • Reference: Algorithms on strings, trees, and sequences.

For a given recurrence, what base cases do I need to implement?

  • For a linear recurrence, we need to define base cases for as many steps as we go back.
  • For combinations, we step n by 1, r by 1. So we need to define what happens for n=0 OR r=0.
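For example (my own sketch), the recurrence C(n, r) = C(n-1, r-1) + C(n-1, r) steps both n and r down by one, so the base cases cover r = 0 and n = 0:

#include <cstdio>

// C(n, r) = C(n-1, r-1) + C(n-1, r): both n and r step down by 1,
// so we need base cases for r = 0 and for n = 0.
long long C(int n, int r) {
    if (r == 0) { return 1; }   // choosing nothing: exactly one way
    if (n == 0) { return 0; }   // n = 0 but r > 0: no way
    return C(n-1, r-1) + C(n-1, r);
}

int main() { printf("%lld\n", C(5, 2)); } // 10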

Number of distinct numbers in a partition

  • A positive integer $n$ is represented as a partition $\lambda \equiv (k_1, k_2, \dots)$ where $\sum_i k_i = n$ and $k_1 \leq k_2, \dots$. Such a $\lambda$ always contains at most $O(\sqrt n)$ distinct numbers.
  • Intuition: suppose we want to have the maximum number of distinct numbers. Since we are tied down by the constraint $\sum k_i = n$, we must try to choose the $k_i$ as small as possible. But even the smallest choice gives $\sum_{i=1}^p i = p(p+1)/2 \sim O(p^2)$. Now if $O(p^2) = n$, then $p \sim \sqrt n$, so there can be at most $O(\sqrt n)$ distinct parts.
  • Alternate intuition: asking to build a number $n$ out of distinct numbers $k_1, k_2, \dots$ is asking to build a "jagged triangle" out of columns $(i, k_i)$ whose area is $n$. Area is $1/2 b h$, which is sorta quadratic (?)

Splitting $f(x) = y$ into indicators

If the output of $f(x)$ is a natural number, then we can write the value $f(x)$ as:

$$ f(x) = \sum_{i=1}^\infty [f(x) \geq i] $$

where $[f(x) \geq i]$ is $1$ if the condition is true and $0$ otherwise.

Another useful indicator type equation is:

$$ \sum_x f(x) = \sum_x \sum_i i \cdot [f(x) = i] = \sum_i i \cdot (\sum_x [f(x) = i]) $$

Why searching for divisors upto sqrt(n) works

  • It's not that all divisors are smaller than $\sqrt n$. For example, consider $14 = 7 \times 2$. $\sqrt{14} \sim 4$, but one of its divisors ($7$) is greater than 4.
  • Rather, it is that if there is a divisor $l$ (for large) which is larger than $\sqrt n$, there will be another divisor $s$ which is smaller than $\sqrt n$.
  • Proof: Suppose $l \div n$, $l \geq \sqrt n$. So there exists an $s$ such that $ls = n$, or $s = n / l$.
  • Since $l \geq \sqrt n$, $n / l \leq n / \sqrt n = \sqrt n$. Thus $s \leq \sqrt n$.
  • So if we wish to find some factor of $n$, we can simply search within the range $\sqrt n$.
  • If $n$ has no factors in the range $\sqrt n$, then $n$ must be prime, for if $n$ did have a larger factor, $n$ would also have a smaller factor we would have found.
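A small sketch of the resulting loop:

// smallest factor of n (> 1); returns n itself iff n is prime.
long long smallest_factor(long long n) {
    for (long long d = 2; d * d <= n; ++d) {
        if (n % d == 0) { return d; }
    }
    return n; // no factor up to sqrt(n), so n is prime
}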

Heuristics for the prime number theorem

"It is evident that the primes are randomly distributed but, unfortunately, we don't know what 'random' means."

  • The prime number theorem says that at $n$, the number of primes upto $n$ is (approx.) $(n/\log n)$. Formally:

$$ P(n) \equiv |{ p \text{ prime } : 2 \leq p \leq n }| \sim \frac{n}{\log n} $$

  • Consider sieving. In the beginning, everything is potentially prime.
  • When we remove the multiples of a prime p, we decrease the density of potential primes.
  • We remove 1/p of the remaining potential primes (eg. removing multiples of 2 halves the density of potential primes; sieving by 3 then reduces the remaining density by one-third).
  • See that the reduction is multiplicative on the density: upon sieving by 5, we lose 1/5 of the remaining potential primes, so the new density is D2 = D - D/5. Said differently, we keep 4/5 of the potential primes we used to have, so D2 = 4D/5.
  • Furthermore, removing 5 only begins to affect the density of primes after 5*5 = 25, since smaller composite multiples of 5 (5*2, 5*3, 5*4) have already been removed at earlier iterations of the sieve (when sieving 2 and 3).
  • In general: On removing a prime p, Our new density becomes D2 = D - D/p which is (1-1/p)D.
  • In general: On removing a prime p, the density till p*p remains untouched. Density after p*p is multiplicatively scaled by (p-1)/p.
  • Define a function $f(x)$ which estimates density of primes around $x$.
  • We consider the effect of the primes in the interval $A \equiv [x, x+dx]$ on the interval $B \equiv [x^2, (x+dx)^2]$
  • Each prime $p$ in interval $A$ decreases the density of primes in interval $B$ by subtracting $f(x^2)/p$, since we lose those many primes. Since each number in $[x, x+dx]$ is basically $x$, we approximate this subtraction to $f(x^2)/x$.
  • In interval $A$, there are $f(x)dx$ primes.

$$ \begin{aligned} f((x+dx)^2) &= f(x^2) - \text{killing off from primes in $[x, x+dx]$} \\ &= f(x^2) - \sum_{p \in [x, x+dx]} \texttt{Prob}(p \text{ prime}) \cdot f(x^2)/p \\ &= f(x^2) - \sum_{p \in [x, x+dx]} f(p) \cdot f(x^2)/p \\ &\quad \text{(since $p \sim x$ in $[x, x+dx]$):} \\ &= f(x^2) - \texttt{length}([x,x+dx]) \cdot f(x) \cdot f(x^2)/x \\ &= f(x^2) - (dx)\, f(x) \cdot f(x^2)/x \\ f((x+dx)^2) - f(x^2) &= -\frac{f(x^2)f(x)\,dx}{x} \end{aligned} $$

From this, we now estimate $f'(x^2)$ as:

$$ \begin{aligned} &f((x+dx)^2) - f(x^2) = -\frac{f(x^2)f(x)\,dx}{x} \\ &\frac{f((x+dx)^2) - f(x^2)}{dx} = -\frac{f(x^2)f(x)}{x} \\ &\frac{f(x^2 + 2x\,dx + dx^2) - f(x^2)}{dx} = -\frac{f(x^2)f(x)}{x} \\ &\text{Ignore $O(dx^2)$:} \\ &\frac{f(x^2 + 2x\,dx) - f(x^2)}{dx} = -\frac{f(x^2)f(x)}{x} \\ &\frac{f(x^2 + 2x\,dx) - f(x^2)}{2x\,dx} = -\frac{f(x^2)f(x)}{2x^2} \\ &\text{Let $u = x^2$, so $du = 2x\,dx$:} \\ &\frac{f(u + du) - f(u)}{du} = -\frac{f(u)f(\sqrt u)}{2u} \\ &f'(u) = -\frac{f(u)f(\sqrt u )}{2u} \end{aligned} $$

An immediate consequence

Since $u$ is large and $f$ varies slowly, we approximate $f(\sqrt u) \sim f(u)$ to get:

$$ \begin{aligned} &f'(u) = -\frac{f(u)f(\sqrt u )}{2u} \\ &f'(u) \sim -\frac{f^2(u)}{2u} \\ &\text{(dropping the constant factor 2, which does not change the asymptotics):} \\ &\frac{df}{du} \sim -\frac{f^2(u)}{u} \\ &\frac{df}{f^2} \sim -\frac{du}{u} \\ &\int \frac{df}{f^2} \sim - \int \frac{du}{u} \\ &-\frac{1}{f(u)} \sim -\ln(u) \\ &\frac{1}{f(u)} \sim \ln(u) \\ &f(u) \sim \frac{1}{\ln(u)} \end{aligned} $$

So the density of primes around $u$ is $f(u) \sim 1/\ln(u)$. So up to $n$, the number of primes is $\int_2^n f(x) dx = \int_2^n dx/\ln(x)$, which grows like $n/\ln(n)$. This "proves" the prime number theorem.

Sum of absolute differences of an array

  • We are given an array a[:] and we are asked to compute the sum of differences $\sum_{i=1}^n \sum_{j=i+1}^n |a[i] - a[j]|$.
  • To compute this efficiently, first sort a[:] into a sorted array s[:]. For simplicity, say we have N = 4.
  • Now see that if we write down the values for N=4, we will see:
D =
|s[1] - s[2]| + |s[1] - s[3]| + |s[1] - s[4]| +
|s[2] - s[3]| + |s[2] - s[4]| +
|s[3] - s[4]|
  • i < j implies s[i] <= s[j] as s is sorted. So each of the terms (s[i] - s[j]) is nonpositive. We thus flip the terms, giving:
D =
(s[2] - s[1]) + (s[3] - s[1]) + (s[4] - s[1]) +
(s[3] - s[2]) + (s[4] - s[2]) +
(s[4] - s[3])
  • Note that s[1] always appears negated, so it will have coefficient -3 on grouping.
  • See that s[2] was positive in the grouping (1, 2), and was negative in the groupings (2, 3) and (2, 4). So 2 will have a coefficient +1*(2-1) -1*(4 - 2).
  • Similarly, s[3] was positive in the grouping (1, 3) and (2, 3) and was negative in the grouping (3, 4).
  • In general, s[i] will be positive when paired with [1, 2, ..i-1, i) and negative when paired with (i, i+1, i+2, \dots n]. So s[i] will contribute a coefficient of +1*(i-1) - 1*(n-i) [using the formula that for intervals [l, r) and (l, r] the number of elements is (r-l)]
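Putting the coefficient formula into code (my sketch, 0-indexed, so s[i] gets coefficient i - (n-1-i) = 2i - (n-1)):

#include <algorithm>
#include <vector>

// sum over all pairs i < j of |a[i] - a[j]|, in O(n log n).
long long sum_abs_diffs(std::vector<long long> a) {
    std::sort(a.begin(), a.end());
    const int n = a.size();
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        // s[i] appears with '+' against the i smaller elements and '-' against the n-1-i larger ones.
        total += a[i] * (2LL * i - (n - 1));
    }
    return total;
}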

GCD is at most difference of numbers

  • assume WLOG $l< r$. Then, Let $g \equiv gcd(l, r)$. Claim: $g \leq r - l$.
  • Proof: we have $g \div r$ and $g \div l$ by definition, hence we must have $g \div (r - l)$, and $(r-l)$ is positive (since $l < r$). So $g \leq (r - l)$.
  • Intuition: the gcd represents the common roots of $l, r$ in Zariski land. That is, if $l, r$ are zero at a prime then so is $r - l$.
  • So, the GCD equally well represents the common roots of $l$ and $(r - l)$.
  • Now, if a number $x$ vanishes at a subset of the places where $y$ vanishes (counted with multiplicity), we have $x \leq y$ (the prime factorization of $y$ contains that of $x$).
  • Since the GCD vanishes at the subset of the roots of $l$, a subset of the roots of $r$, and a subset of the roots of $(r-l)$, it must be smaller than all of these.
  • Thus, the GCD is at most $r - l$.
  • Why does GCD not vanish at exactly the roots of $r-l$? If $l$ and $r$ both take the same non-zero value at some prime, then $(r - l)$ vanishes there. But this is not a location where $l$ and $r$ vanish, so the GCD need not vanish there.

implementing GCD and LCM

// gcd(x, y) = d <=> min({ ax + by : a, b integers, ax + by > 0 }) = d
long gcd(long x,long y) { return y == 0 ? x : gcd(y, x%y); }
long lcm(long x,long y) {return x/gcd(x,y)*y;}

Centroid of a tree

  • A centroid is a node which upon removal creates subtrees of size at most ceil(n/2).

Existence of centroid for rooted tree (algorithm to compute centroid)

  • If tree has exactly one node, we are done, the centroid is the root.
  • Suppose for induction a centroid exists for trees of size $n-1$. We will now prove the existence of a centroid for tree of size $n$.
  • Otherwise, if the root has all children whose subtree sizes are at most ceil(n/2), the root is the centroid and we are done.
  • Otherwise, the root has one child with subtree size strictly greater than ceil(n/2). There can't be two such children, because their combined size would exceed 2*ceil(n/2) >= n; together with the root node, the tree would then have more than n nodes, a contradiction.
  • We recurse into the subtree. The size of the subtree of the child is at least one less than the size of the root, thus we are decreasing on the size of the tree.
  • By recursion, we must terminate this process and find a centroid.
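A sketch of the usual implementation: compute subtree sizes with one DFS, then walk from the root into any child whose subtree has more than n/2 vertices.

#include <cstdio>
#include <vector>
using namespace std;

int subtree_size(int u, int parent, const vector<vector<int>> &adj, vector<int> &sz) {
    sz[u] = 1;
    for (int v : adj[u]) {
        if (v != parent) { sz[u] += subtree_size(v, u, adj, sz); }
    }
    return sz[u];
}

int centroid(const vector<vector<int>> &adj) {
    const int n = adj.size();
    vector<int> sz(n, 0);
    subtree_size(0, -1, adj, sz);
    int u = 0, parent = -1;
    while (true) {
        int heavy = -1;
        for (int v : adj[u]) {
            // a child (in the tree rooted at 0) whose subtree is too big
            if (v != parent && sz[v] > n / 2) { heavy = v; }
        }
        if (heavy == -1) { return u; } // every component after removing u is small enough
        parent = u;
        u = heavy;
    }
}

int main() {
    // path 0-1-2-3-4: the centroid is vertex 2.
    vector<vector<int>> adj = {{1}, {0, 2}, {1, 3}, {2, 4}, {3}};
    printf("%d\n", centroid(adj)); // 2
    return 0;
}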

Centroid decomposition

  • Once we find the centroid of a tree, we see that all of its subtrees have size at most ceil(n/2).
  • We can now recurse, and find centroids of these subtrees.
  • These subtrees are disjoint, so we will take at most O(n) to compute sizes and whatnot.
  • We can do this log(n) many steps since we're halving the size of the subtree each time.
  • In total, this implies that we can recursively find centroids to arrive at a "centroid decomposition" of a tree.
  • Note that the centroid decomposition of the tree constructs a new tree, which is different from the original tree, sorta how the dominator tree is a different tree from the original tree.

Center of a tree

  • The remoteness / eccentricity of a vertex $v$ is its distance from its furthest node. $r(v) \equiv \max_{w \in V} d(v, w)$.
  • The center of a tree is the vertex with minimum remoteness.

Claim: center is on any diameter.

  • Let $D$ be the diameter of length $L$.
  • Let $c$ be the center. We claim that $c$ lies on $D$. If so, we are done.
  • If not, then there is a path from $c$ to some vertex $v$ in $D$. WLOG let the endpoints of the diameter be $s$ and $e$, such that $v$ is further from $s$ than from $e$. That is: $d(s, v) \geq d(v, e)$. In a picture:
s----------v---e
           |
           n
           |
         n-c--n
           |
           n
  • Key idea: the important distance is the distance from $v$ to $s$. So we can forget everything in a radius of $d(v, e)$, as the distance $d(v, e) < d(v, s)$, and $d(c, e) < d(c, s)$. But if we forget the structure around $v$ in a radius of $e$, all we are left with is:
s-------------------v
                    |
                    |
                    c

where clearly $v$ is closer to $s$ than to $c$, and thus $c$ cannot be the center. In some sense, we are making a large scale/coarse structure argument, where the large scale structure is dominated by $d(s, v)$, which is all that matters.

  • For any node $n$ in the subtree hanging from $v$, we have $d(n, v) \leq d(v, e)$, since otherwise the path $s-v-n$ would become a path longer than the diameter, contradicting the maximality of the diameter.
  • Hence, we have $d(n, v) \leq d(v, e) \leq d(v, s)$, where the second inequality comes from the assumption of $s$ and $e$. So $s$ is the node that is furthest from $v$ amongst all nodes in the graph.
  • But now notice that $d(c, s) = d(c, v) + d(v, s)$, and this is the longest distance from $c$ to any other node. This implies that $d(c, s) > d(v, s)$ as $d(c, v) > 0$.
  • This contradicts the minimality of the eccentricity of $c$: the longest distance from $c$ to any node is $d(c, s) > d(v, s)$, while the longest distance from $v$ to any node is only $d(v, s)$. So $v$ would be a better center than $c$, a contradiction.

Claim: center is median of any diameter

We've already seen that center is on the diameter. Now if a center node is not on the median, the distance to the furthest node (start/end) can be improved by moving the center node closer to the median. So the best choice is to have the center be at (one of the) medians.

Claim: center does not change by removing all leaf vertices

We've shown that the center is the median of all diameters. Removing all leaves removes two elements at the beginning and end of all diameters, leaving the median (the center) invariant.

Image unshredding as hamiltonian path

This was a cool use of hamiltonian path that I saw on hacker news recently.

  • The problem is this: given an image where the columns are created by shuffling columns of an original image, we must recreate the original image.
  • The reduction: treat each column as a vertex, connect columns that are close to each other in similarity.
  • Hamiltonian path will visit each vertex exactly once (ie, pick each column exactly once).

I think this example is striking enough that I'll never forget that in a hamiltonian path, we must visit each vertex exactly once (in contrast to an Euler tour, where we must visit each edge exactly once).

Distance between lines in nD

  • https://www.codechef.com/viewsolution/28723599

Subproblem: point-line distance in nD

  • Intuitively, given a point $o$ and a line $L \equiv p + \alpha x$ (greek letters will be reals, all else vectors), we must have that the line that witnesses the shortest distance from $o$ to $L$ must be perpendicular to $L$.
  • For if not, we would have some "slack" that we could spend to shorten the distance. Alternatively, using Lagrange multipliers intuition, the gradient must be perpendicular to the level surface of the constraint. In this case, we are trying to find a point $o'$ that minimizes the distance $oo'$ such that $o' \in L$. The latter is a Lagrange constraint, and hence defines a level surface to which the optimal solution must be perpendicular.
  • Some calculus to prove this: let $l \equiv p + \alpha x$ be a point on the line $L$. We extremize the length $ol$ as a function of $\alpha$:

$$ \begin{aligned} &\partial_\alpha (ol \cdot ol) = 0 \\ &\partial_\alpha ((o - p - \alpha x) \cdot (o - p - \alpha x)) = 0 \\ &\text{only terms with $\alpha$ survive $\partial_\alpha$: } \\ &\partial_\alpha \left[ - o \cdot \alpha x + p \cdot \alpha x - \alpha x \cdot o - \alpha x \cdot (- p) - \alpha x \cdot (- \alpha x) \right] = 0\\ &\partial_\alpha \left[ - 2 \alpha\, o \cdot x + 2\alpha\, p \cdot x + \alpha^2\, x \cdot x \right] = 0 \\ &- 2 o \cdot x + 2 p \cdot x + 2 \alpha\, x \cdot x = 0 \\ &2 (- o + p + \alpha x) \cdot x = 0 \\ &2 (- o + l) \cdot x = 0 \\ &2 (\vec{lo}) \cdot x = 0 \\ &(\vec{lo}) \cdot x = 0 \\ &\vec{lo} \perp x \end{aligned} $$

  • This tells us that the line $ol$ is perpendicular to the direction $x$, which is the direction of the line $L$. Hence, the line $(ol)$ from the point $o$ to the line $L$ with minimum distance is orthogonal to the line $L$ itself.

Line-Line distance

  • We can take two parametric points on two lines $L \equiv p + \alpha x$, and $M \equiv q + \beta y$, and build the line $lm$ which witnesses the shortest distance.
  • From the above derivation, we see that the line $lm$ must be perpendicular to both $L$ and $M$, since we can view line-line-distance as two simultaneous point-line-distance problems: distance from point $l \in L$ to line $M$, and distance from point $m \in M$ to line $L$.
  • This gives us the equations $lm \cdot x = 0$, and $lm \cdot y = 0$. We have two variables $\alpha, \beta$ and two equations, so we solve for $\alpha, \beta$.
  • This allows us to find the line $lm$ whose length is the shortest distance.

lower_bound binary search with closed intervals

// find rightmost ix such that ps[ix].b < t
ll max_earlier(ll t, vector<P> &ps) {
  assert(ps.size() > 0);
  // [l, r]
  ll l = 0, r = ps.size()-1;
  // closed interval.
  int ans = -1;
  while (l <= r) {
    ll mid = l + (r-l)/2;
    if (ps[mid].b < t) {
      // we have considered `mid`.
      // now move to the higher range to find other candidates.
      ans = max(ans, mid);
      l = mid+1;
    } else {
     // ps[mid] does not satisfy our invariant.
     // move to the lower range.
     r = mid-1;
    }
  }
  if (ans != -1) { assert(ps[ans].b < t); }
  if (ans + 1 < (ll)ps.size()) { assert(ps[ans+1].b >= t); }
  return ans;
}

Sliding window implementation style

I usually implement sliding window as:

// [l, r)
int l = 0, r = 0;
while (r < n) {
 assert(l <= r);
 if (extend_window) { r++; }
 else {
    l++; // contract window from the left
 }
}

However, there are cases where we have complicated invariants on the sliding window, such as a maximum length. An example is codeforces 676c, where we must maintain a sliding window which contains at most k >= 0 "illegal" elements.

My flawed implementation using a while loop was:

int best = 0;
for(int c = 'a'; c <= 'b'; ++c) {
    // window: [l, r)
    int l = 0, r = 0;
    // number of illegal letters changed. <= k
    int changed = 0;
    while(r < n) {
        assert(changed <= k);
        assert(l <= r);
        if (s[r] == c) { r++; } // legal, extend.
        else {
            // need to change a letter to extend, s[r] != c.
            if (changed == k) {
                // cannot extend, contract from left.
                if (s[l] != c) { changed--; }
                l++;
            } else {
                // extend, spending a change.
                r++;
                changed++;
            }
        }
        // keep track of best window size.
        best = max(best, r-l);
    }
}

Unfortunately, the above code is flawed. It does not work when the window size is zero. (TODO: explain) On the other hand, the implementation where we always stride forward with the r value in a for loop, only deciding what happens with l, does not suffer from this (link to implementation):

int best = 0;
for(int c = 'a'; c <= 'b'; ++c) {
    int l = 0;
    // number of illegal letters changed. <= k
    int changed = 0;
    // [l, r]
    for(int r = 0; r < n; ++r) {
        // change to 'a'.
        if (s[r] != c) { changed++; }
        // maintain invariants: must have changed <= k,
        // and at the end of a loop trip, we must have l <= r.
        while(changed > k && l < r) {
            if (s[l] != c) { changed--; }
            l++;
        }
        assert(l <= r);
        // keep track of best window size.
        best = max(best, r-l+1);
    }
}

Kawaii implementation of x = min(x, y)

template <typename T>
inline void Mn(T &x, T y) { x > y && (x = y); }

Wrapping the thing into a template allows one to write code such as Mn(x, 10) to mean x = min(x, 10). This is a nice pattern!

CSES: Counting Towers

  • Link to problem. I found the problem interesting, as I found the DP states un-obvious.
  • I eventually performed a DP on the number of possible towers in the y-axis range [0, h), where we keep track of whether the last layer has a 2x1 tile or two 1x1 tiles.
  • Importantly, this means that the decision of "closing" a section to create a new section is left to the next DP state.
  • This is weirdly reminiscent of some kind of topological phenomenon, where we use intervals of the form [l, l+1) to cover a space.
  • It seems to help me to look at this kind of DP as first creating the combinatorial objects, and then switching it over to counting the number of such objects created.
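For concreteness, here is my own sketch of one formulation of this DP; the transition constants are my reconstruction, not taken from the problem's editorial. dp[i][0] counts towers of height i whose top layer is a single 2-wide block, dp[i][1] those whose top layer is two 1-wide blocks; each step either extends the current blocks upward or closes them and starts fresh ones.

#include <array>
#include <cstdio>
#include <vector>
using namespace std;

int main() {
    const long long MOD = 1000000007;
    int n = 6; // height of the tower
    // dp[i][0]: top layer is one 2-wide block; dp[i][1]: top layer is two 1-wide blocks.
    vector<array<long long, 2>> dp(n + 1);
    dp[1] = {1, 1};
    for (int i = 2; i <= n; ++i) {
        // new top is a 2-wide block: from a 2-wide top we can extend it or close it and start a new one (2 ways);
        // from a two-1-wide top we must close both columns and start a 2-wide block (1 way).
        dp[i][0] = (2 * dp[i-1][0] + dp[i-1][1]) % MOD;
        // new top is two 1-wide blocks: from a 2-wide top, close it and start both columns (1 way);
        // from a two-1-wide top, each column independently extends or restarts (2*2 = 4 ways).
        dp[i][1] = (dp[i-1][0] + 4 * dp[i-1][1]) % MOD;
    }
    printf("%lld\n", (dp[n][0] + dp[n][1]) % MOD); // 2864 for n = 6
    return 0;
}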

Smallest positive natural which can't be represented as sum of any subset of a set of naturals

we're given a set of naturals $S \equiv \{ x_i \}$ and we want to find the smallest positive natural $n$ that can't be written as $sum(T)$ for any subset $T \subseteq S$.

By observation

  • Key observation: If we sort the set $S$ as $s[1] \leq s[2] \leq \dots \leq s[n]$, then we must have $s[1] = 1$. For if not, then $1$ is the smallest number which cannot be written as a sum of elements.
  • Next, if we think about the second number, it must be $s[2] = 2$. If not, we return $2$ as the answer.
  • The third number $s[3]$ can be $3$. Interestingly, it can also be $4$, since we can write $3 = 1 + 2$, so we can skip $3$ as an input.
  • What about $s[4]$? If we had $[1 \leq 2 \leq 3]$ so far, then see that we can represent all numbers upto $6$. If we have $[1 \leq 2 \leq 4]$ so far, then we can represent all numbers upto $7$. Is it always true that given a "satisfactory" sorted array $A$ (to be defined recursively), we can always build numbers upto $\sum A$?
  • The answer is yes. Suppose the array $A$ can represent numbers upto $\sum A$. Let's now append $r \equiv sum(A)+1$ into $A$. ($r$ for result). Define B := append(A, r). We claim we can represent numbers $[1 \dots (r+\sum A)]$ using numbers from $B$. By induction hypothesis on A. We can represent $[1 \dots \sum A ]$ from $A$. We've added $r = \sum A + 1$ to this array. Since we can build numbers $[1\dots \sum A]$ from $A$, we can add $r$ to this to build the range $[r+1 \dots r + \sum A]$. In total, by not choosing $r$, we build the segment $[1 \dots \sum A ]$ and by choosing $r$ we build the segment $[\sum A + 1 \dots \sum A + r]$ giving us the full segment $[1 \dots \sum A + r]$.

Take 2: code

  • Input processing:
void main() {
  int n;
  cin >> n;
  vector<ll> xs(n);
  for (ll i = 0; i < n; ++i) {
    cin >> xs[i];
  }
  • Sort to order array
  sort(xs.begin(), xs.end());
  • Next define r as max sum seen so far.
  ll r = 0; // Σ_i=0^n xs[i]
  • We can represent numbers $[0\dots r]$. What can $xs[i]$ be? If it is greater than $(r+1)$, then we have found a hole. If $xs[i] = r+1$, then we can already represent $[0\dots r]$, and we now have $(r+1)$. By using the previous numbers, we can represent the sums $(r+1) + [0 \dots r]$, which is equal to $[r+1 \dots 2r+1]$.
  • More generally, if $xs[i] < r+1$, we can represent $[0 \dots r]; [xs[i]+0, xs[i]+r]$.
  • The condition that this will not leave a gap between $r$ and $xs[i]$ is to say that $xs[i]+0 \leq (r+1)$.
  for (ll i = 0; i < n; ++i) {
    if (xs[i] <= r+1) {
      // xs[i] can represent r+1.
      // We can already represent [0..r]
      // By adding, we can represent [0..r] + (xs[i]) = [xs[i]..r+xs[i]]
      // Since xs[i] <= r+1, [xs[i]..r+xs[i]] <= [r+1, 2r+1].
      // In total, we can represent [0..r] (not using xs[i]) and [<=r+1, <=2r+1]
      // (by using xs[i]) So we can can be sure we won't miss numbers when going
      // from [1..r] to [xs[i]<=r+1...] The largest number we can represent is
      // [xs[i]+r].
      r += xs[i]; // max number we can represent is previous max plus current
    } else {
      // xs[i] > r+1. We have a gap at r+1
      cout << r + 1 << "\n";
      return;
    }
  }
  cout << r + 1 << "\n";
}

Example of RVs that are pairwise but not 3-way independent.

Define X, Y to be uniformly random {0, 1} variables. Z = X xor Y. Each of the pairs are independent, but X, Y determine Z so it's not 3-way independent.

Notes on Liam O Connor's thesis: Cogent

  • AutoCorres: cool tool
  • seL4: translate C to HOL using AutoCorres
  • cake: verify subset of ML
  • Cogent has no recursion, provide higher order iterators/recursion schemes to do stuff.
  • We do this by using the ! operator. Converts linear, writeable type into read-only type. Function that takes value of type Buffer! is free to read, but not write to Buffer.
  • Constraint based type inference: (1) generate constraints, (2) solve.
  • refinement relation R between values in the value semantics and states in the update semantics, and show that any update semantics evaluation has a corresponding value semantics evaluation that preserves this relation. When each semantics is viewed as a binary relation from initial states to final states (outputs), this requirement can be succinctly expressed as a commutative diagram...
  • "Translation is the art of failure. Umberto Eco" --- nice.
  • For the most part, this is because these refinement stages involve shallow embeddings, which do not allow the kind of term inspection needed to directly model a compiler phase and prove it.
  • Strange, refinement relation goes upwards? Not downwards?
  • State = (Set (a, state), bool) seems weird to me. I would have expected State = Set (a, state, bool). But I guess if some control flow path leads to UB, you can blow everything out.
  • Translation validation: for each output, produce proof of correctness. Different from proving a compiler correct. More like a proof certificate.
  • The reasoning behind the decision to relate representations instead of Cogent types to C types is quite subtle: Unlike in C, for a Cogent value to be well-typed, all accessible pointers in the value must be valid (i.e. defined in the store μ) and the values those pointers reference must also, in turn, be well-typed. For taken fields of a record, however, no typing obligations are required for those values, as they may include invalid pointers (see the update semantics erasure of the rules in Figure 4.5). In C, however, taken fields [what is a taken field?] must still be well-typed, and values can be well-typed even if they contain invalid pointers. Therefore, it is impossible to determine from a Cogent value alone what C type it corresponds to, making the overloading used for these relations ambiguous
  • Cogent is a total language and does not permit recursion, so we have, in principle, a well-ordering on function calls in any program. Therefore, our tactic proceeds by starting at the leaves of the call graph, proving corres theorems bottom-up until refinement is proven for the entire program. [amazing]
  • With this state definition, it is not well-defined to take a pointer to a stack-allocated variable, nor to reinterpret stack memory as a different type. C code that performs such operations is rejected by the parser.
  • At the moment, such processes are implemented in Cogent with a C shell, which awaits events in a loop and executes a Cogent function whenever an event occurs. These are clearly better specified as productive corecursive programs. Extending Cogent to support corecursion will likely be ultimately needed in order to support moving these particular C loops into Cogent. Fortunately, Isabelle also supports corecursive shallow embeddings, providing us with a direct translation target.
  • Future work: Property based testing, Concurrency, Recursion+Non-termination+Coinduction, Richer type system (refinement types), Data layout /Data description

C++ lower_bound, upper_bound API

I never remember what precisely lower_bound returns, so this is me collecting this information in a way that makes sense to me. The API docs say

Returns an iterator pointing to the first element in the range [first,last) which does not compare less than val.

  • So lower_bound(first, last, bound) finds the leftmost location l in [first..last) such that as[l] >= bound.
  • See that it can be equal to the value bound.

In contrast, upper_bound says:

Returns an iterator pointing to the first element in the range [first,last) which compares greater than val.

  • So upper_bound(first, last, bound) finds the leftmost location l in [first..last) such that as[l] > bound.
Pictorially

If we have a range:

<<<<<<< ======= >>>>>>>>
        |     |
        L     R
  • The < values are less than bound, = values are equal to bound, and > values are greater than bound, then lower_bound and upper_bound return iterators to represent the = range [L, R] in half-open form.
  • So we will have [lower_bound, upper_bound) = [L, R]. This matches the C++ API where everything uses half-open intervals.
<<<<<<< ======= >>>>>>>>
        L     R |
        lower   upper
Traversals
  • $[L, H]$: loop as for(auto it = lowerbound(l); it < upperbound(h); ++it) {}. This works since upperbound(h) will find first index > h, so we include all =h.
  • $[L, H)$: loop as for(auto it = lowerbound(l); it < lowerbound(h); ++it) {}. This works since lowerbound(h) finds the first index >= h, so we don't include any =h.
  • $(L, H]$: use for(auto it = upperbound(l); it < upperbound(h); ++it) {}. upperbound(l) finds the first index >l, so we ignore values =l.
  • $(L, H)$: use for(auto it = upperbound(l); it < lowerbound(h); ++it) {}.

How to think about which one we want? Think about it as: lowerbound shifts iterators towards the left, and upperbound shifts iterators to the right.

  • For [L, we want to shift beginning leftwards, so lowerbound(L).
  • For (L, we want to shift beginning rightwards, so upperbound(L).
  • For H], we want to shift ending rightwards, so upperbound(H).
  • For H), we want to shift ending leftwards, so lowerbound(H).
equal_range
  • To unify the above description, one can simply use std::equal_range(fst, last, val) which returns the half-open interval [l, r) where the array has value val. This is equivalent to returning a pair of lower_bound, upper_bound.
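A tiny usage example: counting occurrences of a value in a sorted vector via the half-open [lower, upper) range.

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> xs = {1, 2, 2, 2, 5, 7};
    auto range = std::equal_range(xs.begin(), xs.end(), 2);
    // range.first is lower_bound, range.second is upper_bound; [first, second) is the run of 2s.
    std::printf("%d\n", (int)(range.second - range.first)); // 3
    return 0;
}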

Books that impart mental models

I love books that impart mental models of how a domain expert thinks about their field. This was something I loved in particular about TiHKAL, which describes reaction mechanisms. I'd love references to other books that do the same.

Subarrays ~= prefixes

To solve any problem about subarrays, we can reinterpret a subarray [l..r] as the prefix [0..r] minus the prefix [0..l). For example, to find all subarrays [l..r] whose sum of elements is divisible by n, we can think of this as finding subarrays [l..r] where the sum of elements modulo n is zero. This is CSES' subarray divisibility problem:


int main() {
    int n;
    cin >> n;
    vector<ll> xs(n);
    for (int i = 0; i < n; ++i) {
        cin >> xs[i]; xs[i] = xs[i] % n; if (xs[i] < 0) { xs[i] += n; }
    }

    ll count = 0; // number of subarrays with sum = 0 (mod n)
    ll cursum = 0; //  current sum [0..i]
    // partial_sum_count[s]: number of prefixes [0..r] (for some r) whose sum mod n equals s.
    map<ll, ll> partial_sum_count;
    partial_sum_count[0] = 1;

    for (int i = 0; i < n; ++i) {
        // current sum [0..i]
        cursum = (cursum + xs[i]) % n;

        // for each [0..j] (for j < i) with sum cursum, we want:
        // sum([i..j]) = 0
        // => sum([0..i]) - sum([0..j)) = 0
        // => sum([0..i]) = sum([0..j))
        // for each such `j`, we get one subarray.
        auto it = partial_sum_count.find(cursum);
        if (it != partial_sum_count.end()) {
            count += it->second;
        }

        // partial sum [0..i] = cursum
        partial_sum_count[cursum]++;
    }

    cout << count << "\n";

    return 0;
}

Operations with modular fractions

  • Quick note on why it's legal to perform regular arithmetic operations on fractions $a/b$ as operations on $ab^{-1}$ where $ab^{-1} \in \mathbb Z/pZ$.
  • The idea is that we wish to show that the map $a/b \mapsto ab^{-1}$ is a ring homomorphism $\phi$ into $\mathbb Z/p \mathbb Z$, defined on fractions whose denominator is not divisible by $p$.
  • The proof: (i) the map $\mathbb Z \rightarrow \mathbb Z/p\mathbb Z$ is a ring homomorphism, (ii) a map from an integral domain to a field extends to every fraction whose denominator maps to a unit; since every $b$ with $p \nmid b$ maps to a unit of $\mathbb Z/p\mathbb Z$, we get a map $\phi$ on all such fractions $a/b$ (note that $1/p$ itself has no sensible image, so $\phi$ is not defined on all of $\mathbb Q$). So from abstract nonsense, we see that $\phi$ is a well defined ring hom.
  • More down to earth: let's check addition multiplication, and multiplicative inverse. All else should work automagically.
  • For addition, we wish to show that $\phi(a/b + c/d) = \phi(a/b) + \phi(c/d)$. Perform the calculation:

$$ \begin{aligned} \phi(a/b + c/d) &= \phi((ad + bc)/(bd)) \\ &= (ad + bc)(bd)^{-1} \\ &= ad \cdot b^{-1}d^{-1} + bc \cdot b^{-1}d^{-1} \\ &= ab^{-1} + cd^{-1} \\ &= \phi(a/b) + \phi(c/d) \end{aligned} $$

  • For multiplication, we wish to show that $\phi(a/b \cdot c/d) = \phi(a/b) \cdot \phi(c/d)$:

$$ \begin{aligned} \phi(a/b \cdot c/d) &= \phi(ac/(bd)) \\ &= ac(bd)^{-1} \\ &= ab^{-1} \cdot cd^{-1} \\ &= \phi(a/b) \cdot \phi(c/d) \end{aligned} $$

  • For inverse, we wish to show that $\phi(1/(a/b)) = \phi(a/b)^{-1}$:

$$ \begin{aligned} \phi(1/(a/b)) &= \phi(b/a) \\ &= ba^{-1} \\ &= (ab^{-1})^{-1} \\ &= \phi(a/b)^{-1} \end{aligned} $$

Thus, we can simply represent terms $a/b$ in terms of $ab^{-1}$ and perform arithmetic as usual.
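For a concrete sanity check with illustrative numbers, take $p = 7$: in $\mathbb Z/7\mathbb Z$ we have $2^{-1} = 4$ and $3^{-1} = 5$, so $\phi(1/2) + \phi(1/3) = 4 + 5 = 9 \equiv 2$; on the other hand $1/2 + 1/3 = 5/6$ and $\phi(5/6) = 5 \cdot 6^{-1} = 5 \cdot 6 = 30 \equiv 2$, as expected.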

Modular inverse calculation

  • Easy way to calculate $a^{-1}$ mod $p$ is to use $a^{p-2}$. We know that $a^{p - 1} \equiv 1$ from Lagrange's theorem (Fermat's little theorem), so $a^{p-2} \cdot a \equiv 1$, or $a^{-1} \equiv a^{p-2}$. This can be computed quickly with repeated squaring; a sketch follows after the extended Euclid code below.

  • Another way to do this is to use extended Euclidean division. Suppose $a \not\equiv 0$ (mod $p$). Then we can find numbers $\alpha, \beta$ such that $a \alpha + p \beta = \gcd(a, p) = 1$. If we look at the whole equation (mod $p$), we find that $a \alpha \equiv 1$ (mod $p$), i.e. $\alpha$ is the modular inverse of $a$.

// returns {a, b} such that a*x + b*y = gcd(x, y).
pair<int, int> euc(int x, int y) {
  if (x < y) {
    // euc(y, x) returns coefficients for (y, x); swap them back.
    int a, b; std::tie(a, b) = euc(y, x);
    return {b, a};
  }
  // x >= y
  if (x % y == 0) { return {0, 1}; } // 0*x + 1*y = y = gcd(x, y)
  int a, b; std::tie(a, b) = euc(y, x%y);
  // ay + b(x%y)          = gcd(y, x%y) = gcd(x, y)
  // ay + b(x - y*(x/y))  = gcd(x, y)
  // bx + (a - b*(x/y))y  = gcd(x, y)
  return {b, a - b*(x/y)};
}
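As a minimal sketch of the first approach ($a^{-1} \equiv a^{p-2}$ via repeated squaring, assuming $p$ is prime); powmod and modinv are illustrative names, not from any particular library:

typedef long long ll;

// computes (a^e) mod p by repeated squaring.
ll powmod(ll a, ll e, ll p) {
    ll r = 1; a %= p;
    while (e > 0) {
        if (e & 1) { r = (r * a) % p; }
        a = (a * a) % p;
        e >>= 1;
    }
    return r;
}

// modular inverse of a (mod p), assuming p is prime and a % p != 0.
ll modinv(ll a, ll p) { return powmod(a, p - 2, p); }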

The number of pairs (a,b) such that ab≤x is O(xlogx)

Fix a given a. ab ≤ x implies that b ≤ x/a, or there are only x/a possible values for b. If we now consider all possible values for a from 1 upto x, we get:

$$ |\{ (a, b) : ab \leq x \}| = \sum_{a=1}^x |\{ b : b \leq x/a \}| \leq \sum_{a=1}^x \lfloor x/a \rfloor \leq x \sum_{a=1}^x \frac{1}{a} \leq x(1 + \log x) = O(x \log x) $$

To show that the harmonic numbers are upper bounded by $\log$, compare with an integral: $\sum_{i=1}^n 1/i \leq 1 + \int_1^n \frac{dt}{t} = 1 + \log n$.
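A sketch of where this bound shows up in code: enumerating all pairs $(a, b)$ with $ab \leq x$ by walking over multiples of each $a$ does $\sum_{a=1}^{x} \lfloor x/a \rfloor = O(x \log x)$ work (the value of x below is illustrative):

#include <iostream>
using namespace std;

int main() {
    const int x = 100;
    long long npairs = 0;
    // for each a, the inner loop runs floor(x/a) times,
    // so the total work is sum_a floor(x/a) = O(x log x).
    for (int a = 1; a <= x; ++a) {
        for (int ab = a; ab <= x; ab += a) {
            ++npairs; // (a, ab/a) is one pair with a*b <= x.
        }
    }
    cout << npairs << "\n";
    return 0;
}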

Relationship to Euler Mascheroni constant

This is the limit $\gamma \equiv \lim_{n \to \infty} H_n - \log n$. That this limit exists tells us that these functions grow at the same rate. To see that it is indeed a constant, consider the two functions:

  • $f(n) \equiv H_n - \log n$ which starts at $f(1) = 1$ and strictly decreases.
  • $g(n) \equiv H_n - \log(n+1)$, which starts lower at $g(1) = 1 - \log 2$ and strictly increases. [why?]
  • Also, $\lim_n f(n) - g(n) = 0$. So these sandwich something in between, which is the constant $\gamma$.

DP as path independence

  • DP is about forgetting the past / path independence. It doesn't matter how we got to a state, only what the state is. For example, to DP on subsequences, we don't care how we got to a given subsequence; we only care about the final result that we computed for that subsequence. This lets us "extend" knowledge about a subsequence. So we go from $2^n$ (enumerating all subsets) to a choice of 2 at each of the $n$ stages, since at each stage we forget how we got there and collate information.

  • In this light, the recursive sub-computation is the "path dependent" part since it tries a path. The path independence states that it's safe to cache the results of the sub-computation, since all that matters is the final state (inputs).
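A small sketch of "forgetting the path" (an illustrative subset-sum reachability DP, not tied to any particular problem): the state records only which sums are reachable, never which subset produced them.

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> xs = {3, 5, 7}; // illustrative input
    int target = 12;
    // reachable[s] = can some subset of the elements seen so far sum to s?
    // This is the entire state; we never remember *which* subset achieved s.
    vector<bool> reachable(target + 1, false);
    reachable[0] = true;
    for (int x : xs) {
        for (int s = target; s >= x; --s) {
            if (reachable[s - x]) { reachable[s] = true; }
        }
    }
    cout << (reachable[target] ? "yes" : "no") << "\n"; // 5 + 7 = 12, so prints "yes"
    return 0;
}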

Binary search to find rightmost index which does not possess some property

// p for predicate/property
// precondition: p(0) = false
// note: 1 << NBITS is the last index we process.
// precondition: p is monotonic;
//   once it switches to true, does not switch back to false.

if (p(1 << NBITS) == 0) { return 1 << NBITS; }
else {
  assert(p(1<<NBITS) == 1);
  int ans = 0;
  for (int i = NBITS-1; i >= 0; i--) {
    int k = 1 << i;
    assert(p(ans + 2*k) == 1);
    if (p(ans + k) == 0) {
      ans = ans + k;
    }
  }
  return ans;
}
// postcondition:
// ans is the largest index such that p(ans) == 0
  • Claim 1: (Correctness) p(ans[i]) = 0. By precondition, this is true before the loop. See that it's a loop invariant, as we only update ans[i] to ans[i]+k if p(ans[i]+k) = 0. Thus, it is true after the loop.

  • Claim 2: (Maximality): At loop iteration i: p(ans[i] + 2k[i]) = 1. We cannot improve our solution by using previous jump lengths.

This implies optimality once the loop ends. At the end of the loop we have i = -1. So:

2k[-1] = 2(1/2) = 1
finalans = ans[-1]
---
p(ans[-1] + 2k[-1]) = 1
=> p(finalans+1) = 1
  • Proof of Claim 2: induction on i
  • Suppose claim 2 is true till index i: p(ans[i] + 2k[i]) = 1.
  • To prove: induction hypothesis holds at index (i-1).
  • Case analysis based on loop body at i: p(ans[i] + k[i]) = 0 or 1
  • (a) p(ans[i] + k[i]) = 0. We update ans[i-1] = ans[i] + k[i].
  • We wish to show that the loop invariant holds at i-1: p(ans[i-1]+2k[i-1]) == 1.

$$ \begin{aligned} &\text{k value: } k[i] = 2^i \\ &\text{k value at } i-1: k[i-1] = 2^{i-1} = k[i]/2 \\ &\text{Induction hyp.: } p(ans[i] + 2k[i]) = 1 \\ &\text{Case (a): } p(ans[i] + k[i]) = 0 \\ &\text{Update: } ans[i-1] \equiv ans[i] + k[i] \\ &p(ans[i-1] + 2k[i-1]) \\ &= p((ans[i] + k[i]) + k[i]) \\ &= p(ans[i] + 2k[i]) \\ &= 1 ~\text{(By Induction Hyp.)} \end{aligned} $$

  • We've shown that the induction hypothesis holds at index $(i-1)$ in case (a), where we update the value of $ans[i]$.

  • (b) If p(ans[i] + k[i]) = 1, then we update ans[i-1] = ans[i].

  • We wish to show that the loop invariant holds at i-1: p(ans[i-1]+2k[i-1]) ==1.

$$ \begin{aligned} &\text{k value: } k[i] = 2^i \\ &\text{k value at } i-1: k[i-1] = 2^{i-1} = k[i]/2 \\ &\text{Induction hyp.: } p(ans[i] + 2k[i]) = 1 \\ &\text{Case (b): } p(ans[i] + k[i]) = 1 \\ &\text{Update: } ans[i-1] \equiv ans[i] \\ &p(ans[i-1] + 2k[i-1]) \\ &= p(ans[i] + k[i]) \\ &= 1 ~\text{(By Case (b))} \end{aligned} $$

  • We've shown that the induction hypothesis holds at index $(i-1)$ in case (b), where we don't change the value of $ans[i]$.

  • In summary, the loop invariant holds at index $(i-1)$ assuming it is satisfied at index $(i)$, for both updates of $ans[i]$. Thus, by induction, the loop invariant holds for all iterations.

  • Elaborated proof of why p(ans[0]+1) = 1 at the end of the loop

See that we can insert a new invariant at the end of the loop which asserts p(ans[i]+k[i]) == 1:

if (p(1 << nbits) == 0) { return 1 << nbits; }
else {
  assert(p(1<<nbits) == 1);
  int ans = 0;
  for (int i = nbits-1; i >= 0; i--) {
    int k = 1 << i;
    assert(p(ans + 2*k) == 1);
    int ans2;
    if (p(ans + k) == 0) {
      ans2 = ans + k;
      // ans2 + k
      // = (ans + k) + k
      // = ans + 2k
      // = 1 (from assertion)
    } else {
       ans2 = ans;
      // ans2 + k
      // = ans + k
      // = 1 [from else branch]
    }
    // ## new loop end invariant ##
    // true from then, else branch.
    assert(p(ans2+k) == 1);
    ans = ans2;
  }
}
  • We've proven the correctness of the loop invariant at the end of the loop, given the prior loop invariant at the beginning of the loop.
  • So, at the end of the (i=0) iteration, we have k=1, and so p(ans+1) == 1, which is the "rightmost index" condition that we originally wanted.

Fully elaborated proof

if (p(1 << nbits) == 0) { return 1 << nbits; }
else {
  assert(p(1<<nbits) == 1);
  int ans = 0;
  // p(ans[nbits-1] + 2*(1<<nbits-1))
  // = p(0 + 1 << nbits)
  // = p(1 << nbits)
  // = 1 [from assert]
  for (int i = nbits-1; i >= 0; i--) {
    int k = 1 << i;
    // From previous loop iteration (i+1):
    // ---------------------------------
    // p(ans[(i+1)-1] + k[i+1]) == true
    // => p(ans[i] + k[i+1]) == true
    // => p(ans[i] + 2k[i]) == true
    assert(p(ans + 2*k) == true);

    if (p(ans + k) == 0) {
      ans = ans + k;
      // ASSIGNMENT: ans[i-1] = ans[i] + k[i]
      // p(ans[i-1] + k[i])
      // = p(ans[i] + k[i] + k[i])
      // = p(ans[i] + 2k[i])
      // = p(ans[i] + k[i+1])
      // = 1 (from induction hyp)
    } else {
       ans = ans; // no-op
       // ASSIGNMENT: ans[i-1] = ans[i].
       // p(ans[i-1] + k[i])
       // = p(ans[i] + k[i])
       // = 1 (from else branch)
    }
    // ## new loop end invariant ##
    // p(ans[i-1] + k[i])== 1
    assert(p(ans+k) == 1);
  }
}

Simplified implementation

If we are willing to suffer some performance impact, we can change the loop to become significantly easier to prove:

if (p(1 << nbits) == 0) { return 1 << nbits; }
else {
  assert(p(1<<nbits) == 1);
  int ans = 0;
  int i = nbits-1;
  while(i >= 0) {
    int k = 1 << i;
    assert(p(ans + 2*k) == 1);
    if (p(ans + k) == 0) {
      ans = ans + k;
    } else {
      i--;
    }
    assert(p(ans) == 0);
  }
}

In this version of the loop, we only decrement i once p(ans+k) == 1, ie, once jumps of length k stop being useful. We don't need to prove that decrementing i once per loop trip maintains the invariant; rather, we take as many jumps of length k as are useful, and only then decrement i.

Relationship to LCA / binary lifting

This is very similar to LCA via binary lifting, where we lift u to the lowest ancestor of u that is still not an ancestor of v. The parent of that node must be the LCA.

int lca(int u, int v) {
    if (is_ancestor(u, v)) return u;
    if (is_ancestor(v, u)) return v;

    // u is not an ancestor of v.
    // find lowest parent of u that is not an ancestor of v.
    for (int i = l; i >= 0; --i) {
        if (!is_ancestor(up[u][i], v))
            u = up[u][i];
    }
    return up[u][0];
}

Correctness of lower_bound search with half-open intervals

// precondition: `xs` is sorted.
// find rightmost i such that xs[i] <= y (so xs[i+1] > y, if it exists).
int tallest(vector<long> &xs, int y) {
    // [l, r)
    int l = 0, r = xs.size();
    // precondition: l < r
    while(1) {
        if (l + 1 == r) { return l; }
        // info gained from if: r > (l+1)
        int m = (l+r)/2;
        // should this be (xs[m] > y) or (xs[m] >= y)?
        if (xs[m] > y) {
            r = m; // interval shrinks, since m < r.
        } else {
            // r > (l+1),
            // so m := (l+r)/2 >= (2l+2)/2 = l+1 > l.
            l = m;
        }
    }
}
  • First, see that in the extreme case where the array has no element greater than y, we would still like to find the rightmost i that fulfils xs[i] <= y. So in our imagination, we right-pad the array with an infinitely large value.
  • We wish to know whether the if condition should test xs[m] > y or xs[m] >= y before it decides to shrink the search range.
  • Intuitively, we wish to move the search range rightwards. So if we have xs[m] == y, we must move l towards m to move the search range rightwards. For more clarity, let's write the above as:
// precondition: `xs` is sorted.
// find rightmost i such that xs[i] <= y (so xs[i+1] > y, if it exists).
int tallest(vector<long> &xs, int y) {
    // [l, r)
    int l = 0, r = xs.size();
    // precondition: l < r
    while(1) {
        if (l + 1 == r) { return l; }
        // info gained from if: r > (l+1)
        int m = (l+r)/2;
        // should this be (xs[m] > y) or (xs[m] >= y)?
        if (xs[m] > y) {
            // move interval towards `l` for smaller values.
            r = m; // interval shrinks, since m < r.
        } else if (xs[m] < y) {
            // move interval towards `r` for larger values.
            // r > (l+1),
            // so m := (l+r)/2 >= (2l+2)/2 = l+1 > l.
            l = m;
        } else {
            //xs[m] == y
            // we want rightmost index `l` where `xs[l] <= y`.
            // - this `xs[m]` is a legal index.
            // - we want rightmost `m`. Since `m > l`, move `l` rightward by setting `l = m`.
            l = m;
        }
    }
}

Greedy Coin change: proof by probing

Probing the coin set {1, 5, 10, 20, 100}

  • Let O* be the optimal solution for this coin set. I'll write 'k x [v$]' to mean k copies of the v-dollar coin.
  • O* will convert 5x[1$] → 1x[5$] , because it's better to use less coins.
  • O* will convert 2x[5$] → 1x[10$]
  • O* will convert 2x[10$] → 1x[20$]
  • O* will convert 5x[20$] → 1x[100$]
  • So we summarize: O* can have at most: 4x[1$], 1x[5$], 1x[10$], 4x[20$]. If it has more than these, it can convert to one copy of a larger coin, losing optimality.

Optimal takes as many 100$ as greedy.

  • Recall: G (the greedy solution) takes as many 100, 20, 10, 5, 1 as possible, starting from 100 and working its way down to 1.
  • Let G[100$] be the number of copies of the 100$ coin the greedy solution uses to represent n.
  • Claim: O[100$] >= G[100$]. Also, O cannot take more [100$] coins than G, since greedy already takes as many as fit; so as a corollary O[100$] = G[100$].
  • Suppose for contradiction that O[100$] < G[100$]. Then there is a 100$ to be made up by O, which G fulfils by using a [100$] coin.
  • We know by probing that if we stick to coins less than [100$], O can have at most 4x[20$] + 1x[10$] + 1x[5$] + 4x[1$] coins.
  • See that we can't add any more of [1$], [5$], [10$], [20$]. For example, suppose we try and use another [1$] coin. This means we have 4x[20$] + 1x[10$] + 1x[5$] + 5x[1$]. From probing, we know we should change 5x[1$] → 1x[5$]. This changes the sum to 4x[20$] + 1x[10$] + 2x[5$]. From probing, we know 2x[5$] → 1x[10$]. The sum becomes 4x[20$] + 2x[10$]. Again from probing, we know to be optimal and use less coins, we should change 2x[10$] → 1x[20$]. This makes the sum 5x[20$]. This too should be changed to 1x[100$], a fact we learnt from probing. But this contradicts the assumption that we want to use only coins smaller than [100$]. So if we are using coins smaller than [100$], the maximum value we can represent is given by 4x[20$] + 1x[10$] + 1x[5$] + 4x[1$] .
  • The maximum value 4x[20$] + 1x[10$] + 1x[5$] + 4x[1$] adds up to 99$, which is one shy of 100$. So it is impossible to represent a value of 100 dollars using only coins of value less than [100$] in an optimal fashion. Thus, O[100$] = G[100$]: it is best to take as many [100$] coins as possible.
  • Repeat the argument for smaller denominations.
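A minimal sketch of the greedy procedure itself for this coin set (the probing argument above justifies it for {1, 5, 10, 20, 100}; it is not valid for arbitrary coin sets, and the amount below is illustrative):

#include <iostream>
#include <vector>
using namespace std;

int main() {
    // coin set for which the probing argument shows greedy is optimal.
    vector<int> coins = {100, 20, 10, 5, 1};
    int n = 289; // illustrative amount
    int ncoins = 0;
    for (int c : coins) {
        ncoins += n / c; // take as many copies of c as possible
        n %= c;
    }
    cout << ncoins << "\n"; // 289 = 2x[100$] + 4x[20$] + 1x[5$] + 4x[1$], so 11 coins.
    return 0;
}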

Clean way to write burnside lemma

Burnside's lemma says that the number of orbits of a group $G$ acting on a set $X$ is $\frac{1}{|G|} \sum_{g \in G} |fix(g)|$. We prove this as follows:

$$ \begin{aligned} \sum_{g \in G} |fix(g)| &= \sum_{g \in G} |\{x \in X : g(x) = x \}| \\ &= |\{(g, x) : g(x) = x \}| \\ &= \sum_{x \in X} |\{ g \in G : g(x) = x \}| \\ &= \sum_{x \in X} |Stab(x)| \end{aligned} $$

  • From orbit stabilizer, we know that $|Orb(x)||Stab(x)| = |G|$.
  • Since $|Orb(x)|$ is the total cardinality of the orbit, each element in the orbit contributes $1/|Orb(x)|$ towards the cardinality of the full orbit.
  • Thus, the sum over an orbit $\sum_{x \in Orb(x)} 1/|Orb(x)|$ will be 1.
  • Suppose a group action has two orbits, $O_1$ and $O_2$. I can write the sum $\sum_{x \in X} 1/|Orb(x)|$ as $\sum_{x \in O_1} 1/|O_1| + \sum_{x \in O_2} 1/|O_2|$, which is equal to 2.
  • I can equally write the sum as $\sum_{o \in Orbits} \sum_{x \in o} 1/|o|$. But this sum is equal to $\sum_{o \in Orbits} \sum_{x \in o} 1/|Orb(x)|$.
  • This sum runs over the entire set $X$, so it can be written as $\sum_{x \in X} 1/|Orb(x)|$.
  • In general, the sum over the entire set, $\sum_{x \in X} 1/|Orb(x)|$, will be the number of orbits, since the same argument holds for each orbit.

$$ \begin{aligned} \sum_{x \in X} |Stab(x)| &= \sum_{x \in X} |G|/|Orb(x)| \\ &= |G| \sum_{o \in Orbits} \sum_{x \in o} 1/|o| \\ &= |G| \cdot \texttt{num.orbits} \end{aligned} $$

So we have derived:

$$ \begin{aligned} &\sum_{g \in G} |fix(g)| = |G| \cdot \texttt{num.orbits} \\ &\frac{1}{|G|} \sum_{g \in G} |fix(g)| = \texttt{num.orbits} \end{aligned} $$

If we have a transformation that fixes many things, ie, $fix(g)$ is large, then this $g$ is not helping "fuse" orbits of $x$ together, so the number of orbits will increase.
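A standard worked example as a sanity check (not from the argument above): count 2-colourings of a 4-bead necklace up to rotation. The rotation group $C_4$ acts on the $2^4$ colourings; the identity fixes $2^4 = 16$, rotation by one or three positions fixes $2$, and rotation by two positions fixes $2^2 = 4$. So the number of orbits is $\frac{1}{4}(16 + 2 + 4 + 2) = 6$.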

The groupoid interpretation of type theory

The monograph by Martin Hofmann and Thomas Streicher is remarkably lucid. It opens by stating that UIP (uniqueness of identity proofs) is false by providing a model for the axioms of MLTT where UIP fails --- a groupoid!

Mnemonics for free = left adjoint

To free is a very liberal thought. Very left

Left and free have the same number of letters (4)

Where to scratch a cat

Scratch the sides of their rear legs - that's where they can't scratch themselves. Found this useful to know, since we've recently adopted a stray.

Mnemonic for Specht module actions

Consider the two extreme cases, of wide v/s narrow:

x = [* * *]
y = [#]
    [#]
    [#]
  • Consider x = [* * *]. It's very wide/fat, so it doesn't like much exercise, which is why its column stabilizer $C_x = \{ e \}$ is trivial. Thus, the action $A_x \equiv id$.

  • Consider y, a single column of three boxes. It's very slim, and exercises quite a bit. So its column stabilizer is $S_3$, and its action $A_y \equiv \dots$ has a lot of exercise.

  • Anyone can participate in $x$'s exercise regime. In particular, $A_x(y) = id(y) = y$ since $y$ doesn't tire out from the exercise regime of $x$.

  • On the other side, it's hard to take part in $y$'s exercise regime and not get tired out. If we consider $A_y(x)$, we're going to get zero, because by tableaux combinatorics there are swaps in $A_y$ that leave $x$ invariant, which causes sign cancellations. Intuitively, $A_y(x)$ is asking $x$ to participate in $y$'s exercise regime, which it's not strong enough to do, and so it dies.

  • In general, if $\lambda \triangleright \mu$, then $\lambda$ is wider/fatter than $\mu$. Thus we will have $A_\mu(\lambda) = 0$ since $A_\mu$ is a harder exercise regime that has more permutations.

  • Extend this to arrive at specht module morphism: If we have a non-zero morphism $\phi: S^\lambda \rightarrow S^\mu$ then $\lambda \rightarrow \mu$ [Check this?? Unsure]

Quotes from 'Braiding Sweetgrass'

Listening in wild places, we are audience to conversations in a language not our own

Puhpowee, she explained, translates as “the force which causes mushrooms to push up from the earth overnight.” As a biologist, I was stunned that such a word existed. In all its technical vocabulary, Western science has no such term, no words to hold this mystery. You’d think that biologists, of all people, would have words for life. But in scientific language our terminology is used to define the boundaries of our knowing. What lies beyond our grasp remains unnamed.

Only 30 percent of English words are verbs, but in Potawatomi that proportion is 70 percent. Which means that 70 percent of the words have to be conjugated, and 70 percent have different tenses and cases to be mastered..

Our toddlers speak of plants and animals as if they were people, extending to them self and intention and compassion—until we teach them not to. We quickly retrain them and make them forget. When we tell them that the tree is not a who, but an it, we make that maple an object;

We don’t know their names or their faces, but our fingers rest right where theirs had been and we know what they too were doing one morning in April long ago. And we know what they had on their pancakes. Our stories are linked in this run of sap; our trees knew them as they know us today..

I realize that those first homesteaders were not the beneficiaries of that shade, at least not as a young couple. They must have meant for their people to stay here. Surely those two were sleeping up on Cemetery Road long before the shade arched across the road. I am living today in the shady future they imagined, drinking sap from trees planted with their wedding vows. They could not have imagined me, many generations later, and yet I live in the gift of their care. Could they have imagined that when my daughter Linden was married, she would choose leaves of maple sugar for the wedding giveaway?

You should not be able to walk on a pond. It should be an invitation to wildlife, not a snare. The likelihood of making the pond swimmable, even for geese, seemed remote at best. But I am an ecologist, so I was confident that I could at least improve the situation. The word ecology is derived from the Greek oikos, the word for home. I could use ecology to make a good home for goslings and girls.

Our appetite for their fruits leads us to till, prune, irrigate, fertilize, and weed on their behalf. Perhaps they have domesticated us. Wild plants have changed to stand in well-behaved rows and wild humans have changed to settle alongside the fields and care for the plants—a kind of mutual taming.

In that awareness, looking over the objects on my desk—the basket, the candle, the paper—I delight in following their origins back to the ground. I twirl a pencil—a magic wand lathed from incense cedar— between my fingers. The willow bark in the aspirin. Even the metal of my lamp asks me to consider its roots in the strata of the earth.

I smile when I hear my colleagues say “I discovered X.” That’s kind of like Columbus claiming to have discovered America. It was here all along, it’s just that he didn’t know it. Experiments are not about discovery but about listening and translating the knowledge of other beings.

It seems counterintuitive, but when a herd of buffalo grazes down a sward of fresh grass, it actually grows faster in response. This helps the plant recover, but also invites the buffalo back for dinner later in the season. It’s even been discovered that there is an enzyme in the saliva of grazing buffalo that actually stimulates grass growth. To say nothing of the fertilizer produced by a passing herd. Grass gives to buffalo and buffalo give to grass.

Transfinite recursion: Proof

  • Let $(J, <)$ be a well-ordered set.
  • Denote by $[0, \alpha)$ the set ${ j \in J : j < \alpha }$ as suggestive notation. Similarly $[0, \alpha]$ is the set ${ j \in J: j \leq \alpha }$.
  • Let $r: (\forall \alpha \in J, [0, \alpha) \rightarrow O) \rightarrow O$ be a recursion formula, which, when given a function $f: [0, \alpha) \rightarrow O$ that is well defined on $J$ below $\alpha$, produces a value $r(f) \in O$ that extends $f$ to be well defined at $\alpha$.
  • We wish to find a function $f$ such that for all $j \in J$, $f(j) = r(f|[0, j))$. So this function $f$ is determined by the recursion principle $r$. We construct such a function by transfinite induction.
  • Let $J_0 \subseteq J$ be the set of $j \in J$ such that there exists a function $f_j: [0, j] \rightarrow O$ (see the closed interval!) which obeys the recursion formula upto $j$. That is, for all $k \leq j$, we have that $f_j(k) = r(f_j|[0, k))$. In particular, choosing $k = j$, this says $f_j(j) = r(f_j|[0, j))$.
  • Claim: the set $J_0$ is inductive.
  • Let $[0, j) \subseteq J_0$. Thus, for all $k < j$, there is a function $f_k: [0, k] \rightarrow O$ such that $f_k(l) = r(f_k|[0, l))$.
  • We must show that $j \in J_0$. So we must construct a function $f_j: [0, j] \rightarrow O$ such that ... (reader: fill in the blanks).
  • Handwavy: note that the functions $\{ f_k : k \in [0, j) \}$ all agree on their common domains, since their outputs are determined by the recursion formula. (Formally: we can first prove that any function that satisfies the recursion scheme is uniquely determined.)
  • Thus, we can build the function $g_j: [0, j) \rightarrow O$ given by $g_j \equiv \cup_{k \in [0, j)} f_k$. That is, we literally take the "set union" of the functions as sets of ordered pairs, as the functions are all compatible. This gives us a function defined below $j$.
  • The value of $f_j$ at $j$ must be $r(g_j)$. So we finally define $f_j \equiv g_j \cup \{ (j, r(g_j)) \}$. This is a uniquely defined function, as $r$ is a function and thus produces a unique output for the unique input $g_j$.
  • We have a function $f_j$ that obeys the recursion schema: (1) at $j$, it is defined to obey the recursion schema; At $k < j$, it is written as union of prior $f_k$ which obey recursion schema by transfinite induction hypothesis.
  • Thus, we have $j \in J_0$, witnessed by $f_j$.
  • We have fulfilled the induction hypothesis. So $J_0 = J$, and we have a set of functions $\{ f_j : j \in J \}$, all of which are compatible with each other and obey the recursion schema. We take their union and define $f \equiv \cup_j f_j$, and we are done!

Transfinite induction: Proof

  • Let $(J, <)$ be a well-ordered set.
  • Let $S(\alpha)$ be the part of $J$ below $\alpha$: $S(\alpha) \equiv \{ j \in J: j < \alpha \}$. This is called the section of $J$ by $\alpha$.
  • Let a $J_0 \subseteq J$ be inductive iff for all $\alpha \in J$, $S(\alpha) \subseteq J_0$ implies $\alpha \in J_0$. That is:

$$ \text{$J_0$ inductive} \equiv \forall \alpha \in J, S(\alpha) \subseteq J_0 \implies \alpha \in J_0 $$

  • Then transfinite induction states that for any inductive set $J_0 \subseteq J$, we have $J_0 = J$.

  • Proof by contradiction. Suppose that $J_0$ is an inductive set such that $J_0 \neq J$.

  • Let $W$ (for wrong) be the set $J - J_0$. That is, $W$ is the set of elements that are not in $J_0$.

  • $W$ is non-empty since $J_0 \neq J$. Thus, consider $w \equiv \min(W)$, which is possible since $J$ is well-ordered, thus the subset $W$ has a minimum element.

  • $w$ is the smallest element that is not in $J_0$. So all elements smaller than $w$ are in $J_0$. Thus, $S(w) \subseteq J_0$. This implies $w \in J_0$, as $J_0$ is inductive.

  • This is a contradiction, as we started with $w$ being the smallest element not in $J_0$, and then concluded that $w$ is in $J_0$.

  • Thus, the set $W \equiv J - J_0$ must be empty, i.e. $J_0 = J$.

Thoughts on playing Em-Bm

I'm having some trouble playing E minor followed by B minor in quick succession. The problem was a type of analysis-paralysis, where I wasn't sure in what order I should barre the chord and then place my other fingers. I'm trying to change my mental model: I keep in mind a "root finger", which for B minor is the middle finger; I first place it on the correct string, and then place all other fingers in relation to it. This seems to help, since the task becomes (a) place root finger, (b) naturally place other fingers after it.

An explanation for why permutations and linear orders are not naturally isomorphic

The number of linear orders on a finite set is the same as the number of bijections: the factorial of the cardinality. Every linear order on a set is isomorphic to any other, but a permutation is only isomorphic to another with the same cycle type. Thus, we have two functors $Perm: Set \rightarrow Set$, which sends a set to its set of permutations, and $Ord: Set \rightarrow Set$, which sends a set to its set of linear orders, such that the functors agree on all objects (upto set isomorphism --- ie, they produce outputs of the same size), but the functors fail to be naturally isomorphic, since they have different criteria for "being equal".

We can't define choice for finite sets in Haskell!

If all you have is a decidable equality relation on the elements, then there seems to be no function which can implement choice. That is, you can’t write a function choose :: Set a -> Maybe (a, Set a)

Concretely, suppose we represent sets as lists of nonrepeated elements. Then, we can write an operation choose :: Set a -> Maybe (a, Set a), which just returns a pair of the head and the tail if the list is nonempty, and returns Nothing if it is empty.

However, this operation does not respect equality on sets. Note that any permutation of a list representing a given set also represents that same set, but the choose operation returns different answers for different permutations. As a result, this operation is not a function, since it does not behave extensionally on finite sets!

I feel there should be an argument involving parametricity which makes this work for arbitrary datatype representations, since all we rely on is the fact that equality can’t distinguish permutations. But I haven’t found it yet.

Geomean is scale independent

sqrt(ab) is dimensionally meaningful even if a and b are dimensionally different. I found this interesting, since it implies that Geomean is not "biased": arithmetic mean is more sensitive to large values (eg: (1 + 999)/2 = 500), while harmonic mean is more sensitive to small values. Geomean is neither, so it's more "balanced".
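A quick numeric illustration of that sensitivity claim (values chosen for illustration): for $a = 1, b = 999$, the arithmetic mean is $(1 + 999)/2 = 500$, the harmonic mean is $2/(1/1 + 1/999) \approx 2$, and the geometric mean is $\sqrt{1 \cdot 999} \approx 31.6$, which sits between the two.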


Induction on natural numbers cannot be derived from other axioms

The idea is to consider a model of the naturals that obeys all axioms other than induction, and to then show that a statement provable by induction fails in this model. Thus, induction does not follow from the Peano axioms minus the induction axiom. We build a model $M \equiv \mathbb N \cup \{ \star \}$ where we define the successor on $M$ as $succ(n \in \mathbb N) = n + 1$ and $succ(\star) = \star$. Now let's try to prove $P(m) \equiv succ(m) \neq m$ for all $m \in M$. $P(0)$ holds as $succ(0) = 1 \neq 0$. It is also true that if $P(m)$ then $P(succ(m))$ (for $m = \star$ this holds vacuously, since $P(\star)$ is false). However, it is NOT true that $\forall m \in M, P(m)$, since it does not hold for $\star \in M$. So we really do need induction as an axiom to rule out such models.

Ordinals and cardinals

This a rough sketch of a part of set theory I know very little about, which I'm encountering as I solve the "supplementary exercises" in Munkres, chapter 1.

Ordinals

  • Two totally ordered sets have the same order type if there is a monotone isomorphism between them. That is, there's a function $f$ which is monotone and has an inverse. The inverse is guaranteed to be monotone (1), so we do not need to stipulate a monotone inverse.
  • Definition of well ordered set: totally ordered set where every subset has a least element.
  • Theorem: The collection of well ordered sets, compared by order type, is itself well ordered.
  • Definition of ordinals: Consider equivalence classes of well ordered sets, where two sets are equivalent iff they have the same order type. These equivalence classes are the ordinals.
(1) Inverse of a Monotone function is monotone.
  • Let $f: A \rightarrow B$ be monotone: $a < a'$ implies $f(a) < f(a')$. Furthermore, there is a function $g: B \rightarrow A$ such that $g(f(a)) = a$ and $f(g(b)) = b$.
  • Claim: if $b < b'$ then $g(b) < g(b')$.
  • Let $b < b'$. We must have (a) $g(b) < g(b')$, or (b) $g(b) = g(b')$, or (c) $g(b) > g(b')$.
  • If $g(b) < g(b')$ we are done.
  • Suppose for contradiction $g(b) \geq g(b')$ then we must have $f(g(b)) \geq f(g(b'))$ since $f$ is monotone. Since $f, g$ are inverses we get $b \geq b'$. This contradicts the assumption $b < b'$.
  • This doesn't work for partial orders because we may get $b$ and $b'$ as incomparable.

Von Neumann Ordinals

  • Von neumann ordinals: Representatives of equivalence classes of ordinals. Formally, each Von-Neumann ordinal is the well-ordered set of all smaller ordinals.
  • Formal defn of Von-Neumann ordinal $o$: (1) every element $x \in o$ will be a subset of $o$, since $x$ is itself a set { ordinal < x }, which is a subset of { ordinal < o }. (2) the set $o$ is well ordered by set membership, since two such ordinals will always be comparable, and one must contain the other.
  • For an example of Von Neumann ordinals, consider 0 = {}, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}. We can order 3 based on membership: 0 ∈ 1, 2 so 0 < 1, 2. 1 ∈ 2 hence 1 < 2. This totally orders 3 based on set membership. Next, also see that a member of 3, such as 2, is in fact 2 = {0, 1}, which is a subset of 3. So every member of 3 is a subset of 3. (Not vice versa: not every subset is a member! The subset {1, 2} is not a member of 3).

Limit ordinals

  • A limit ordinal is an ordinal that cannot be written as the successor of some other ordinal.
  • Theorem: An ordinal must be either zero, or the successor of some other ordinal, or a limit ordinal (2)
  • References on ordinals

Cardinality and cardinals

  • We can define cardinality as equivalence classes of sets that are equinumerous: ie, sets with bijections between them. This does not strictly speaking work due to set-theoretic issues, but let's go with it.
  • In each such equivalence class of equinumerous sets, there will be many well ordered sets (granting the well-ordering theorem). The smallest of the corresponding ordinals (recall that the ordinals are themselves well ordered) is called the cardinal for that cardinality.
  • So we redefine the cardinality of $X$ as the smallest ordinal $\alpha$ such that there is a bijection between $X$ and $\alpha$. This is motivated by the "equivalence class of all equinumerous sets", but sidesteps the set theoretic issues. For this to work, we need the well ordering theorem; otherwise, there could be a set that is in bijection with no ordinal at all.

Rank

  • The rank of the empty set is zero. The rank of a set is recursively the smallest ordinal greater than the ranks of all the members of the set. Every ordinal has a rank equal to itself.
  • $V_0$ is the empty set.
  • $V_{n+1} \equiv 2^{V_n}$. This defines $V$ for successor ordinals.
  • $V_\lambda \equiv \cup_{\beta < \lambda} V_\beta$. This defines $V$ for limit ordinals.
  • The sets $V_\alpha$ are also called stages or ranks. We can define the rank of a set $S$ to be the smallest $\alpha$ such that $S \subseteq V_\alpha$.

Inaccessible cardinal

A cardinal that cannot be created by adding cardinals, taking unions of cardinals, or taking power sets of smaller cardinals. So the stages below an inaccessible cardinal give a model for ZFC: if $\kappa$ is an inaccessible cardinal, then $V_\kappa$ is a model of ZFC.