The syntax of Church's lambda calculus can be described by BNF:
where \lambda x.x is called abstraction and term\ term is called application
The lambda calculus follows is left associative and extend as far as to the right, which means the following equations hold:
 we have: x\ y\ z = ((x\ y)\ z)
 we have: \lambda x. \lambda y.x\ y\ x=\lambda x. (\lambda y.((x\ y)\ x))
Bounded variable and Free variables
A variable x is said to be bounded by abstraction if it appears in the body part of \lambda x.t, or said to be free if it's not bounded by any enclosing abstraction, a lambda term with no free variables is said to be closed, a closed term is also called a combinator
A term is said to be a reducible expression, or redex, if it's in form \lambda x.y\ t because it can be reduced in one step, we use \betareduction to reduce a redex, it can be depicted graphically by:
which means "replace all the free occurrence of x in y by t"
we use term "free occurrence" because all x is free if you only consider t, since \lambda x is really not part of it, be aware that because of the left associativity, all the reductions at the same precedence are evaluated from left to right
There are several order/form of \betareductions:
 Full \betareduction, any redex can be reduced at any time, you can choose to always reduce the innermost or outermost redexes, or reduce them alternatively.
 Normal Order, the outermost redex is always reduced first.
 Call by name, the redexes can be evaluated at any time, except those who is part of the abstraction, which means, in the follow redex:
\lambda x.x\ (\lambda x.x\ (\lambda z.\ (\lambda x.x)\ z))
The parentheses are required, because otherwise the term will be evaluated in a difference way, remember that an abstraction extend as far as to the right. for example the last occurrence of \lambda x will be interpreted as \lambda x.x\ z rather than \lambda x.x and the second occurrence of \lambda x will become part of the abstraction of the first \lambda x which will contradict to the statement that we're about to make
the third occurrence of \lambda x.x can not be reduced to z, because is inside the body of abstraction \lambda z.\lambda x.x\ z, but the first two occurrences can be reduced unquestionably because they are not part of the abstraction. This reduction strategy is also used in langauges like Haskell, it is somehow lazyevaluated, because if you consider the innermost redex as an expression that works as a parameter of a function, then it won't be computed until it's value is required

Call by value, only outermost redex can be reduced after it's right part (the parameter) is already reduced to a value. which means the follow redex:
\lambda x.x\ (\lambda x.x\ (\lambda z.\ (\lambda x.x)\ z))will be reduced to \lambda x.x\ (\lambda z.(\lambda x.x)\ z) first, and then take a step to \lambda z.(\lambda x.x)\ z and then stops. be noticed that \lambda z.(\lambda x.x)\ z is not of form \lambda x.y\ z, so it's not a redex, it's an abstraction, hence won't be reduced.
Church Encoding
Theoretically, any data structure and operations can be encoded solely by lambda terms, such encoding is called Church Encoding
Church Boolean
The boolean values true
and false
can be encoded by:
true&=\lambda t.\lambda f.\ t\\
false&=\lambda t.\lambda f.\ f
\end{aligned}
which works similar to a ternary operator: true\ x\ y will return x and false\ x\ y will return y, so you can write a test function that take an encoded boolean value and two arguments(in lambda calculus, everything is function so these arguments are actually functions, too):
if you feed test with test\ true\ x\ y it will returns x and if you feed test with false it will returns y.
And some operations on booleans:
and&=\lambda c_1.\lambda c_2.\ \ c_1\ c_2\ false\\
or&=\lambda c_1.\lambda c_2.\ \ c_1\ true\ c_2
\end{aligned}
Consider the and function, if c_1 is true then the application c_1\ c_2\ false will yield c_2, which means and returns the value of c_2 if c_1 is true which implies that if both c_1 and c_2 is true then and will effectively yield true, and it will yield false if either c_1 is false or c_1 is true but c_2 is false. or function can be interpreted in a similar way.
Church Pairs
A pair is simply comprised of two values, first and second, we use church boolean we've seen in the last section to select either of them:
pair&=\lambda f.\lambda s.\lambda c.\ \ c\ f\ s\\
first&=\lambda p.\ p\ true\\
second&=\lambda p.\ p\ false
\end{aligned}
THe only subtlety here is the order of the arguments in pair: the conditional argument c is at the last position, because when we constructing a pair we will use pair\ a\ b, which lacks the last parameter c thus effectively yields a predicate, when you apply first or second to it, it is equals to evaluate that predicate to get the real first or second value inside the pair.
Church Numerals
A set can be considered as an encoding of the natural numbers if there is a onetoone correspondence between that set and \mathbb{N}, church numerals is a way to encoding lambda terms to represent natural numbers, we have:
c_0&=\lambda s.\lambda z.\ z\\
c_1&=\lambda s.\lambda z.\ s\ z\\
...\\
c_n&=\lambda s.\lambda z.\ s^n\ z\\
\end{aligned}
where s^n stands for "apply s for n times".
Then we can define arithmetic operators:
succ&=\lambda n.\lambda s.\lambda z.\ s\ (n\ s\ z)\\
plus&=\lambda m.\lambda n.\lambda s.\lambda z.\ m\ s\ (n\ s\ z)\\
times&=\lambda m.\lambda n.\lambda s.\lambda z.\ m\ (plus\ n)\ c_0\\
exp&=\lambda m.\lambda n.\lambda s.\lambda z.\ m\ (times\ n)\ c_1
\end{aligned}
in the above operators, times means "apply plus\ n, m times to c_0", which is exactly what "multiplication" means, and plus basically means "increment n by m times", and exp means "times n to c_1 by m times"
exp function can also be defined by exp=\lambda m.\lambda n.\ n\ m, if you expand it you will get:
\begin{aligned}
exp\ x\ y&=(\lambda m.\lambda n.\ n\ m)\ x\ y\\
&=y\ x\\
&=(\lambda s_1.\lambda z_1.\ s_1^y\ z_1)\ (\lambda s_2.\lambda z_2.\ s_2^x\ z_2)\\
&=\lambda z_1.\ (\lambda s_2.\lambda z_2.\ s_2^x\ z_2)^y\ z_1\\
&=\lambda z_1.\ (\lambda s_2.\lambda z_2.\ s_2^x\ z_2)\ z_1\ ((\lambda s_2.\lambda z_2.\ s_2^x\ z_2)^{y1}\ z_1)\\
&=\lambda z_1.\ (\lambda z_2.\ z_1^x\ z_2)\ ((\lambda s_2.\lambda z_2.\ s_2^x\ z_2)^{y1}\ z_1)\\
&=\lambda z_1.\ (\lambda z_2.\ z_1^x\ z_2)\ (\lambda z_2.\ z_1^x\ z_2)\ ((\lambda s_2.\lambda z_2.\ s_2^x\ z_2)^{y2}\ z_1)\\
&=\lambda z_1.\ \underbrace{(\lambda z_2.\ z_1^x\ z_2)\ (\lambda z_2.\ z_1^x\ z_2)\ ...\ (\lambda z_2.\ z_1^x\ z_2)}_{(\lambda z_2.\ z_1^x\ z_2)\ \text{repeated for}\ y\ \text{times}}\ z_1
\end{aligned}keep unwinding the last part of the abstraction body, you will get repeated (\lambda z_2.\ z_1^x\ z_2) up to y times, which fits the definition of exponentiation perfectly
and we have pred:
zz&=pair\ c_0\ c_0\\
ss&=\lambda p.\ pair\ (second\ p)\ (plus\ c_1\ (second\ p))\\
pred&=\lambda m.\ first\ (m\ ss\ zz)
\end{aligned}
the pred function will apply ss to zz for m times, if m is c_0, apply ss to zz zero times will get zz itself and then yields c_0, otherwise you will get pair\ m1\ m, then first function will get the first component of the pair, which is m1.
Since we have pred, we can now write subtract:
which basically means "apply pred for n times on m"
Recursion and FixedPoint Combinators
A term is said to be a fixedpoint combinator, if after the reduction it yields exactly itself again, for example, (\lambda.x\ x.x)\ (\lambda x.\ x.x) is a fixedpoint combinator，reduce it will just make it selfreplicate once, this property of fixedpoint combinator allows us to create combinators that can be used for recursion, for example, the following fixedpoint combinator:
can be used to calculate a function's fixpoint, thus effectively supports recursion.
Free Variables
The set of all free variables in a lambda term can be defined inductively:
FV(x)&=\{x\}\\
FV(\lambda x.t_1) &= FV(t_1)\backslash x\\
FV(t_1\ t_2)&=FV(t_1)\cup FV(t_2)
\end{aligned}
Substitution
The substitution rule (\alphaconversion) of lambda terms, denoted by [x\mapsto s], can be defined inductively as:
&[x\mapsto s]x&&=s&&\\
&[x\mapsto s]y&&=y&&\text{if }y\neq x\\
&[x\mapsto s](\lambda y.t_1)&&=\lambda y.\ [x\mapsto s]t_1 &&\text{if }y\neq x\text{ and }y\notin FV(s)\\
&[x\mapsto s](t_1\ t_2)&&=[x\mapsto s]t_1\ [x\mapsto s]t_2
\end{aligned}
The requirement for the third rule is necessary, because if y is in FV(s), then after substitution it will be excluded from FV(s), such substitution will undoubtedly change the meaning of the term. A such phenomenon that a free variable becomes bounded after a subtitution is called variable capture
Operational Semantics
following the "Call by value" strategy, we can add a new term called value to the syntax:
this is because in the call by value strategy, a single abstraction is not reducible, which makes them can be effectively considered as normal forms.
We have inference rules of evaluation for lambda terms:
\cfrac{term_2 \rightarrow term_2'}{value_1\ term_2\rightarrow value_1\ term_2'}\\
(\lambda x.term)\ value_2\rightarrow [x\mapsto value_2]term_2
noticed that the use for value and term controls the order of the inference to fit the call by value strategy: when we are about to evaluate term_1\ term_2, we must use rule 1 first to reduce the term_1 to value_1, then use rule 2 to reduce term_2 to value_2, and finally use rule 3 to perform reduction on value1\ value2.
de Bruijn Index
The \betareduction involves some subtleties when bring them into the real world: the variable capture, which turns out to be harmful because it will change the meaning of the whole term, to avoid this, one way is to assign unique identifiers to each of the variables, de Bruijn Index is one of those approaches.
The de Bruijn Index says that, instead of assigning a string to the variables, we use number k to stands for "the variable bounded by k'th enclosing binder, which is the \lambdaterm", where by "enclosing" means the counting is insideout, for example, the de Bruijn form of
is
To define the de Bruijn Index formally, consider the kfree variable as "the free variable that requires at least k binders to be bounded", thus, we can define a set \mathcal{T} of sets that are indexed by k, and \mathcal{T}_k stands for "all the terms that can be bounded by k binders", let \mathcal{T} be a set of sets indexed by natural number such that:
& \text{1.}k\in \mathcal{T}_n \text{ if }0\leqslant k < n\\
& \text{2. if }t_1\in \mathcal{T}_n\text{, then }\lambda.t_1\in\mathcal{T}_{n1}\\
& \text{3. if }t_1\in\mathcal{T}_n\text{ and }t_2\in\mathcal{T}_n\text{, then }(t_1\ t_2)\in\mathcal{T}_n
\end{aligned}
Naming Context
In order to solve the problem where sometimes we don't know the actual index of a free variable v when substituting, we create a function \varGamma called naming context: Suppose x_0..x_n are variables from \mathcal{V}, the \varGamma=x_n..x_0 assigns each x_i the index i, which means x_n\mapsto n..x_0\mapsto 0, so that we can resolve the index of a particular variable from the context.
Variable Shifting
When performing substitutions on de Bruijn forms^{1}, the substitution may goes under abstraction, which means, according to the definition of de Bruijn Index, they need to be renamed to avoid variable capture, for example, [1\mapsto s](\lambda.\ 2), all the terms in s now have onemore enclosing binder, which means its free variables should be incremented by one, this is called variable shifting, the bound variables should stay intact, e.g., if s=2\ (\lambda.\ 0), only 2 should be lifted to 3 but not for 0, because 0 is a bound variable. The key idea is we keep track on how many binders (denoted c, for example) have been encountered, so all the variables less than c should stay intact because they are bounded.
Define dplace shift of a term t above cutoff c, denoted by \uparrow^{d}_c(t):
&\uparrow^{d}_c(k)&&=&&
\begin{cases}
k &\text{if }k < c\\
k+d &\text{if }k \geqslant c
\end{cases}\\
&\uparrow^{d}_c(\lambda.\ k_1) &&=&& \lambda.\ \uparrow^{d}_{c+1}(k_1)\\
&\uparrow^{d}_c(k_1\ k_2) &&=&& \uparrow^{d}_c(k_1)\ \uparrow^{d}_c(k_2)
\end{aligned}
Specifically, we use \uparrow^{d}(k) and \uparrow^{d}_0(k) interchangeably.
And now we can define the substitution on de Bruijn forms:
&[j\mapsto s](k)&&=&&
\begin{cases}
k &\text{if }j\neq k\\
s &\text{if }j=k
\end{cases}\\
&[j\mapsto s](\lambda.\ k_1)&&=&&\lambda.\ [j+1\mapsto \uparrow^{1}(s)](k_1)\\
&[j\mapsto s](k_1\ k_2)&&=&&([j\mapsto s]k_1\ [j\mapsto s]k_2)
\end{aligned}
And the \betareduction rule for de Bruijn forms:
The reason for \uparrow^{1} and \uparrow^1(v) is because after the reduction, the original \lambda sign just vanished, means there will be one less binder in the whole term, so we need to downshift the variables affected by this \lambda sign by one, which are those inside t, and as a countermeasure of the downshift, the variables inside v must be upshifted by one(because they will become a part of t which will be downshifted after the substitution) so that we won't ended up with some weird situation such as a negative variable.
Implementation
Here is an implementation of the untyped lambda calculus using OCaml
exception Inappropriate_term
exception Bad_index
type term =
 TmVar of string * int * int (* The second 'int' contains the total length of the context, for debugging purpose *)
 TmAbs of string * term (* the first 'string' stands for the name hint of the bound variable, since the internal representation will be de Bruijn index *)
 TmApp of term * term
type binding = NameBind (* For now it carries no extra info *)
type context = (string * binding) list
let ( >>= ) opt f = match opt with Some v > f v  None > None
let rec index_of f = function
 [] > raise Not_found
 head :: tail > if f head then 0 else 1 + index_of f tail
let pick_fresh_name ctx hint =
match List.exists (fun (name, _) > name = hint) ctx with
 true >
let new_name = hint ^ "_" in
((new_name, NameBind) :: ctx, new_name)
 false > ((hint, NameBind) :: ctx, hint)
(* append to frond  innermost has smaller index *)
let ctx_length ctx = List.length ctx
let resolve_opt ctx name = List.find_opt (fun (n, _) > n = name) ctx
let unwind_opt ctx index = List.nth_opt ctx index >>= fun (name, _) > Some name
let unwind_var ctx var_name =
let index = index_of (fun (name, _) > name = var_name) ctx in
TmVar (var_name, index, ctx_length ctx)
let ( * ) ctx var_name = unwind_var ctx var_name
let rec term_to_string ctx = function
 TmAbs (name, t') >
let ctx', name' = pick_fresh_name ctx name in
"(λ" ^ name' ^ ". " ^ term_to_string ctx' t' ^ ")"
 TmApp (lhs, rhs) >
"(" ^ term_to_string ctx lhs ^ " " ^ term_to_string ctx rhs ^ ")"
 TmVar (_, index, env_len) >
if ctx_length ctx = env_len then
let opt = unwind_opt ctx index in
match opt with Some name > name  None > raise Not_found
else raise Bad_index
let variable_shift place term =
let rec walk_shifting cutoff = function
 TmVar (var_name, index, env_len) >
if index >= cutoff then
(* add 'place' to 'env_len' because by shifting variable we effectively extended the context *)
TmVar (var_name, index + place, env_len + place)
else TmVar (var_name, index, env_len + place)
 TmAbs (name, t') >
TmAbs (name, walk_shifting (cutoff + 1) t')
(* Be careful about the definition, a new 'TmAbs' is required here *)
 TmApp (lhs, rhs) >
TmApp (walk_shifting cutoff lhs, walk_shifting cutoff rhs)
in
walk_shifting 0 term
let rec substitute before after = function
 TmVar (_, index, _) as var > if index = before then after else var
 TmAbs (name, t') >
TmAbs (name, substitute (before + 1) (variable_shift 1 after) t')
 TmApp (lhs, rhs) >
TmApp (substitute before after lhs, substitute before after rhs)
let beta_reduction = function
 TmApp (TmAbs (_, t), v) >
variable_shift (1) (substitute 0 (variable_shift 1 v) t)
 _ > raise Inappropriate_term
(* Under the callbyvalue strategy, the normal form is abstraction *)
let is_normal_form = function TmAbs (_, _) > true  _ > false
let rec eval_one_step_call_by_value = function
 TmApp (TmAbs (_, _), v) as app when is_normal_form v > beta_reduction app
 TmApp (lhs, rhs) when is_normal_form lhs >
let rhs' = eval_one_step_call_by_value rhs in
TmApp (lhs, rhs')
 TmApp (lhs, rhs) >
let lhs' = eval_one_step_call_by_value lhs in
TmApp (lhs', rhs)
 _ > raise Inappropriate_term
let rec eval_one_step_call_by_name = function
 TmApp (TmAbs (_, _), _) as app > beta_reduction app
 TmApp (lhs, rhs) >
let lhs' = eval_one_step_call_by_name lhs in
TmApp (lhs', rhs)
 _ > raise Inappropriate_term
let rec eval_one_step_full_beta_reduction = function
 TmAbs (name, term) >
let term', reducible = eval_one_step_full_beta_reduction term in
(TmAbs (name, term'), reducible)
 TmApp (TmAbs (_, _), _) as app > (beta_reduction app, true)
 TmApp (lhs, rhs) >
let lhs', lhs_reducible = eval_one_step_full_beta_reduction lhs in
let rhs', rhs_reducible = eval_one_step_full_beta_reduction rhs in
(TmApp (lhs', rhs'), lhs_reducible  rhs_reducible)
 TmVar (_, _, _) as var > (var, false)
let rec eval_call_by_value term =
try
let t' = eval_one_step_call_by_value term in
eval_call_by_value t'
with Inappropriate_term > term
let rec eval_call_by_name term =
try
let t' = eval_one_step_call_by_name term in
eval_call_by_name t'
with Inappropriate_term > term
let rec eval_full_beta_reduction term =
let t', reducible = eval_one_step_full_beta_reduction term in
if reducible then eval_full_beta_reduction t' else t'
let ctx = [ ("z", NameBind); ("s", NameBind); ("n", NameBind); ("m", NameBind) ]
let plus =
TmAbs
( "m",
TmAbs
( "n",
TmAbs
( "s",
TmAbs
( "z",
TmApp
( TmApp (ctx * "m", ctx * "s"),
TmApp (TmApp (ctx * "n", ctx * "s"), TmVar ("z", 0, 4)) )
) ) ) )
let c' =
TmAbs
( "s",
TmAbs
( "z",
TmApp (TmApp (TmVar ("s", 1, 2), TmVar ("s", 1, 2)), TmVar ("z", 0, 2))
) )
let c'' =
TmAbs
( "s",
TmAbs
( "z",
TmApp (TmVar ("s", 1, 2), TmApp (TmVar ("s", 1, 2), TmVar ("z", 0, 2)))
) )
(* λm. λn. λs. λz. m s (n s z) (λs. λz. s s z) (λs. λz. s s z) *)
let plus_2_and_2 = TmApp (TmApp (plus, c'), c'')
let call_by_name = eval_call_by_name plus_2_and_2
let call_by_value = eval_call_by_value plus_2_and_2
let full_reduction = eval_full_beta_reduction plus_2_and_2
let _ = print_endline ("Call by name: " ^ term_to_string [] call_by_name)
let _ = print_endline ("Call by value: " ^ term_to_string [] call_by_value)
let _ = print_endline ("Full β reduction: " ^ term_to_string [] full_reduction)
 The de Bruijn form here is confusing, it may end up being confused with "de Bruijn notation" which is, the reversal form of common lambda terms, but I'll just use it for simplicity here ↩︎
Comments  NOTHING