
Logjumps is a recently discovered technique for modular reduction over large prime fields. Unlike Montgomery reduction, which requires
Although Montgomery reduction is four decades old, Yuval Domb of Ingonyama first discovered this improvement when optimising client-side proving. To our knowledge, no-one else found this method before Yuval.
This research note offers a low-level explanation of Logjumps. It aims to help cryptography engineers gain intuition not only about the algebraic principles behind Logjumps and classic Montgomery reduction, but also how these algebraic principles translate to code.
Existing literature typically focuses on either high-level concepts or low-level optimisations, often leaving a gap between theory and implementation. To bridge it, I will first present the algebra behind Montgomery reduction, then explain how these principles apply to multiprecision modular reduction. Finally, I will do the same for Logjumps, and discuss its performance.
Reference code for the both algorithms can be found here.
Background
The Montgomery reduction algorithm takes an input
We use Montgomery form to describe field elements
This technique is essential for optimising cryptographic schemes that require consecutive multiplication-heavy field operations, such as elliptic curve arithmetic or algebraic hash functions. The performance gains from Montgomery multiplication should ideally far outweigh the overhead of converting the inputs to and from Montgomery form.
In this note, we use the following symbols:
Symbol | Description | Notes |
---|---|---|
The field modulus. | A prime number. | |
The value to reduce. | ||
The number of bits per word. | Must fit within a single CPU-native data type, such as a 32 or 64-bit unsigned integer. | |
The number of words. | Must be large enough such that |
|
The Montgomery radix. | Coprime to and greater than |
|
The precomputed Montgomery constant. |
The steps of the reduction algorithm are:
- Note: division by
is a right-shift by bits.
- Note: division by
- If
then . - Return
.
Substitute the terms
Since
This means that
The multiprecision algorithm
To implement this algorithm in a CPU, we use arrays of unsigned integers to represent
Variable | Description |
---|---|
p[n] |
The field modulus as an |
c[n*2] |
The value to reduce as a |
w |
The number of bits per word. |
n |
The number of words. |
mu |
The least significant |
The pseudocode which implements this algorithm looks like the following.
def mont_redc(c, p, mu, n, w):
mask = (2 ** w) - 1
t = list(c) # copy c to t
for i in 0..n:
q = (t[i] * mu) & mask
# 1st inner loop: multiply p by q, word by word, while propagating carries.
# Note that `& mask` is equivalent to mod w, and
# `>> w` is equivalent to division by w.
pq = [0] * (n + 1)
for j in 0..n:
prod = q * p[j] + carry
pq[j] = prod & mask
carry = prod >> w
pq[n] = carry
# 2nd inner loop: add q * p to t word by word, while propagating carries.
carry = 0;
for j in 0..n + 1:
s = t[i + j] + pq[j] + carry
t[i + j] = s & mask
carry = s >> w
# Propagate the carry to the rest of t.
k = i + n + 1
while carry > 0 && k < (2 * n):
s = t[k] + carry
t[k] = s & mask
carry = s >> w
k ++
# At this point, t begins with n zeros, followed by n unreduced words
# of the result
lhs = t[n:n*2]
# Assume gte and sub are multiprecision greater-than and subtraction
# functions
if gte(lhs, p):
return sub(lhs, &p, w)
else:
return lhs
Now, let us understand exactly how the pseudocode implements the reduction algorithm. Recall the first two steps of its algebraic description:
Consider that
Also recall that the integer
Our goal is to reduce
We use
Ultimately, the result should have
We need to show that after each outer loop iteration:
a. the least significant word (LSW) is zeroed out
b. and each
Proving that the LSW is zeroed out
We compute pq
using the first inner loop:
pq
pq
is a
Next, recall that when we multiply a multi-digit value by a single word, the LSW of the result is the product of the LSW of both the multiplicand and multiplier, modulo
The second inner loop adds
The LSW is zero because
Proving that is congruent to
Next, we must prove the inductive step to show that our result has
Rewrite the high-level algorithm to reduce
Multiply both sides by
To see the that both sides of the above equation are congruent modulo
This is equivalent to stating that:
which is exactly what we want to prove.
By now, the reader should be familiar with the core principles behind various multiprecision Montgomery multiplication algorithms, such as SOS, CIOS, and FIOS, described in Acar (1996). The differences between these variants only lie in how the reduction and addition steps are interleaved. The complexity of these algorithms vanishes when one grasps the core principles behind how word-by-word reduction corresponds with high-level algebraic reasoning.
The Logjumps algorithm
Now let us examine Logjumps. It uses the precomputed constants
Symbol | Description | Notes |
---|---|---|
This is the inverse of two raised to the power of the word size. It has |
||
The precomputed Montgomery constant. |
The pseudocode for this method, assuming that
# Calculates low * rho[0..4], returning a 5-word result.
def calc_m(low):
carry = 0
res = [0] * 5
for i in 0..4:
prod = low * rho[i] + carry
res[i] = prod & ((1 << 64) - 1)
carry = prod >> 64
res[4] = carry
return res
# The Logjumps algorithm.
def mul_logjumps_sos(c):
res = [0] * 8
mask = (1 << 64) - 1
# Step 1: the first Logjumps iteration
m = calc_m(c[0])
carry = 0
for i in 0..5:
s = c[i + 1] + m[i] + carry
res[i] = s & mask
carry = s >> 64
# Propagate carries to res[6] and res[7]
s = c[6] + carry
res[5] = s & mask
carry = s >> 64
s = c[7] + carry
res[6] = s & mask
carry = s >> 64
res[7] = carry
carry = 0
# Step 2: the second Logjumps iteration
m = calc_m(res[0])
carry = 0
for i in 0..5:
s = res[i + 1] + m[i] + carry
res[i] = s & mask
carry = s >> 64
# Propagate carries to res[6]
s = res[6] + carry
res[5] = s & mask
carry = s >> 64
res[6] = carry
carry = 0
# Step 3: the third Logjumps iteration
m = calc_m(res[0])
carry = 0
for i in 0..5:
s = res[i + 1] + m[i] + carry
res[i] = s & mask
carry = s >> 64
res[5] = carry
carry = 0
# Step 4: use one iteration of Montgomery reduction to reduce res[] by one word
q = (res[0] * U64_MU0) & mask
pq = [0] * 5
for i in 0..4:
prod = q * U64_P[i] + carry
pq[i] = prod & mask
carry = prod >> 64
pq[4] = carry
carry = 0
for i in 0..5:
s = res[i] + pq[i] + carry
res[i] = s & mask
carry = s >> 64
return [res[1], res[2], res[3], res[4]]
Let
We can rewrite
where
Next, compute
and add it to the higher words, but in an aligned way: that is, we shift
This sets
To show that the above is correct, we need to prove the following equivalence:
The proof is simple. First, we substitute the definition of
Next, we expand the terms in the parentheses, and observe that we get the right-hand side as desired:
We can repeat this until we have
Performance
Each inner loop of the Montgomery reduction algorithm performs one multiplication to compute q
, and q * p[j]
. Since there are
The Logjumps algorithm, however, only takes calc_m()
(which does
For 256-bit primes (used in common elliptic-curve cryptographic schemes) on 64-bit CPUs,
Yuval's writeup discusses a further optimisation to Logjumps. It is to compute
Implications
Field arithmetic is widespread in modern cryptography, so Logjumps can speed up a wide range of applications. However, its performance depends heavily on the hardware platform on which it is implemented, so one should be cautious about making concrete predictions of performance gains. For instance, the ability for a CPU to efficiently compute the independent products described above (such as
Further research is therefore needed to benchmark this technique in production systems and refine its implementation for different hardware platforms. Its potential for parallelism may be even more significant for very large
Credits
Many thanks to Yuval Domb, Kobi Gurkan, Guillermo Angeris, and Nicolas Mohnblatt for their valuable comments and feedback on this article.