# Shamir's Secret Sharing

Shamir's Secret Sharing, formulated by Adi Shamir, is one of the first secret sharing schemes in cryptography. It is based on polynomial interpolation over finite fields.[1]

## High-level explanation

Shamir's Secret Sharing (SSS) is used to secure a secret in a distributed way, most often to secure other encryption keys. The secret is split into multiple parts, called shares, which individually should not give any information about the secret.

Reconstructing the secret requires a minimum number of shares, called the threshold. An adversary who discovers fewer shares than the threshold gains no additional information about the secret; this property is called perfect secrecy. In this sense, SSS is a generalisation of the one-time pad (which can be viewed as SSS with a two-share threshold and two shares in total).

Consider an example:

Problem: Company XYZ needs to secure its vault's passcode. It could use something standard, such as AES, but the key holder could be unavailable or die. The key could also be compromised by a malicious hacker, or the key holder could turn rogue and use the key for their own benefit.

SSS can be used in this situation. The vault's passcode is split into a number of shares, and a certain number of shares is allocated to each executive within Company XYZ. The executives can then unlock the vault only if they combine at least as many shares as the threshold. The threshold can be set appropriately for the number of executives, so the vault is always accessible to the authorized individuals. If a small number of shares were compromised, these shares could not be used to find the passcode unless other executives cooperated.

## Mathematical formulation

Shamir's Secret Sharing is an ideal and perfect ${\displaystyle \left(k,n\right)}$ -threshold scheme. In such a scheme, the aim is to divide a secret ${\displaystyle S}$  (for example, the combination to a safe) into ${\displaystyle n}$  pieces of data ${\displaystyle S_{1},\ldots ,S_{n}}$  (known as shares) in such a way that:

1. Knowledge of any ${\displaystyle k}$  or more shares ${\displaystyle S_{i}}$  makes ${\displaystyle S}$  easily computable. That is, the complete secret ${\displaystyle S}$  can be reconstructed from any combination of ${\displaystyle k}$  shares of data.
2. Knowledge of any ${\displaystyle k-1}$  or fewer shares ${\displaystyle S_{i}}$  leaves ${\displaystyle S}$  completely undetermined, in the sense that the possible values for ${\displaystyle S}$  are as likely with knowledge of up to ${\displaystyle k-1}$  shares as with knowledge of ${\displaystyle 0}$  shares. The secret ${\displaystyle S}$  cannot be reconstructed with fewer than ${\displaystyle k}$  shares.

If ${\displaystyle n=k}$ , then every piece of the original secret ${\displaystyle S}$  is required to reconstruct the secret.

Figure: One can draw an infinite number of polynomials of degree 2 through 2 points, whereas 3 points are required to uniquely determine a polynomial of degree 2. The image is for illustration purposes only: Shamir's scheme uses polynomials over a finite field, which are not easy to represent in a 2-dimensional plane.

The essential idea of the scheme is based on the Lagrange interpolation theorem, specifically that ${\displaystyle k\,\!}$  points are enough to uniquely determine a polynomial of degree less than or equal to ${\displaystyle k-1\,\!}$ . For instance, 2 points are sufficient to define a line, 3 points are sufficient to define a parabola, 4 points to define a cubic curve and so forth.

Assume that the secret ${\displaystyle S}$  can be represented as an element ${\displaystyle a_{0}}$  of a finite field ${\displaystyle GF(q)}$  (where ${\displaystyle q}$  is larger than the number of shares being generated). Randomly choose ${\displaystyle k-1}$  elements, ${\displaystyle a_{1},\cdots ,a_{k-1}\,\!}$ , from ${\displaystyle GF(q)}$  and construct the polynomial ${\displaystyle f\left(x\right)=a_{0}+a_{1}x+a_{2}x^{2}+a_{3}x^{3}+\cdots +a_{k-1}x^{k-1}\,\!}$ . Compute any ${\displaystyle n\,\!}$  points on the curve, for instance by setting ${\displaystyle i=1,\ldots ,n\,\!}$  to find the points ${\displaystyle \left(i,f\left(i\right)\right)\,\!}$ . Every participant is given a point (a non-zero input to the polynomial, and the corresponding output).[2]

Given any subset of ${\displaystyle k\,\!}$  of these pairs, ${\displaystyle a_{0}}$  can be obtained using interpolation. One possible formula for doing so is ${\displaystyle a_{0}=f(0)=\sum _{j=0}^{k-1}y_{j}\prod _{\begin{smallmatrix}m\,=\,0\\m\,\neq \,j\end{smallmatrix}}^{k-1}{\frac {x_{m}}{x_{m}-x_{j}}}}$ , where the points on the polynomial are given as ${\displaystyle k}$  pairs of the form ${\displaystyle (x_{i},y_{i})}$ . Note that ${\displaystyle f(0)}$  is equal to the first coefficient of the polynomial ${\displaystyle f(x)}$ .
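The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than a vetted implementation: the function names `split` and `reconstruct` and the choice of the prime field with ${\displaystyle q=2^{127}-1}$ are assumptions made for the example, and the modular inverse is computed with Fermat's little theorem, which is valid only because the modulus is prime.

```python
import secrets
from itertools import combinations

P = 2**127 - 1  # assumed field size: a Mersenne prime larger than the secret and n


def split(secret, k, n, p=P):
    """Return n shares (x, f(x) mod p) of a random polynomial with f(0) = secret."""
    if not (0 <= secret < p and 1 <= k <= n < p):
        raise ValueError("need 0 <= secret < p and 1 <= k <= n < p")
    # a_0 is the secret; a_1 .. a_{k-1} are drawn uniformly from GF(p)
    coeffs = [secret] + [secrets.randbelow(p) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % p
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]  # x = 0 is never handed out


def reconstruct(shares, p=P):
    """Compute f(0) = sum_j y_j * prod_{m != j} x_m / (x_m - x_j) in GF(p)."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * xm % p
                den = den * (xm - xj) % p
        # division is multiplication by the inverse; den^(p-2) = den^(-1) mod prime p
        total = (total + yj * num * pow(den, p - 2, p)) % p
    return total


shares = split(1234, k=3, n=6)
recovered = [reconstruct(list(subset)) for subset in combinations(shares, 3)]
print(all(r == 1234 for r in recovered))
```

Every one of the 20 three-share subsets yields the same value, which is the first property of a ${\displaystyle (k,n)}$ threshold scheme stated earlier.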

## Usage

### Example

The following example illustrates the basic idea. Note, however, that calculations in the example are done using integer arithmetic rather than using finite field arithmetic to make the idea easier to understand. Therefore the example below does not provide perfect secrecy and is not a proper example of Shamir's scheme. The next example will explain the problem.

#### Preparation

Suppose that the secret to be shared is 1234 ${\displaystyle (S=1234)\,\!}$ .

In this example, the secret will be split into 6 shares ${\displaystyle (n=6)\,\!}$ , where any subset of 3 shares ${\displaystyle (k=3)\,\!}$  is sufficient to reconstruct the secret. ${\displaystyle k-1=2}$  numbers are taken at random. Let them be 166 and 94.

This yields coefficients ${\displaystyle (a_{0}=1234;a_{1}=166;a_{2}=94),\,\!}$  where ${\displaystyle a_{0}}$  is the secret.

The polynomial to produce secret shares (points) is therefore:

${\displaystyle f(x)=1234+166x+94x^{2}\,\!}$

Six points ${\displaystyle D_{x-1}=(x,f(x))}$  from the polynomial are constructed as:

${\displaystyle D_{0}=(1,1494);D_{1}=(2,1942);D_{2}=(3,2578);D_{3}=(4,3402);D_{4}=(5,4414);D_{5}=(6,5614)\,\!}$

Each participant in the scheme receives a different point (a pair of ${\displaystyle x\,\!}$  and ${\displaystyle f(x)\,\!}$ ). Because ${\displaystyle D_{x-1}}$  is used instead of ${\displaystyle D_{x}}$  the points start from ${\displaystyle (1,f(1))}$  and not ${\displaystyle (0,f(0))}$ . This is necessary because ${\displaystyle f(0)}$  is the secret.
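As a quick check of the six points, the polynomial can be evaluated directly. The sketch below uses plain integer arithmetic, exactly like this simplified example, so it is not a secure share generator.

```python
def f(x):
    """The example polynomial f(x) = 1234 + 166x + 94x^2 over the integers."""
    return 1234 + 166 * x + 94 * x**2


# Shares are taken at x = 1..6; x = 0 is skipped because f(0) is the secret.
points = [(x, f(x)) for x in range(1, 7)]
print(points)
```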

#### Reconstruction

In order to reconstruct the secret, any 3 points are sufficient.

Consider using the 3 points ${\displaystyle \left(x_{0},y_{0}\right)=\left(2,1942\right);\left(x_{1},y_{1}\right)=\left(4,3402\right);\left(x_{2},y_{2}\right)=\left(5,4414\right)\,\!}$ .

Computing the Lagrange basis polynomials:

${\displaystyle \ell _{0}(x)={\frac {x-x_{1}}{x_{0}-x_{1}}}\cdot {\frac {x-x_{2}}{x_{0}-x_{2}}}={\frac {x-4}{2-4}}\cdot {\frac {x-5}{2-5}}={\frac {1}{6}}x^{2}-{\frac {3}{2}}x+{\frac {10}{3}}\,\!}$
${\displaystyle \ell _{1}(x)={\frac {x-x_{0}}{x_{1}-x_{0}}}\cdot {\frac {x-x_{2}}{x_{1}-x_{2}}}={\frac {x-2}{4-2}}\cdot {\frac {x-5}{4-5}}=-{\frac {1}{2}}x^{2}+{\frac {7}{2}}x-5\,\!}$
${\displaystyle \ell _{2}(x)={\frac {x-x_{0}}{x_{2}-x_{0}}}\cdot {\frac {x-x_{1}}{x_{2}-x_{1}}}={\frac {x-2}{5-2}}\cdot {\frac {x-4}{5-4}}={\frac {1}{3}}x^{2}-2x+{\frac {8}{3}}\,\!}$

Using the formula for polynomial interpolation, ${\displaystyle f(x)}$  is:

${\displaystyle {\begin{aligned}f(x)&=\sum _{j=0}^{2}y_{j}\cdot \ell _{j}(x)\\[6pt]&=y_{0}\ell _{0}(x)+y_{1}\ell _{1}(x)+y_{2}\ell _{2}(x)\\[6pt]&=1942\left({\frac {1}{6}}x^{2}-{\frac {3}{2}}x+{\frac {10}{3}}\right)+3402\left(-{\frac {1}{2}}x^{2}+{\frac {7}{2}}x-5\right)+4414\left({\frac {1}{3}}x^{2}-2x+{\frac {8}{3}}\right)\\[6pt]&=1234+166x+94x^{2}\end{aligned}}}$

Recall that the secret is the free coefficient of the polynomial, so ${\displaystyle S=1234\,\!}$ , and the reconstruction is complete.
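The interpolation above can be reproduced with exact rational arithmetic. The following sketch uses Python's `fractions.Fraction`; the helper names `poly_mul` and `lagrange_polynomial` are illustrative, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def lagrange_polynomial(points):
    """Return the coefficients of the unique polynomial of degree < k through k points."""
    result = [Fraction(0)] * len(points)
    for j, (xj, yj) in enumerate(points):
        basis = [Fraction(1)]  # accumulates the basis polynomial l_j(x)
        for m, (xm, _) in enumerate(points):
            if m != j:
                # multiply by the factor (x - x_m) / (x_j - x_m)
                basis = poly_mul(basis, [Fraction(-xm, xj - xm), Fraction(1, xj - xm)])
        for i, c in enumerate(basis):
            result[i] += yj * c
    return result


coeffs = lagrange_polynomial([(2, 1942), (4, 3402), (5, 4414)])
print([int(c) for c in coeffs])  # constant term first
```

The first entry of `coeffs` is the free coefficient, that is, the secret.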

#### Computationally efficient approach

Reconstructing the whole polynomial with Lagrange interpolation just to read off ${\displaystyle S=f(0)}$  is not efficient, since the coefficients other than the constant term are calculated and never used.

Considering this, an optimized formula that uses Lagrange polynomials to find only ${\displaystyle f(0)}$  is defined as follows:

${\displaystyle f(0)=\sum _{j=0}^{k-1}y_{j}\prod _{\begin{smallmatrix}m\,=\,0\\m\,\neq \,j\end{smallmatrix}}^{k-1}{\frac {x_{m}}{x_{m}-x_{j}}}}$
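Applied to the three points used in the reconstruction, the formula can be evaluated as in the sketch below. It keeps the integer setting of this simplified example and therefore uses exact fractions; the function name `secret_from_points` is illustrative.

```python
from fractions import Fraction


def secret_from_points(points):
    """Evaluate f(0) = sum_j y_j * prod_{m != j} x_m / (x_m - x_j) exactly."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        weight = Fraction(1)
        for m, (xm, _) in enumerate(points):
            if m != j:
                weight *= Fraction(xm, xm - xj)
        total += yj * weight
    return total


secret = secret_from_points([(2, 1942), (4, 3402), (5, 4414)])
print(secret)
```

For these points the three terms are ${\displaystyle 1942\cdot {\tfrac {10}{3}}}$, ${\displaystyle 3402\cdot (-5)}$ and ${\displaystyle 4414\cdot {\tfrac {8}{3}}}$, which sum to 1234.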

#### Problem

Although the simplified version of the method demonstrated above, which uses integer arithmetic rather than finite field arithmetic, works, there is a security problem: Eve gains information about ${\displaystyle S}$  with every ${\displaystyle D_{i}}$  that she finds.

Suppose that she finds the 2 points ${\displaystyle D_{0}=(1,1494)}$  and ${\displaystyle D_{1}=(2,1942)}$ . She still does not have ${\displaystyle k=3}$  points, so in theory she should not have gained any more information about ${\displaystyle S}$ . But she could combine the information from the 2 points with the public information: ${\displaystyle n=6,k=3,f(x)=a_{0}+a_{1}x+\cdots +a_{k-1}x^{k-1},a_{0}=S,a_{i}\in \mathbb {N} }$ . Doing so, Eve could perform the following algebra:

1. Fill the formula for ${\displaystyle f(x)}$  with ${\displaystyle S}$  and the value of ${\displaystyle k:f(x)=S+a_{1}x+\cdots +a_{3-1}x^{3-1}\Rightarrow {}f(x)=S+a_{1}x+a_{2}x^{2}}$
2. Fill (1) with the values of ${\displaystyle D_{0}}$ 's ${\displaystyle x}$  and ${\displaystyle f(x):1494=S+a_{1}1+a_{2}1^{2}\Rightarrow {}1494=S+a_{1}+a_{2}}$
3. Fill (1) with the values of ${\displaystyle D_{1}}$ 's ${\displaystyle x}$  and ${\displaystyle f(x):1942=S+a_{1}2+a_{2}2^{2}\Rightarrow {}1942=S+2a_{1}+4a_{2}}$
4. Subtract (3)-(2): ${\displaystyle (1942-1494)=(S-S)+(2a_{1}-a_{1})+(4a_{2}-a_{2})\Rightarrow {}448=a_{1}+3a_{2}}$  and rewrite this as ${\displaystyle a_{1}=448-3a_{2}}$ . Eve knows that ${\displaystyle a_{2}\in \mathbb {N} }$  so she starts replacing ${\displaystyle a_{2}}$  in (4) with 0, 1, 2, 3, ... to find all possible values for ${\displaystyle a_{1}}$ :
    1. ${\displaystyle a_{2}=0\rightarrow {}a_{1}=448-3\times 0=448}$
    2. ${\displaystyle a_{2}=1\rightarrow {}a_{1}=448-3\times 1=445}$
    3. ${\displaystyle a_{2}=2\rightarrow {}a_{1}=448-3\times 2=442}$
    4. ${\displaystyle \,\,\,\,\,\,\,\,\,\vdots }$
    5. ${\displaystyle a_{2}=148\rightarrow {}a_{1}=448-3\times 148=4}$
    6. ${\displaystyle a_{2}=149\rightarrow {}a_{1}=448-3\times 149=1}$
5. After checking ${\displaystyle a_{2}=149}$ , she stops because she would get negative values for ${\displaystyle a_{1}}$  with larger values of ${\displaystyle a_{2}}$  (which is impossible because ${\displaystyle a_{1}\in \mathbb {N} }$ ). Eve can now conclude ${\displaystyle a_{2}\in [0,1,\dots ,148,149]}$
6. Now, Eve can replace ${\displaystyle a_{1}}$  by (4) in (2): ${\displaystyle 1494=S+(448-3a_{2})+a_{2}\Rightarrow {}S=1046+2a_{2}}$ . Now, replacing ${\displaystyle a_{2}}$  in (6) by the values found in (5), she gets ${\displaystyle S\in [1046+2\times 0,1046+2\times 1,\dots ,1046+2\times 148,1046+2\times 149]}$  which leads her to the information: ${\displaystyle S\in [1046,1048,\dots ,1342,1344].}$

Eve now only has 150 numbers to guess from instead of an infinite quantity of natural numbers.
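Eve's enumeration is easy to automate. The sketch below hard-codes the two relations derived in steps (4) and (6), namely ${\displaystyle a_{1}=448-3a_{2}}$ and ${\displaystyle S=1046+2a_{2}}$, and walks through the natural numbers until ${\displaystyle a_{1}}$ would become negative.

```python
candidates = []
a2 = 0
while 448 - 3 * a2 >= 0:              # a1 = 448 - 3*a2 must not be negative
    candidates.append(1046 + 2 * a2)  # S = 1046 + 2*a2
    a2 += 1

print(len(candidates), candidates[0], candidates[-1])
print(1234 in candidates)  # the real secret is among the candidates
```

The loop stops after ${\displaystyle a_{2}=149}$, leaving the 150 even numbers from 1046 to 1344.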

#### Solution

Figure: A polynomial curve over a finite field. The order of the polynomial now has seemingly little to do with the shape of the graph.

Geometrically, this attack exploits the fact that the order of the polynomial is known, which gives information about the paths the polynomial can take between known points. This reduces the possible values of unknown points, since the points must lie on a smooth curve and the polynomial must have coefficients that are natural numbers.

This problem can be fixed by using finite field arithmetic. A field of size ${\displaystyle p\in \mathbb {P} :p>S,p>n}$  is used. The figure shows a polynomial curve over a finite field. In contrast to a smooth curve it appears disorganised and disjointed.

In practice this is only a small change. A prime ${\displaystyle p}$  must be chosen that is bigger than the number of participants and every ${\displaystyle a_{i}}$  (including ${\displaystyle a_{0}=S}$ ). The points on the polynomial must also be calculated as ${\displaystyle (x,f(x){\bmod {p}})}$  instead of ${\displaystyle (x,f(x))}$ .

Everybody who receives a point must also know the value of ${\displaystyle p}$ , so it is considered to be publicly known. Therefore, ${\displaystyle p}$  should be chosen large enough that an attacker cannot simply try every possible value for ${\displaystyle S}$ .

For this example, choose ${\displaystyle p=1613}$ , so the polynomial becomes ${\displaystyle f(x)=1234+166x+94x^{2}{\bmod {1613}}}$  which gives the points: ${\displaystyle (1,1494);(2,329);(3,965);(4,176);(5,1188);(6,775)}$
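The reduced points can be recomputed, and the secret recovered from any three of them, with a short sketch such as the one below. The names `f` and `reconstruct` are illustrative; division modulo ${\displaystyle p}$ is done by multiplying with the inverse obtained from Fermat's little theorem, which assumes ${\displaystyle p}$ is prime.

```python
P = 1613
COEFFS = [1234, 166, 94]  # a0 (the secret), a1, a2 from the example


def f(x, p=P):
    """Evaluate the example polynomial modulo p."""
    return sum(c * x**i for i, c in enumerate(COEFFS)) % p


shares = [(x, f(x)) for x in range(1, 7)]
print(shares)


def reconstruct(points, p=P):
    """Compute f(0) mod p from k points using the optimized Lagrange formula."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * xm % p
                den = den * (xm - xj) % p
        total = (total + yj * num * pow(den, p - 2, p)) % p  # den^(p-2) is 1/den mod p
    return total


print(reconstruct([shares[1], shares[3], shares[4]]))
```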

This time Eve doesn't gain any information when she finds a ${\displaystyle D_{x}}$  (until she has ${\displaystyle k}$  points).

Suppose again that Eve finds ${\displaystyle D_{0}=\left(1,1494\right)}$  and ${\displaystyle D_{1}=\left(2,329\right)}$ , and the public information is: ${\displaystyle n=6,k=3,p=1613,f(x)=a_{0}+a_{1}x+\dots +a_{k-1}x^{k-1}\mod {p},a_{0}=S,a_{i}\in \mathbb {N} }$ . Attempting the previous attack, Eve can:

1. Fill the ${\displaystyle f(x)}$ -formula with ${\displaystyle S}$  and the value of ${\displaystyle k}$  and ${\displaystyle p}$ : ${\displaystyle f(x)=S+a_{1}x+\dots +a_{3-1}x^{3-1}\mod 1613\Rightarrow {}f(x)=S+a_{1}x+a_{2}x^{2}-1613m_{x}:m_{x}\in \mathbb {N} }$
2. Fill (1) with the values of ${\displaystyle D_{0}}$ 's ${\displaystyle x}$  and ${\displaystyle f(x):1494=S+a_{1}1+a_{2}1^{2}-1613m_{1}\Rightarrow {}1494=S+a_{1}+a_{2}-1613m_{1}}$
3. Fill (1) with the values of ${\displaystyle D_{1}}$ 's ${\displaystyle x}$  and ${\displaystyle f(x):329=S+a_{1}2+a_{2}2^{2}-1613m_{2}\Rightarrow {}329=S+2a_{1}+4a_{2}-1613m_{2}}$
4. Subtract (3)-(2): ${\displaystyle (329-1494)=(S-S)+(2a_{1}-a_{1})+(4a_{2}-a_{2})+(1613m_{1}-1613m_{2})\Rightarrow {}-1165=a_{1}+3a_{2}+1613(m_{1}-m_{2})}$  and rewrite this as ${\displaystyle a_{1}=-1165-3a_{2}-1613(m_{1}-m_{2})}$
5. Using ${\displaystyle a_{2}\in \mathbb {N} }$ , start replacing ${\displaystyle a_{2}}$  in (4) with 0, 1, 2, 3, ... to find all possible values for ${\displaystyle a_{1}}$ :
    1. ${\displaystyle a_{2}=0\rightarrow {}a_{1}=-1165-3\times 0-1613(m_{1}-m_{2})=-1165-1613(m_{1}-m_{2})}$
    2. ${\displaystyle a_{2}=1\rightarrow {}a_{1}=-1165-3\times 1-1613(m_{1}-m_{2})=-1168-1613(m_{1}-m_{2})}$
    3. ${\displaystyle a_{2}=2\rightarrow {}a_{1}=-1165-3\times 2-1613(m_{1}-m_{2})=-1171-1613(m_{1}-m_{2})}$
    4. ${\displaystyle \,\,\,\,\,\,\,\,\,\vdots }$

This time she is not able to stop, because ${\displaystyle (m_{1}-m_{2})}$  could be any integer (even negative if ${\displaystyle m_{2}>m_{1}}$ ), so all ${\displaystyle p}$  residues modulo ${\displaystyle p}$  remain possible values for ${\displaystyle a_{1}}$ . She knows that ${\displaystyle [-1165,-1168,-1171,\ldots ]}$  always decreases by 3, so if ${\displaystyle 1613}$  were divisible by ${\displaystyle 3}$  she could confine ${\displaystyle a_{1}}$  to a single residue class modulo 3. However, because ${\displaystyle p}$  is prime, she cannot conclude this. Thus, using a finite field avoids this possible attack.
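The failure of the attack can also be checked exhaustively for this small field. The sketch below takes the two shares ${\displaystyle (1,1494)}$ and ${\displaystyle (2,329)}$ and, for every candidate secret ${\displaystyle S}$ in ${\displaystyle GF(1613)}$, solves the two share equations for ${\displaystyle a_{1}}$ and ${\displaystyle a_{2}}$. The variable names are illustrative.

```python
P = 1613
INV2 = pow(2, P - 2, P)  # inverse of 2 modulo the prime P

consistent = {}
for s in range(P):
    # Shares (1, 1494) and (2, 329) give, modulo P:
    #   a1 +   a2 = 1494 - s
    #  2a1 +  4a2 =  329 - s
    # Subtracting twice the first equation from the second leaves 2*a2.
    a2 = ((329 - s) - 2 * (1494 - s)) * INV2 % P
    a1 = (1494 - s - a2) % P
    # confirm that this polynomial really passes through both shares
    assert (s + a1 + a2) % P == 1494
    assert (s + 2 * a1 + 4 * a2) % P == 329
    consistent[s] = (a1, a2)

print(len(consistent), consistent[1234])
```

Every one of the 1613 possible secrets is matched by exactly one pair ${\displaystyle (a_{1},a_{2})}$, so the two shares do not rule out any value of ${\displaystyle S}$; the true secret 1234 corresponds to the original coefficients 166 and 94.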

#### Python example

```python
"""
The following Python implementation of Shamir's Secret Sharing is
released into the Public Domain under the terms of CC0 and OWFa:
https://creativecommons.org/publicdomain/zero/1.0/
http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0

See the bottom few lines for usage. Tested on Python 2 and 3.
"""

from __future__ import division
from __future__ import print_function

import random
import functools

# 12th Mersenne Prime
# (for this application we want a known prime number as close as
# possible to our security level; e.g.  desired security level of 128
# bits -- too large and all the ciphertext is large; too small and
# security is compromised)
_PRIME = 2 ** 127 - 1
# The 13th Mersenne Prime is 2**521 - 1

# _RINT(b) returns a random integer in [0, b] from the OS entropy source
_RINT = functools.partial(random.SystemRandom().randint, 0)

def _eval_at(poly, x, prime):
    """Evaluates polynomial (coefficient tuple) at x, used to generate a
    shamir pool in make_random_shares below.
    """
    accum = 0
    for coeff in reversed(poly):  # Horner's rule, highest degree first
        accum *= x
        accum += coeff
        accum %= prime
    return accum

def make_random_shares(secret, minimum, shares, prime=_PRIME):
    """
    Generates a random shamir pool for a given secret, returns share points.
    """
    if minimum > shares:
        raise ValueError("Pool secret would be irrecoverable.")
    poly = [secret] + [_RINT(prime - 1) for i in range(minimum - 1)]
    points = [(i, _eval_at(poly, i, prime))
              for i in range(1, shares + 1)]
    return points

def _extended_gcd(a, b):
    """
    Division in integers modulus p means finding the inverse of the
    denominator modulo p and then multiplying the numerator by this
    inverse (Note: inverse of A is B such that A*B % p == 1). This can
    be computed via the extended Euclidean algorithm
    http://en.wikipedia.org/wiki/Modular_multiplicative_inverse#Computation
    """
    x = 0
    last_x = 1
    y = 1
    last_y = 0
    while b != 0:
        quot = a // b
        a, b = b, a % b
        x, last_x = last_x - quot * x, x
        y, last_y = last_y - quot * y, y
    return last_x, last_y

def _divmod(num, den, p):
    """Compute num / den modulo prime p

    To explain this, the result will be such that:
    den * _divmod(num, den, p) % p == num
    """
    inv, _ = _extended_gcd(den, p)
    return num * inv

def _lagrange_interpolate(x, x_s, y_s, p):
    """
    Find the y-value for the given x, given k (x, y) points;
    k points define a polynomial of degree at most k - 1.
    """
    k = len(x_s)
    assert k == len(set(x_s)), "points must be distinct"
    def PI(vals):  # upper-case PI -- product of inputs
        accum = 1
        for v in vals:
            accum *= v
        return accum
    nums = []  # avoid inexact division
    dens = []
    for i in range(k):
        others = list(x_s)
        cur = others.pop(i)
        nums.append(PI(x - o for o in others))
        dens.append(PI(cur - o for o in others))
    den = PI(dens)
    num = sum([_divmod(nums[i] * den * y_s[i] % p, dens[i], p)
               for i in range(k)])
    return (_divmod(num, den, p) + p) % p

def recover_secret(shares, prime=_PRIME):
    """
    Recover the secret from share points
    (points (x,y) on the polynomial).
    """
    # The check below is hard-coded for the threshold of 3 used in main().
    if len(shares) < 3:
        raise ValueError("need at least three shares")
    x_s, y_s = zip(*shares)
    return _lagrange_interpolate(0, x_s, y_s, prime)

def main():
    """Main function"""
    secret = 1234
    shares = make_random_shares(secret, minimum=3, shares=6)

    print('Secret:                                                     ',
          secret)
    print('Shares:')
    if shares:
        for share in shares:
            print('  ', share)

    print('Secret recovered from minimum subset of shares:             ',
          recover_secret(shares[:3]))
    print('Secret recovered from a different minimum subset of shares: ',
          recover_secret(shares[-3:]))

if __name__ == '__main__':
    main()
```


## Properties

Some of the useful properties of Shamir's ${\displaystyle \left(k,n\right)\,\!}$  threshold scheme are:

1. Secure: The scheme has information-theoretic security.
2. Minimal: The size of each piece does not exceed the size of the original data.
3. Extensible: When ${\displaystyle k\,\!}$  is kept fixed, shares ${\displaystyle D_{i}\,\!}$  can be dynamically added or deleted without affecting the other pieces, because computing new points on the polynomial does not affect the currently computed points.
4. Dynamic: Security can be easily enhanced without changing the secret, but by changing the polynomial occasionally (keeping the same free term) and constructing new shares for the participants.
5. Flexible: In organizations where hierarchy is important, each participant can be assigned different numbers of shares according to their importance inside the organization. For instance, the president could unlock the safe alone, whereas 3 secretaries would be required to combine their shares to unlock the safe.
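The Extensible and Dynamic properties can be illustrated with the small field ${\displaystyle p=1613}$ from the worked example. The sketch below is illustrative only: the replacement coefficients 5 and 7 are arbitrary values chosen so the output is reproducible, whereas a real refresh would draw them at random, and the helper names are assumptions.

```python
P = 1613


def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial (coefficients lowest degree first) modulo p."""
    return sum(c * x**i for i, c in enumerate(coeffs)) % p


def reconstruct(points, p=P):
    """Compute f(0) mod p from k points with the Lagrange formula."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * xm % p
                den = den * (xm - xj) % p
        total = (total + yj * num * pow(den, p - 2, p)) % p
    return total


old_poly = [1234, 166, 94]
shares = [(x, evaluate(old_poly, x)) for x in range(1, 7)]

# Extensible: a seventh share is one more point; existing shares are untouched.
new_share = (7, evaluate(old_poly, 7))
from_extended = reconstruct([shares[0], shares[1], new_share])

# Dynamic: keep the free term, replace the other coefficients, reissue shares.
new_poly = [1234, 5, 7]  # arbitrary illustrative coefficients
fresh = [(x, evaluate(new_poly, x)) for x in range(1, 7)]
from_fresh = reconstruct(fresh[:3])

# Mixing two old shares with one fresh share no longer gives the secret.
from_mixed = reconstruct([shares[0], shares[1], fresh[2]])

print(new_share, from_extended, from_fresh, from_mixed)
```

In this run the extended and refreshed sets both recover 1234, while the mixed set yields 1581, so shares leaked before a refresh cannot be combined with shares issued after it.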

A known issue in Shamir's Secret Sharing scheme is that it provides no way to verify the correctness of the shares submitted during the reconstruction process. Schemes that address this are known as verifiable secret sharing, which aims to verify that shareholders are honest and not submitting fake shares.