Pictures and slides | Announcements | Exam | Literature | Course |

The first part will be taught by

Tanja Lange

Coding Theory and Cryptology

Eindhoven Institute for the Protection of Systems and Information

Department of Mathematics and Computer Science

Room HG 9.92

Technische Universiteit Eindhoven

P.O. Box 513

5600 MB Eindhoven

Netherlands

The easiest way to reach me wherever I am:

e-mail: tanja@hyperelliptic.org

The second part (March 24 onwards, note that May 05 is a public
holiday in the Netherlands) will be taught by Marc Stevens.
The webpage for the second part is here.

Details are filled in as time permits. The official home page is
here.

The course takes place on Tuesdays at the University of Utrecht in the
Minnaert building, room 211, except for March 10, when the lecture is
replaced by a reading assignment.

The first part consists of 7 weeks of lectures; the dates are Feb 03,
Feb 10, Feb 17, Feb 24, Mar 03, Mar 10 (no lecture but reading), and
Mar 17. Each class runs from 14:15 to 17:00. These 3 hours are
lectures with integrated exercises, so there are no separate exercise
sessions.

**Goal**

The goal of this course is to provide insight into cryptography secure
against quantum computers (post-quantum cryptography) as well as
various methods for the mathematical cryptanalysis of cryptographic
systems.

Cryptology deals with mathematical techniques for design and analysis of algorithms and protocols for digital security in the presence of malicious adversaries. For example, encryption and digital signatures are used to construct private and authentic communication channels, which are instrumental to secure Internet transactions.

This course in cryptology consists of two main topics:

The first part focuses on post-quantum cryptography, dealing with cryptographic systems that remain secure even given the existence of quantum computers; the second part focuses on cryptanalysis, the analysis of the security of cryptographic systems.

After a general introduction to cryptography (the constructive side of cryptology) and cryptanalysis, the course introduces the main contenders for secure systems: lattice-based encryption, code-based encryption, hash-based signatures, and multivariate-quadratic-based signatures. These systems are examples of public-key systems, and this is the main area affected by quantum computers; symmetric-key systems (such as hash functions and block and stream ciphers) are used as building blocks inside them and for the transmission of data in bulk.

The second part of the course will cover various generic attacks against common cryptographic primitives (e.g., block ciphers, hash functions) and cover important cryptanalytic attack techniques like time-memory tradeoffs, linear cryptanalysis, differential cryptanalysis and algebraic cryptanalysis.

Here are some test exercises. Sadly the link in the last exercise has stopped working. The Internet archive still has a copy at https://web.archive.org/web/20150418221433/http://www.larc.usp.br/~pbarreto/PQC-3.pdf.

There will be a written exam on June 9, 14:00-17:00, Educatorium Beta (at University of Utrecht). The retake exam will take place on June 30, 14:00-17:00, BBG (Buys Ballot Building) 061.

The exercises for the first exam are here. I will not make solutions available for my part but you can email me what you think is the correct solution and I'll give you feedback.

The exercises for the second exam are here.

There are a few books covering topics we touch on in class, but there
is no 'textbook' that covers everything. The following list will be
updated as the course proceeds. For general background on
cryptography the Handbook
of Applied Cryptography by
Menezes, van Oorschot, and Vanstone
has everything you wanted to know
(and more); you can download the chapters freely from the web.

Lots of references on post-quantum cryptography are collected at
pqcrypto.org.

An (only slightly outdated) reference on post-quantum crypto is
Post-Quantum Cryptography, edited by Bernstein, Buchmann, and Dahmen.
Some sample pages are available on the Springer page, and
the authors of the chapters are free to host their chapters on their
personal webpages.

It is not necessary to purchase any book to follow the course.

**03 Feb 2015**

General introduction to cryptography; the concepts of public-key and
symmetric-key cryptography, interactive example of Kid-RSA,
real RSA.
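To make the RSA example concrete, here is a minimal schoolbook-RSA sketch in Python. The tiny primes and exponent are my choices for illustration only; there is no padding, and of course such sizes are completely insecure:

```python
# Schoolbook RSA with tiny primes -- illustration only, never use such sizes
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent with gcd(e, phi) = 1
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

m = 65                       # a "message" encoded as an integer < n
c = pow(m, e, n)             # encryption: c = m^e mod n
assert pow(c, d, n) == m     # decryption: m = c^d mod n
```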

I took pictures of all black boards before erasing them. They are here.

**10 Feb 2015**

Attacks on RSA, in detail Pollard rho for factoring. Diffie-Hellman
key exchange in finite fields, comments on elliptic curves; Pollard rho
as an attack on the discrete-logarithm problem. Both attacks exploit
a function that is periodic. Using slides by Daniel J. Bernstein
(vertical pdf slides or
horizontal pdf slides (used to display in lecture)) I explained
details of quantum algorithms. Simon's algorithm and Shor's algorithm (similar
to Shor, not covered on slides) find the period of periodic functions in
polynomial time. Grover's algorithm finds a root of a function in square-root
time. Attacks on factoring and the DLP run in polynomial time on a quantum
computer; exhaustive search (if modeled appropriately to match root finding
of a function) still runs in exponential time, just with a smaller exponent.

In the exercise part we did Pollard rho to factor 403 and to break the DLP
modulo p=1013, with base g=3 and h=g^a=245. If you work without a computer,
try p=23, g=5, h=17 instead.
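If you want to check the factoring exercise with a computer, here is a small Pollard-rho sketch in Python (Floyd cycle-finding with the usual iteration x → x² + c; the starting point and the parameter c are just common defaults, not the lecture's exact choices):

```python
from math import gcd

def pollard_rho(n, c=1):
    # iterate x -> x^2 + c mod n; the tortoise moves one step, the hare two,
    # until gcd(x - y, n) reveals a nontrivial factor
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    return d if d != n else None   # None: retry with a different c

print(pollard_rho(403))            # finds a nontrivial factor of 403 = 13 * 31
```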

Anne Smeets was so nice to take pictures of all black boards. They are here.

Later addition: one board was missing, I added one of my pictures for
that.

**17 Feb 2015**

Quantum computers can factor and compute
discrete logs in polynomial time; using forward-secure encryption does
not help against an attacker who has stored all connection data and
has access to a quantum computer (I mentioned https://stopdebewaarplicht.nu/
in this context). Common answers in the search for alternatives are
to use quantum crypto or post-quantum crypto. I gave a short explanation
and a short rant on quantum crypto. For a very positive view see
Everything you need to know about Quantum Cryptography by a
company dealing in quantum crypto. A general-audience article on what
goes wrong in practice appeared in Wired.

This course is about post-quantum crypto, which uses mathematics and
not physics to generate secure systems. A warning: we cover
only schoolbook versions, while for implementations you want to use
CCA2-secure versions that remain secure even if the attacker has
access to a decryption oracle. I mentioned RSA-OAEP as an example,
and we showed that schoolbook RSA is vulnerable to oracle attacks.
RSA-OAEP, in contrast, is really as hard to break as the RSA problem
we studied, even if the attacker is very creative.

Short introduction to coding theory and code-based crypto (linear
block codes, length, dimension, minimum distance, [n,k,2t+1] code,
Hamming weight, generator matrix, parity check matrix, y=mG+e,
Hy=He).

The general decoding problem is NP-complete; this seems to hold also on
average. The McEliece cryptosystem (1978) uses this to encrypt. The public
key is a generator matrix G; encrypt m as mG+e using some random error e
of weight t; decryption is some secret way of decoding. One can actually
build this: given G' which is easy to decode, pick a random invertible
k×k matrix S and an n×n permutation matrix P and put G=SG'P. You can
decrypt by decoding yP^-1=mSG'+eP^-1, because mS is just another
message and eP^-1 has the same weight as e.
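As a sanity check of the construction above, here is a toy McEliece sketch in Python/numpy. The matrix names S, P, G' follow the text; my choice of easy-to-decode secret code is the [7,4] Hamming code (corrects t=1 error), and everything else is illustrative scaffolding, not a real parameter set:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 4   # the [7,4] Hamming code corrects t=1 error

# Easy-to-decode secret code G' (systematic form) and its parity-check matrix
A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
Gp = np.hstack([np.eye(k, dtype=int), A])
Hm = np.hstack([A.T, np.eye(n - k, dtype=int)])

def gf2_inv(M):
    # invert a binary matrix by Gauss-Jordan elimination over GF(2)
    d = len(M)
    aug = np.hstack([M % 2, np.eye(d, dtype=int)])
    for i in range(d):
        piv = i + int(np.argmax(aug[i:, i]))
        aug[[i, piv]] = aug[[piv, i]]
        for r in range(d):
            if r != i and aug[r, i]:
                aug[r] = (aug[r] + aug[i]) % 2
    return aug[:, d:]

# Secret scrambling: invertible S and permutation P; public key G = S G' P
while True:
    S = rng.integers(0, 2, (k, k))
    if round(abs(np.linalg.det(S))) % 2 == 1:   # odd determinant: invertible mod 2
        break
P = np.eye(n, dtype=int)[rng.permutation(n)]
G = S @ Gp @ P % 2

def encrypt(m):
    e = np.zeros(n, dtype=int)
    e[rng.integers(n)] = 1             # random error of weight t=1
    return (m @ G + e) % 2

def decrypt(y):
    yp = (y @ P.T) % 2                 # undo P: e P^-1 still has weight 1
    s = (Hm @ yp) % 2                  # syndrome decoding for the Hamming code
    if s.any():
        yp[int(np.where((Hm.T == s).all(axis=1))[0][0])] ^= 1
    return (yp[:k] @ gf2_inv(S)) % 2   # first k bits are m S (G' is systematic)

m = np.array([1, 0, 1, 1])
assert (decrypt(encrypt(m)) == m).all()
```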

Break the system by finding e -- which corresponds to finding a word
of weight t in the code C+{0,y} (e is in this code); Lee-Brickell
attack using the parity check matrix of this code. More attacks next
week. Also some examples of good codes.

I took pictures of all black boards before erasing them. They are here.

**24 February 2015**

McEliece and Niederreiter system. Note that these are not semantically
secure in the school book versions explained -- we saw that y+c gives
and oracle attack (pick any code word c) and that in general we get
information from whether or not y+ei decrypts about the positions
that are non-zero in e.

Finding words of small weight in a code -- algorithms by Prange,
Lee-Brickell, Leon, and Stern. We didn't do the running-time analysis;
I gave some motivation why the improvements are improvements but didn't
trace all the binomial coefficients. Here is a taste of this. I will
denote the binomial coefficient n choose k by C(n,k) to save my sanity
in typing this.

There are C(n,t) ways the t error positions can be distributed
over the n positions. When we select the n-k columns to make a
left-right split and to transform the right-hand side into the
identity matrix, we determine whether step 2 can be successful. On
that level, the likelihood that a selection works changes from
C(k,p)*C(n-k,t-p)/C(n,t) for Lee-Brickell to C(k,p)*C(n-k-z,t-p)/C(n,t)
for Leon (because there are p columns chosen out of the k columns on
the left, and t-p of those on the right; for Leon we additionally
insist on z positions being zero in return for a cheaper
step). For Stern the probability is (C(k/2,p))^2*C(n-k-z,t-2p)/C(n,t).
If this does not scare you, take a look at the original Stern paper or
our paper reporting on our 2008 attack.
Actually you should totally do this.
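These per-selection success probabilities are easy to evaluate with a computer. A short Python sketch; the choice k = n - ms = 524 for the original McEliece parameters is my assumption (the dimension is at least n - ms), and p and z are free attack parameters:

```python
from math import comb

# Original McEliece parameters: n = 1024, t = 50, and (assumed) k = n - ms = 524
n, k, t = 1024, 524, 50

def lee_brickell(p):
    # p error positions among the k left columns, t-p among the n-k right ones
    return comb(k, p) * comb(n - k, t - p) / comb(n, t)

def leon(p, z):
    # additionally insist on z zero positions in return for a cheaper step
    return comb(k, p) * comb(n - k - z, t - p) / comb(n, t)

def stern(p, z):
    # p errors in each half of the k left columns
    return comb(k // 2, p) ** 2 * comb(n - k - z, t - 2 * p) / comb(n, t)

# The expected number of iterations is the reciprocal of the probability.
print(1 / lee_brickell(2), 1 / leon(2, 20), 1 / stern(3, 20))
```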

I gave an introduction to Goppa codes and proved some properties of
them. Note that G(x) should be defined over the big field (I was
getting in trouble reconstructing the example because 10 and 50 were
not coprime; choosing G(x) over F_{2^10} makes that much easier). I've
fixed it on the photos. One student asked after the lecture about an
alternative definition which uses S(x)=Σ c_i*h(x)/(x-a_i), where
h(x)=Π(x-a_i). Since G(x) is coprime to h(x) this does not change
whether S(x) is congruent to 0 modulo G(x), and it avoids talking about
rational functions or 1/(x-a_i) mod G(x). I needed the G_i(x) anyway
for the proof, so that simplification didn't help. We showed that in
general Γ(L,G) is an [n, ≥n-ms, ≥s+1] code and that for F_2
and G(x) squarefree we get an [n, ≥n-ms, ≥2s+1] code. The original
McEliece parameters were n=1024=2^10, m=10, s=50; G(x) was an
irreducible polynomial of degree 50 over F_{2^10}. For more details
on coding theory and the decoding algorithm for Goppa codes see Henk
van Tilborg's book, page 64 ff.

Further reading

- Henk van Tilborg made his book Coding Theory available online.
- In 2009 Springer published an overview book on post-quantum crypto. The chapter on code-based cryptography was written by Raphael Overbeck and Nicolas Sendrier; if you're on a university network you should be able to download it from here (at least this works for TU/e).
- In 2009 Paulo Barreto gave a nice introductory lecture on code-based cryptography.
- Christiane Peters wrote her PhD thesis on curves (not post-quantum) and code-based crypto in 2011. You can download her thesis from her publications page.
- In 2012 I helped organize a workshop on post-quantum cryptography at the Lorentz Center in Leiden. Slides from the talks are available, in particular Nicolas Sendrier gave a tutorial on code-based cryptography.

Sorry, I wasn't doing too well the past weeks, so I got far behind on posting. I'm including suggestions for further reading in the March 3 post. Exercises will come soon but not tonight.

**03 March 2015**

Lecture given by Daniel
J. Bernstein while I was sick.

Electronic signatures are used to prove the authenticity of statements, e.g. of a key on the Internet, of a software update, or of a passport. Current signatures are usually based on RSA. This lecture is about post-quantum signatures. We use hash functions as functions that are hard to invert and for which it is hard to find collisions. You will learn more about hash functions in the second half of this course. For the post-quantum aspects, note that Grover's algorithm handles the problem of finding preimages in square-root time but that this does not get easier with Shor's algorithm.

Dan explained how to use a hash function to sign exactly one message
(and explained that even that can be useful, in the sense of a warrant
canary). Then he explained how to expand this to signing two
messages, 0 and 1 or 'yes' and 'no', by revealing the preimage of
exactly one hash value. This signature scheme was first proposed by
Leslie Lamport. This idea can be generalized to signing arbitrarily
many messages -- e.g. publish 2^{20} hash values to sign one
of 2^{20} different messages by revealing one preimage. More
efficiently, publish 40 hash values, each corresponding to a bit
position between 1 and 20 and a bit value in {0,1}. Now you can sign a
20-bit message (m1, m2, ..., m20) by revealing 20 preimages,
namely exactly those dictated by m1, m2, etc. This means that the
signature got 20 times longer but the public key shrunk from 2^{20} hash
values to just 40, while we have the same space of messages that we can
sign. Of course this can be extended to n bits while increasing the
key size linearly in n.
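A Lamport one-time signature along the lines above fits in a few lines of Python. SHA-256 as the hash and 20-bit messages are my choices for illustration:

```python
import hashlib
import secrets

def H(x):
    return hashlib.sha256(x).digest()

def keygen(n=20):
    # secret key: 2n random strings, one per (bit position, bit value);
    # public key: their 2n hashes
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(n)]
    pk = [[H(s0), H(s1)] for s0, s1 in sk]
    return sk, pk

def sign(bits, sk):
    # reveal, for each message bit, the preimage matching its value
    return [sk[i][b] for i, b in enumerate(bits)]

def verify(bits, sig, pk):
    return all(H(s) == pk[i][b] for (i, b), s in zip(enumerate(bits), sig))

sk, pk = keygen()
msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
sig = sign(msg, sk)
assert verify(msg, sig, pk)
flipped = [1 - msg[0]] + msg[1:]
assert not verify(flipped, sig, pk)   # signature does not transfer to other messages
```

Remember that the key is strictly one-time: signing a second message reveals further preimages and enables mix-and-match forgeries, as discussed next.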

These signatures are one-time signatures, meaning that if you use the
same key (= set of hash values) twice, an attacker can most likely forge
a signature by mixing and matching bits from both messages. The only
case where this does not lead to a forgery is when the signed messages
differ in just one bit. If the attacker has access to an oracle he can
query for a signature on (0,0,...,0) and one on (1,1,...,1), after
which he knows preimages for *every* hash value in the public key
and can sign any message.

One-time signatures are OK e.g. for code updates because the new code (which is signed by the old key) could include the next key, but this requires users to get all updates in sequence and would be a deployment nightmare.

Now that we've understood hash-based one-time signatures, let's go for many-time signatures. To have a key that can be used multiple times, Ralph Merkle suggested using trees. E.g. to use a key 16 times, build a tree with 16 leaves, each consisting of a one-time signature key as above which can be used to sign n bits. Then take the public keys for the 1st and 2nd message, hash them together into one value, and place this at the node whose children are the 1st and 2nd message. Do the same for the 3rd and 4th message, the 5th and 6th, etc., up to the 15th and 16th message. Now we have a layer above the leaf layer. Build a complete binary tree leading to a single root four levels up: take the 1st and 2nd hash values on this layer, hash them together, and place the result in the node above them. Do the same for every node in the tree -- it gets the value obtained by hashing together its children. The root of the tree encapsulates contributions from each of the 16 signature keys on the leaf layer; use this one value as the public key for 16 signatures.

Of course a signature now needs to include not only the n hash preimages from the one-time signature that's used in this run but also values from the other nodes so that a complete pass up to the root can be verified. In our example with the 16 signatures this means giving the value of the leaf's sibling and of every sibling of the nodes on the way to the root. Then the verifier can check that the signature matches the one-time signature used at the leaf and that the public key for this leaf together with the other node values computes the same value at the root. See the slide set in the next section for an illustration. Dan also did some drawing on the board.
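The tree construction and path verification above can be sketched in Python. The 16 "one-time public keys" at the leaves are dummy values here, and SHA-256 is my choice of hash:

```python
import hashlib

def H(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

# Stand-ins for the 16 one-time public keys at the leaves (dummy data)
leaves = [H(bytes([i])) for i in range(16)]

def merkle_root(nodes):
    # hash sibling pairs layer by layer until a single root remains
    while len(nodes) > 1:
        nodes = [H(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def auth_path(nodes, idx):
    # the siblings needed to recompute the path from leaf idx up to the root
    path = []
    while len(nodes) > 1:
        path.append(nodes[idx ^ 1])    # sibling on the current layer
        nodes = [H(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
        idx //= 2
    return path

def verify(leaf, idx, path, root):
    h = leaf
    for sib in path:
        h = H(h, sib) if idx % 2 == 0 else H(sib, h)
        idx //= 2
    return h == root

root = merkle_root(leaves)
path = auth_path(leaves, 5)            # 4 sibling values for a 16-leaf tree
assert verify(leaves[5], 5, path, root)
```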

Further reading:
- In December I gave a short presentation on hash-based signatures. The first part contains some pictures of mathematical trees illustrating the text above.
- Together with a bunch of people we designed SPHINCS: practical stateless hash-based signatures, dealing with the most annoying part of hash-based signatures, namely that the signer needs to keep state to know which leaf-level signature has been used. The design uses Winternitz signatures instead of Lamport signatures.
- Adam Langley from Google wrote a blog post giving more motivation for hash-based signatures ... should you need any.

Anne Smeets took pictures of all black boards before erasing them. They are here.

Short of letting you trace through trees the interesting questions which have enough math in them are to compute how secure the system is. Dan has posed these questions during the lecture (check out the blackboard pictures); you'll have a better understanding of the security of hash functions after the second part of the course.

**17 March 2015**

Lecture on lattice-based cryptography.

This lecture had three main parts -- the first in which I explained how NTRU works, the second in which I explained some basics of lattice algorithms, and a third of applying lattice algorithms to solving NTRU.

The NTRU system has three general public parameters: namely positive integers N, p, and q where gcd(p,q)=1 and q is much larger than p. In class I mentioned p=3, q=64, N=107 as toy parameters and p=3, q=128, and N=503 as the low end of reasonable parameters. For hand calculations you can use p=3, q=101, and N=7. If you want something more challenging, try the NTRU challenge; they do offer cash prizes!

All computations take place in the ring R=Z[x]/(x^N-1), i.e. all elements are represented by polynomials of degree <N. Some computations additionally reduce modulo p or modulo q.

The private key of user Alice is a polynomial f(x) in R which satisfies that f is invertible in R/p=(Z/p)[x]/(x^N-1) and in R/q=(Z/q)[x]/(x^N-1).

To generate her public key, Alice picks a polynomial g(x)in R and computes f_p=f^{-1} in R/p, f_q=f^{-1} in R/q and h=f_q* g in R/q. Alice's public key is h along with the public parameters p,q, and N.

To encrypt a message m(x), an element of R with coefficients in the small interval [-(p-1)/2,(p-1)/2], to a user with public key h, take a random polynomial φ(x) in R and compute c=p*φ*h+m in R/q.

To decrypt ciphertext c in R/q use private key f and compute a=f * c in R/q, choosing coefficients in [-(q-1)/2,(q-1)/2]. [If you're a mathematician, lift a to R, i.e. forget about the reduction modulo q]. Then compute m'=a* f_p in R/p, taking coefficients from [-(p-1)/2,(p-1)/2].

The idea here is that the computation modulo p removes the first part, so that we are left with f*f_p*m, which is congruent to m modulo p because f_p is the inverse of f modulo p. However, this mixes computations modulo q and p, which usually is a bad idea. The reason that this is still likely to work is that the first part is not just some multiple of p: h=f_q*g, and thus the first part of a(x) equals f*p*φ*h=p*φ*f*f_q*g, which is equivalent to p*φ*g modulo q. Now, if φ and g are sparse (have very few nonzero coefficients), the existing coefficients are small, and p is small (all relative to q), then it is very likely that p*φ*g in R (i.e. without reduction) equals p*φ*g with coefficients reduced to [-(q-1)/2,(q-1)/2], which means that all coefficients of the first part are indeed divisible by p and thus that part vanishes modulo p.

NTRU chooses to have all random polynomials have coefficients in {-1,0,1} and to limit the number of non-zero coefficients. Let Lf be the set of polynomials in R with df coefficients equal to 1, df-1 coefficients equal to -1, and the remaining coefficients equal to 0. The secret key f is chosen to be an element of Lf. Similarly, let Lg (and Lφ) be the sets of polynomials with exactly dg (and dφ) coefficients equal to 1, the same number equal to -1, and the rest equal to 0. For the parameters with N=107 choose df=15, dg=12, and dφ=5; for N=503 choose df=216, dg=72, and dφ=55.

We did a quick estimate that this usually works and discussed why f cannot have the same number of -1 and +1 coefficients (else it would be divisible by x-1 and thus could not be invertible).
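For those who want to experiment, here is a complete toy NTRU round trip in Python. The parameters N=7, p=3, q=128 and the particular polynomials f, g, m, φ are my choices for illustration (I take q as a power of 2 so the inverse modulo q can be obtained by Hensel-lifting the inverse modulo 2); real parameters are far larger:

```python
from itertools import product

# Toy parameters: N=7 keeps polynomials hand-sized; q=128=2^7, gcd(p, q)=1
N, p, q = 7, 3, 128

def mul(a, b, mod):
    # multiplication in (Z/mod)[x]/(x^N - 1): cyclic convolution
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] = (c[(i + j) % N] + a[i] * b[j]) % mod
    return c

def center(a, mod):
    # choose coefficient representatives in [-mod/2, mod/2)
    return [((x + mod // 2) % mod) - mod // 2 for x in a]

def brute_inverse(f, mod):
    # exhaustive search for an inverse; fine for tiny N and mod
    one = [1] + [0] * (N - 1)
    for cand in product(range(mod), repeat=N):
        if mul(f, list(cand), mod) == one:
            return list(cand)
    raise ValueError("not invertible")

def inverse_mod_q(f):
    # Newton/Hensel lifting: g <- g*(2 - f*g) doubles the 2-adic precision
    g, mod = brute_inverse([c % 2 for c in f], 2), 2
    while mod < q:
        mod *= mod
        fg = mul(f, g, mod)
        g = mul(g, [(2 - fg[0]) % mod] + [(-t) % mod for t in fg[1:]], mod)
    return [t % q for t in g]

# Key generation: f invertible mod p and mod q; g sparse with small coefficients
f = [-1, 1, 1, 0, -1, 0, 1]
g = [0, -1, 1, 1, 0, -1, 0]
f_p = brute_inverse([t % p for t in f], p)
f_q = inverse_mod_q([t % q for t in f])
h = mul(f_q, [t % q for t in g], q)        # public key h = f_q * g in R/q

# Encryption: c = p*phi*h + m in R/q
m   = [1, -1, 0, 1, 0, 0, -1]
phi = [1, 0, -1, 0, 1, 0, 0]
c = [(p * t + mi) % q for t, mi in zip(mul(phi, h, q), m)]

# Decryption: a = f*c in R/q with centered coefficients, then m' = a*f_p in R/p
a = center(mul([t % q for t in f], c, q), q)
m_rec = center(mul([t % p for t in a], f_p, p), p)
assert m_rec == m
```

Decryption succeeds here because all coefficients of p*φ*g + f*m stay well inside [-(q-1)/2,(q-1)/2], exactly the sparseness condition discussed above.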

A lattice is a discrete subgroup of R^n, usually given by some basis B={b1, b2, ..., bn}; then L={Σ a_i b_i | a_i in Z}. In cryptography one is usually given a very skewed basis, and the computational task is to find a nice basis in which the vectors are close to orthogonal. If the coefficients were in R rather than in Z we could do this easily with the Gram-Schmidt orthonormalization algorithm, but for lattices of sufficiently high dimension it is hard even to find the shortest vector in a lattice. This Shortest Vector Problem (SVP) and the related problem of finding the closest lattice point to a given point in R^n, the Closest Vector Problem (CVP), form the basis of several lattice-based cryptosystems. See the blackboard pictures for some drawings.

One of the oldest methods to find short vectors is LLL, named after A. Lenstra, H. Lenstra, and L. Lovász. The algorithm finds a short vector that is at most a factor d longer than the shortest one. This approximation factor d is important when checking whether a system is secure: the secret shortest vector must be hidden well enough that the algorithm is more likely to find other, longer vectors instead.
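In dimension 2 the reduction idea behind LLL can be seen in full: Lagrange-Gauss reduction repeatedly subtracts integer multiples of the shorter basis vector from the longer one and provably ends with a shortest nonzero vector of the lattice. A small Python sketch (the example basis is my own; not an algorithm from the lecture):

```python
def gauss_reduce(b1, b2):
    # Lagrange-Gauss reduction: finds a shortest nonzero vector of a 2-D
    # lattice; LLL generalizes this size-reduction/swap loop to dimension n
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]
    while True:
        if dot(b1, b1) > dot(b2, b2):
            b1, b2 = b2, b1                    # keep b1 the shorter vector
        mu = round(dot(b1, b2) / dot(b1, b1))  # rounded Gram-Schmidt coefficient
        if mu == 0:
            return b1, b2
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])

# A skewed basis of the lattice spanned by (1, 2) and (3, -1)
b1, b2 = gauss_reduce((11, 8), (16, 11))
print(b1, b2)
```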

To illustrate the working of LLL I showed slides by Thijs Laarhoven, a PhD student at TU/e. Thijs made his slides available on his webpage.

- lecture 1 on basics of lattices, the GGH cryptosystem (a system designed to work with lattices, but far less used than NTRU), a nice illustration of LLL (click on the arrow to play) and of BKZ, a more advanced algorithm.
- lecture 2 with very nice illustrations of the main algorithms for computing shortest vectors (not just short vectors as in LLL), covering both enumeration algorithms and sieving algorithms. The latter brings you all the way to research results (by him) from a paper that will appear at Crypto.

Finally we showed how breaking NTRU can be seen as a shortest-vector problem. We wrote down a basis matrix B containing known entries -- the public key h (one power of x per position), the general parameter q, and some optimization parameter α -- and showed that the lattice spanned by B contains (αf, g) as a vector. Choose α to balance the sizes. If this vector is short enough that LLL can find it, the system is broken. If not, more sophisticated lattice attacks seem necessary -- which brings us back to the NTRU challenge from before.

Further reading: Thijs' slides are a good start to see one more cryptosystem and a signature scheme based on lattices and to see algorithms for computing the SVP. There is also a page on lattice challenges, including a page on the SVP challenge.

The lattices used in cryptography are often not general lattices but
carry more structure. E.g. the lattice in NTRU comes from computing in
the polynomial ring modulo x^N-1. Such *ideal lattices* might be
easier to break than general lattices. The ideal
lattice challenge does not show a drawback so far, but there are hot
discussions right now on the cryptanalytic-algorithms Google group,
and we will have a talk by Leo Ducas on
Friday May 22 at the Cryptography
Working Group in Utrecht.

Anne Smeets took pictures of all black boards before erasing them. They are here.