<p>Joinmarket.me archive (https://joinmarket.me/). Adam Gibson, 2020-12-07.</p><p>an investigation of proof of work</p><h2>Proof of Work, a pictorial essay</h2>
<p>In the modern nation state, courts ("justice") operate with the threat of violence. The exact way a dispute is settled varies (jury, single judge, etc.), but the "finality" of the resolution they provide is based on the fact that the state asserts the final say, and if you don't agree, it doesn't matter, because they have men with guns. Importantly, they don't just have <em>some</em> men with guns (that's something <em>you</em> might have, too), they have an overwhelming number of men and increasingly large guns depending on how much you don't want to accept their say.</p>
<p>If you are thinking "that's a stupidly over-simplified way of thinking about modern justice systems", then that's fine; a lot of what follows is precisely about the <em>finesse</em> around this simplified view of the world.</p>
<h3>The threat is stronger than its execution</h3>
<p>This aphorism comes from the world of chess (as far as I gather); it's somehow both obvious and subtle at the same time.
Forgive the sidetrack, but it reminds me of the possible originator of this phrase, a controversial figure in the history of chess - Aron Nimzowitsch. One (apocryphal?) <a href="http://www.caissa.com/ext/bulletin/ms/tlda0000/fms1221935698008892000004-%22Why-must-I-lose-to-this-IDIOT-%22">story</a> of him features him pointing at his grandmaster opponent and loudly proclaiming "Why must I lose to this idiot?!". But to the matter in hand - Nimzowitsch wrote one of the most famous early-ish treatises on chess strategy called "My System".</p>
<p><img src="../../../images/aronsystem.jpeg" width="436" height="436"/></p>
<p>In this book he popularised (amongst many other less controversial things) the idea of <strong>overprotection</strong> - so for example, you have a pawn on square e5, and he considered it good strategy to protect it with a knight on f3, a rook on e1 and a bishop on g3 etc. etc. Many players - indeed top grandmasters - today find this idea faintly ridiculous. What interests me is that when they deride the idea they essentially never give credence to the element of truth it surely contains, which is this: if N &gt; 1 pieces protect a key square, it means that <em>any</em> of them can move without losing protection of that square. In other words the <em>potential</em> for <em>any</em> piece to move is preserved (contrasted with a single protector, which is therefore bound and cannot move away from protection).</p>
<p><img src="../../../images/overprotection.png" width="436" height="436"/></p>
<p>The idea behind the phrase "the threat is stronger than its execution" - which sounds paradoxical at first - is similar. Potentiality is more difficult for the adversary to handle than actuality, mostly because the adversary may have to handle many potential actions, rather than the one that is actualised.</p>
<p>This can manifest psychologically - inducing <em>fear</em> in an enemy is often an excellent strategy, rather than directly attacking them. But it's important to understand that it's not just mind games. It's a matter of economy, and a matter of abstraction. Just like in the development of quantitative disciplines (all the way from early mathematics to modern computer programming, and much modern science), the development of abstractions allows for economy, which maximally leverages resources, and even opens up whole new dimensions that were previously inaccessible.</p>
<h3>The virtualization of violence</h3>
<p>This line of thinking naturally leads us to how violence as a concrete actualization of will tends to get abstracted away. This is seen across nature, <em>not</em> only in human societies. It's well understood how access to mating in social animals (various types of mammals) is "arbitrated" through combat, and importantly, the combat is rarely to-the-death. Even more strikingly, the "combat" is reduced to competition over attributes <em>which might allow for better success in combat</em>. The obvious example is the stag's antlers:</p>
<p><img src="../../../images/antlers1.jpeg" width="436" height="436"/></p>
<p>That's a lot of physical raw material, requiring a lot of nutrition, and with unclear direct utility (very inefficient!). These are only a quasi-abstraction from real horns designed to kill - they <em>can</em> kill other stags, though apparently that is very rare. Instead they have evolved to be very visually assessable by other stags, and their complex geometry allows for battles ("horns locked") which are more of an assessment of the ability to kill than an attempt to do so. There are of course many further examples that are more obviously abstractions. Further, displays may not be of attributes designed to inflict violence, but of attributes that display high levels of general fitness, which itself is an abstraction from "ability to gather a lot of resources". The peacock is the most famous visually striking example, albeit the exact mechanisms involved may not be a matter of settled science.</p>
<p><img src="../../../images/peacock.jpg" width="436" height="436"/></p>
<p>I will leave the biologists to extend this list further with more obscure examples...</p>
<p>A rather interesting summary of one perspective on these phenomena is <a href="https://en.wikipedia.org/wiki/Handicap_principle">the handicap principle</a> (which somehow I had not read before writing most of this document, and as you will understand from reading it, I now rather wish I had!).</p>
<p>Also, the two examples we've seen so far show how there are two sides to this phenomenon: show dominance by showing the <em>ability</em> to win a fight without fighting; show superior suitability by showing the <em>ability</em> to gather resources without actually gathering resources. They are clearly not <em>completely</em> distinct.</p>
<h3>The same in modern human society</h3>
<p>We can see both types of 'abstraction' very clearly even in modern society. The violence-competition is virtualized in sport, most obviously:</p>
<p><img src="../../../images/gridiron.jpeg" width="436" height="436"/></p>
<p>(It's probably not an accident that in many places, sportsmen are pretty much at the top of the mating hierarchy, at least to some females!). But lest we get <em>too</em> abstract, let's not forget that just generally, displays of violence <em>capability</em> are also a big part of human society, even at the nation-state level:</p>
<p><img src="../../../images/militaryparadewithtanks.jpeg" width="436" height="436"/></p>
<p>Once we start thinking about human behaviour this way, we can see it everywhere. Consider the engagement ring and ask yourself where this tradition might come from:</p>
<p><img src="../../../images/engagementring.jpeg" width="436" height="436"/></p>
<p>We are not so different from peacocks here ... and it's not of course "irrational" in any but the most inane sense. Look at the analogy between the peacock's display and the engagement ring more closely, both:</p>
<ul>
<li>are costly in resources to create</li>
<li>are visually appealing</li>
<li>are (visually) very distinct from the normal "stuff" in the environment</li>
<li>are very immediately and easily recognized <em>by any viewer, on their own</em></li>
<li>signal that the creator is part of a group (genetic, or cultural)</li>
</ul>
<p>Much less obvious I think is that the <em>permanence</em> of these displays is not central. Sometimes they are very delicate and fragile (think: flowers, think: sports displays with substantial risk of injury that would cause permanent inability to repeat them), and this is <em>almost</em> the point - to the receiver of the signal, what matters is that the signal was unambiguously difficult to create. What matters much less is the "substrate", i.e. what the signal is "made of". This is part of the insight in the "handicap principle":</p>
<blockquote>
<p><strong>what matters is what you <em>couldn't</em> do, because you did this.</strong></p>
</blockquote>
<p><img src="../../../images/smallbirdinjungle.jpeg" width="436" height="436"/></p>
<p>Further, imagine yourself as a small bird in the jungle - to find mating partners in this <strong>extremely</strong> noisy environment, you are looking for a small signal - a patch of bright colour - in this high noise environment. You want the signal to be unambiguous - blue where everything is green, yellow, brown, red - and costly. You don't want to have to compare it to something else to check it's correct (is it the same color as something else on the other side of the jungle? sheesh!). You really don't care about semantics - you don't care what the signal "means", except specifically that it's in some sense pre-agreed (perhaps genetically? see the last bullet point); to an outsider the signal could be outlandish or ridiculous, and that doesn't matter.</p>
<h3>The court, the bank, and the abstraction of money</h3>
<p><img src="../../../images/bankofmontreal.jpeg" width="436" height="436"/></p>
<p>You can see it in architecture; there is of course a very good reason why historically banks were built of extremely sturdy materials, at considerable cost. They were originally actually vaults for high concentrations of physical wealth, and had to protect well from direct frontal assault.</p>
<p><img src="../../../images/hongkong.jpeg" width="436" height="436"/></p>
<p>Nowadays this is another abstraction of the type already mentioned; if an organization put <em>that</em> much money into building such an imposing building or flashy skyscraper, they're hardly likely to steal my pathetic little stash! Court buildings likewise represent an abstraction of the state's power, and so do government offices. This is particularly obvious in more authoritarian states like China, where the government buildings in smaller towns look almost absurd (this example, in Luxian, is very typical):</p>
<p><img src="../../../images/luxian.jpeg" width="436" height="436"/></p>
<p>The whole "virtualization" paradigm, taking concrete physical force and replacing it with "threats, stronger than executions" has entirely entered the realm of money too. Not to dip my toe into the various debates about the origin of money, I'll just talk about recent history: we moved from bearer instruments and certificates for bearer instruments, through to certificates representing pure "fiat" in the literal sense - fiat meaning the will of the governing power, essentially. So "fiat is backed by men with guns", the famous <a href="https://www.youtube.com/watch?v=MJWi8VUHUzk">Krugman</a> quote,</p>
<p><img src="../../../images/krugman.jpeg" width="436" height="436"/></p>
<p>... is certainly right <em>in essence</em> even if you quibble over various details. In the same way that courts are "backed by men with guns".</p>
<p>But I think it's important to step back and consider what role money takes in a society - its purpose is always exactly abstraction. It solves the "double coincidence of wants" problem by creating an entirely new class of good that no one originally wanted (this is the counterintuitive thing about much mathematics - to solve a problem involving 2 or 3 things you add another thing, superficially making everything more complicated, but suddenly everything 'falls into place', creating a new structure with more symmetry. For example, if the general solution of polynomials is tortuous and even insoluble (see: <a href="https://en.wikipedia.org/wiki/%C3%89variste_Galois">Galois</a>) for real numbers, by adding an apparent complexity: a new made-up solution to x^2=-1, you suddenly find that everything cleans itself up (see: <a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra">the fundamental theorem of algebra</a>). Money does something similar, because we replace an O(N^2) pricing problem with an O(N) problem - i.e. you don't have to price everything against everything, you can price everything against money. To be clear that is not the <em>whole</em> story, as not <em>any</em> good serves the role of money well, so money itself is not a pure abstraction, it's just that its purpose is abstraction).
Its role being abstraction, it is about <em>reducing the cognitive load of needing to form relations amongst things, from something unworkable to something reasonable</em>. To give a .. non-abstract ... example: in days before anything like money, to trade bread for shoes, I (the baker) needed to know Fred the shoemaker and establish a long term semi-trust relationship with him (this was of course mediated via a tribe - whoops, that's an abstraction!). So, even if there is obviously social stuff around it, money itself is intrinsically not social - it is precisely because of the limitations of social group formation that we even need money at all. To the Bitcoiners out there, a reminder: "all the trust required to make it work" - that's a phrase you should have ingested.</p>
<p>The above paragraph is something that somehow many Bitcoiners and other cryptocurrency advocates very often seem to have failed to internalise, and it's crucial. For example, "it's fundamentally a social phenomenon", "it's money controlled by the people" and many similar phrases I hear again and again, are categorically wrong. You have it exactly opposite to the way it actually is. Money is <strong>intrinsically not social</strong>, I repeat - even though it of course exists in a social context, its purpose is to counteract the limitations of that.</p>
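<p>To make the O(N^2) versus O(N) pricing remark from earlier in this section concrete, here is a small Python sketch. The list of goods and the function names are invented for illustration; the only point is the count of prices that somebody has to keep in their head.</p>

```python
from itertools import combinations


def barter_price_count(goods):
    """Exchange rates needed when every good is priced against every other good."""
    return len(list(combinations(goods, 2)))


def money_price_count(goods):
    """Prices needed when every good is priced against a single money good."""
    return len(goods)


# Hypothetical basket of goods, purely for illustration.
goods = ["bread", "shoes", "wool", "salt", "iron"]

print(barter_price_count(goods))  # 10 pairwise rates
print(money_price_count(goods))   # 5 money prices
```

<p>With 100 goods the same functions give 4950 pairwise rates against 100 money prices, which is the "unworkable to reasonable" reduction in cognitive load described above.</p>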
<h3>Reification and information</h3>
<p>The verb "to reify" is pretty obscure, but one which I think is very apposite, and under-used in this context. To save you a look up:</p>
<blockquote>
<p>transitive verb To regard or treat (an abstraction) as if it had concrete or material existence.
To make into a thing; make real or material; consider as a thing.</p>
</blockquote>
<p>I think it's often used almost pejoratively (i.e. you treat something as real when it really isn't), but I'm preferring the second, positive version of the meaning above. Perhaps my use is whimsical, but I'm sticking with it, because there aren't really good words for this.</p>
<p>I am somehow reminded of a conversation with a childhood friend, who was fascinated by computers even in the early 80s when there basically weren't any around; he asked me 'is there anything that computers, using binary, won't be able to copy?' - thinking of music, art etc. The debate about what is "virtual" vs what is real is of course ongoing to this day.</p>
<p>The designation "virtual currency", beloved of certain lawmakers and other sophists, is relevant, but I won't be able to explain my full perspective until we get to the end of this essay. What's crucial as a starting point is to recognize that calculations aren't "weightless" (costless).</p>
<p>Any calculation requires actual work, however slight (I've no idea how many joules it takes your brain to turn 3+8 into 11 but it's not zero). While this makes calculations appear to be a poor candidate for the creation of costly signals, it only takes a short reflection to see the <em>tremendous</em> advantage of this type of work for creating signals: they are extremely unambiguous.</p>
<p>Moreover, in the realm of cryptography specifically, rather than computation in general, a long history of research has yielded us techniques for creating calculations with this huge lopsided asymmetry: they can be exponentially quicker to "do", compared with how long they take to "undo" (this is a sidetrack but: in a way, this is one of <em>the</em> stories of the twentieth century: how the development of fast computation opened up windows into areas of mathematics that allowed this. The canonical example is prime factorization: it was probably known hundreds of years ago that the factorization of semi-primes N into primes p_1, p_2 becomes (sometimes) stupidly hard for large N. But nobody really cared until we developed computers. At that point it (very slowly; see Clifford Cocks, then people like Diffie et al, in the 70s) started to become clear that we could create <a href="https://en.wikipedia.org/wiki/Trapdoor_function">"trapdoor" one way functions</a>, i.e. functions that were fast to undo if you know a secret, and effectively impossible to undo otherwise, and do it in the real world. Etc. This simulates an asymmetry of force or power.)</p>
<p>The story for cryptographic hash functions is a bit different from my parenthetical sidetrack above, but in the interests of avoiding a Crypto 101 course (I advise you go <a href="https://toc.cryptobook.us/">here</a>), let's just say that they share the main point of being <em>very</em> easy one way and "hard to reverse" only in the sense that you can't easily find <em>another</em> input for the same output.</p>
<p>But that doesn't mean that a hash function calculation satisfies the needs of the "signalling" mechanism we saw in nature, above. Because the method of verifying it is just re-calculating it, and therefore it's cheap, full stop. To make a new "function" H*, trying to take that role of a signalling mechanism, from an existing hash function H, we need it to be:</p>
<ul>
<li>unpredictable in its output, given its input</li>
<li>not just unpredictable in its output, but actually (deterministically) random in its output</li>
<li>required to produce an output with a special low entropy form</li>
</ul>
<p>(the weird 'deterministically random' phrase comes from the fact that it's properly a function: even if the output 'looks random', you will still get the same output for the same input, always).</p>
<p>If you think about it, our H* which comes from a cryptographic hash function H, will satisfy the first 2 bullet points above just from the properties of H, but by adding that last requirement we get, overall, something very similar to the first 4 bullet points above in the section on "modern human society". And that, precisely because of this additional "low entropy" requirement. (For the non-physicists, think of "low entropy" = "high degree of structure or symmetry" which is at least loosely connected with beauty or aesthetics).</p>
<p>So in summary, demanding a low entropy output from a hash function results in a <em>costly signal which is very unambiguous and easy to verify</em> - even if the <em>content</em> of that signal is utterly meaningless, in itself.</p>
<p>And that's exactly what Bitcoin's proof of work algorithm does:</p>
<p><img src="../../../images/bitcoinblockhash.png" width="436" height="436"/></p>
<p>Most readers probably know it, but for completeness: what's strange about these long hexadecimal strings is that they have "a lot of leading zeros" which really just means that as an integer, the value is <em>much, much</em> less than the maximum value it could be, and that will happen only with <em>extremely</em> low probability. The inference is that this output of H* requires <em>trillions</em> (actually a lot more but whatever) of calculations of the underlying hash function (SHA256) H, with overwhelmingly large probability.</p>
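<p>As a rough illustration of the H* construction described above, here is a minimal Python sketch. It is not Bitcoin's actual algorithm: Bitcoin applies double SHA256 to an 80-byte block header and compares the result against a numeric target, whereas this toy version hashes an arbitrary message plus a nonce once and demands a fixed number of leading zero bits. The names <code>h_star</code>, <code>verify</code> and <code>zero_bits</code> are made up for this example.</p>

```python
import hashlib


def h_star(message: bytes, zero_bits: int, max_tries: int = 10_000_000):
    """Search for a nonce so that SHA256(message || nonce), read as an integer,
    falls below 2**(256 - zero_bits), i.e. starts with `zero_bits` zero bits."""
    target = 1 << (256 - zero_bits)
    for nonce in range(max_tries):
        digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    raise RuntimeError("no nonce found within max_tries")


def verify(message: bytes, nonce: int, zero_bits: int) -> bool:
    """Verification is a single hash evaluation plus one comparison."""
    digest = hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - zero_bits))


# Toy difficulty: 16 zero bits, i.e. four leading hex zeros.
nonce, digest = h_star(b"hello", 16)
print(nonce, digest)
print(verify(b"hello", nonce, 16))
```

<p>At this toy difficulty, finding the nonce takes about 2^16 hash evaluations on average, while checking it takes exactly one; that lopsidedness is the whole point of the signal.</p>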
<p>While we're on the topic, let's address the constant cringe-inducing "solving super-hard/sophisticated mathematical problems" that you see in much bad journalism about Bitcoin. The deeper reason it's wrong is not a quibble about the exact construction of a hash function, it's that the calculation is inherently random and dumb (see "progress-freeness"): all internal structure is an artifact. That is why we have often plaintively requested that it be described as a lottery rather than a 'mathematical problem'. Unfortunately this generally falls on deaf ears, and curious learners no doubt find themselves confused as to how on Earth this 'Bitcoin' thing makes any sense ...</p>
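<p>The lottery picture can be put into numbers. The sketch below assumes an idealised hash whose every output is an independent uniform draw, so that each attempt succeeds with the same probability p; the function names and the 16-bit toy difficulty are assumptions for illustration, not Bitcoin parameters. It uses exact fractions so that the "progress-free" property comes out as an exact equality rather than a floating point approximation.</p>

```python
from fractions import Fraction


def success_probability(zero_bits: int) -> Fraction:
    """Chance that one hash attempt has `zero_bits` leading zero bits."""
    return Fraction(1, 2 ** zero_bits)


def prob_still_searching(p: Fraction, tries: int) -> Fraction:
    """P(no success in the first `tries` attempts) = (1 - p) ** tries."""
    return (1 - p) ** tries


p = success_probability(16)
expected_tries = 1 / p  # mean of a geometric distribution

# Progress-freeness: having already failed m times tells you nothing
# about how the next n attempts will go.
m, n = 1000, 500
conditional = prob_still_searching(p, m + n) / prob_still_searching(p, m)
memoryless = conditional == prob_still_searching(p, n)

print(expected_tries)  # 65536
print(memoryless)      # True
```

<p>In other words there is no partial credit and no cleverness to apply: every ticket has the same odds, however many losing tickets came before it.</p>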
<p>The creation of these hashes represents a kind of <strong>reification of information</strong>. The zeros in the above block hash digest are just a pattern, but hidden in that pattern is a real energetic raw cost, that can be quantified (though, see caveat below).</p>
<p>In some sense my childhood friend's question was about the distinction between a digital (binary?) world and a real one. A mechanism like proof of work acts as a bridge between the two, and I'd argue, a bridge very similar to the one between the hard physical reality in which an animal like a peacock operates, and the abstract/virtual realm of the signalling involved in their mating rituals. In an adversarial environment, one in which there are stakes, picking out the "real" from the "fake" means identifying signals which are objective, and the only signals that are objective are the ones that are <strong>demonstrably costly</strong>.</p>
<p>Proof of work hashes are demonstrably costly (technically it's only statistically true but the law of large numbers applies overwhelmingly). They are not just analogous to earlier examples from nature, and human society (like beautiful buildings or engagement rings), but they are even <strong>vastly better</strong> - because their verification vs cost asymmetry is absurdly larger (consider the cost of assessing the diamond in a ring, consider the extreme subjectivity of assessing the military might of a nation - with the concomitant huge cost).</p>
<p>The caveat: the real world energetic cost of PoW hashes is variable and actually difficult to measure, but that only drags it back somewhat to the same realm as some of the other good examples; taken as a whole it is far superior, as a signal, at least if deployed at global scale, as it is today in Bitcoin (so that the cost is subject to very large scale market forces).</p>
<h3>What proof of work replaces</h3>
<p>With our biological and technical diversions largely complete, let's come back to the idea of courts, justice and violence.</p>
<p>While a system like Bitcoin cannot replace most societal structures, it <em>does</em> attempt to change the function of money creation (debatable, but most agree) and money transfer (not debatable; it is pointless otherwise). To do the latter it needs to resolve conflicts, which are inevitable in an adversarial environment. To do that it needs to have a tiebreaker mechanism, one which we would like to be as neutral as possible, and the proof of work function provides a tiebreaker mechanism that is based on an objective, easily verifiable, very costly signal. (But: people often forget that this model doesn't magically allow "perfect fairness" unless you properly contextualize it: miners can form blocks however they like, so pre-confirmation you are in the wild west, and your payment may be superseded by another one.)</p>
<p>Let us again revisit our bullet point list for "good signals":</p>
<ul>
<li>are costly in resources to create</li>
<li>are visually appealing</li>
<li>are (visually) very distinct from the normal "stuff" in the environment</li>
<li>are very immediately and easily recognized <em>by any viewer, on their own</em></li>
<li>signal that the creator is part of a group (genetic, or cultural)</li>
</ul>
<p>The cost (point 1) has been heavily emphasized, but remember the 4th: the independent, objective and easy verification of the signals is paramount. In a human legal system, context is <em>everything</em> - see the idea of <a href="https://en.wikipedia.org/wiki/Case_law">case law</a> for example - a thing is considered true by reference to another thing. This is practical and logical because the intrinsic nature of such justice systems is subjective - contracts are written in human language and ethics is at least substantially (if not totally) a matter of convention - even when laws are agreed by legislators, it is still a matter of interpretation ... so overall, what is legal is also subjective, in a sense my scientist brain flinches from.</p>
<p><img src="../../../images/blindjustice.jpeg" width="436" height="436"/></p>
<p>That's not to say that attempts to make legal systems more objective are worthless, far from it - see e.g. <a href="https://en.wikipedia.org/wiki/Judicial_independence">independent judiciary</a> which is an extremely important element of civilization. But, again, remember "all the trust required to make it work".</p>
<p>(Danger: personal opinion mode on)
The philosophy behind Bitcoin isn't <em>necessarily</em> anarchism in any particular form, even if many anarchists align with Bitcoin; the philosophy is simply to make money independent of human control, because the system of money <strong>can</strong> be made independent of human control in the presence of strong cryptography and the proof of work signalling mechanism.</p>
<p>If you accept that it's at least possible, it's worth investigating further, as this essay does, what is the deeper meaning behind proof of work and why and to what extent it's an essential building block for this form of money.</p>
<p>So, if you have such a "reification of information" tiebreaking mechanism, which doesn't depend on context, then you have a way to build a fixed unique "truth timeline" - a sequence of transactions moving money from A to B, to put it most crudely, that everyone can verify quite trivially, and can see was very costly to create. The cost that was imposed represents each participant's defence against attack - changing history requires substantial <em>energy</em> cost, not simply a change in other people's opinion or a different subjective framework (as is the case with legal systems, or voting systems, or anything based on human social agreements).</p>
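<p>A toy version of such a "truth timeline" can be sketched in a few lines of Python. Everything here is a simplification chosen for illustration: the block layout, the 12-bit difficulty and the function names are hypothetical, and real Bitcoin blocks carry far more structure. What the sketch does show is the mechanism in the paragraph above: each block commits to the hash of the previous one, so editing an old entry invalidates the chain unless the work is redone.</p>

```python
import hashlib

ZERO_BITS = 12                       # toy difficulty, chosen to run quickly
TARGET = 1 << (256 - ZERO_BITS)
GENESIS_PREV = "00" * 32             # placeholder "previous hash" for block 0


def block_hash(prev_hash: str, payload: str, nonce: int) -> str:
    data = f"{prev_hash}|{payload}|{nonce}".encode()
    return hashlib.sha256(data).hexdigest()


def mine(prev_hash: str, payload: str) -> dict:
    """Try nonces until the block hash is below the target."""
    nonce = 0
    while int(block_hash(prev_hash, payload, nonce), 16) >= TARGET:
        nonce += 1
    return {"prev": prev_hash, "payload": payload, "nonce": nonce,
            "hash": block_hash(prev_hash, payload, nonce)}


def build_chain(payloads):
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        block = mine(prev, payload)
        chain.append(block)
        prev = block["hash"]
    return chain


def is_valid(chain) -> bool:
    """Anyone can re-check linkage and work with one hash per block."""
    prev = GENESIS_PREV
    for block in chain:
        h = block_hash(block["prev"], block["payload"], block["nonce"])
        if block["prev"] != prev or h != block["hash"] or int(h, 16) >= TARGET:
            return False
        prev = h
    return True


chain = build_chain(["A pays B 1", "B pays C 1", "C pays D 1"])
print(is_valid(chain))

# Rewriting history without redoing the work is detected immediately.
tampered = [dict(block) for block in chain]
tampered[0]["payload"] = "A pays Mallory 1"
print(is_valid(tampered))
```

<p>To make the tampered history pass, an attacker would have to re-mine the edited block and every block after it, which is exactly the energy cost the essay is talking about.</p>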
<p>Does proof of work then, replace ATM machines and bank tellers, as people sometimes say? I hope this now induces the appropriate face palm response. Comparing the cost of proof of work with these costs is very wrongheaded, not so much because it's totally unconnected, but because it's a fairly trivial cost, compared to what really matters!</p>
<p>If anything banking related, compare it with the cost of the huge skyscraper that the bank put in Hong Kong or Manhattan. You get trust from "weight", the "non-virtual" - the things that can't easily be copied, rewound or otherwise reversed. This is why, as earlier mentioned, calling Bitcoin a "virtual currency" is asinine. It's precisely the opposite to the truth: as I once put it to Superintendent Lawsky in an AMA: "which currency sounds more virtual, the one whose supply can be doubled overnight at the stroke of a pen, or the one that requires millions of dollars of electricity to create 25 new units?" (etc. you get the idea; modern fiat currency is absolutely virtual, whether printed on paper or not).</p>
<p>Continuing down this line of thinking, does proof of work as a signal replace standing armies, too?</p>
<p><img src="../../../images/aircraftcarrier.jpeg" width="436" height="436"/></p>
<p>I think it's a non-trivial question, and probably related to the distinction I made earlier, in the biological realm: sometimes signals for mating suitability demonstrate potential for violence, sometimes potential for resource gathering, albeit the two are not purely distinct. The purpose of an army and other force-projection arms of a nation state is after all not primarily to ensure that the currency is used, it's mostly for competition with other states over resources, but things like quelling uprisings/revolutions/political dissent (in some authoritarian states), and exerting force relating to taxation, as well as resolving large scale disputes all merge somewhat. So I don't think bitcoin's cost should be compared with the cost of the US military for example, but you could imagine (albeit it's ludicrously speculative) somehow apportioning part of the cost as being comparable. Then there is the legal system; again, it's not as if Bitcoin replaces legal functions, but there is enough of a connection there that it has to be considered. It's not a simple question :)</p>
<h3>So is all this waste bearable?</h3>
<p>The discussion of whether proof of work is wasteful is closely related to the question of what its purpose is, which is what the entirety of the preceding sections discussed.</p>
<p>Please understand that many people will sincerely ask this question, but will intrinsically be presupposing the answer unconsciously: they don't understand <em>at all</em> what the purpose of proof of work is (see "complex mathematical problems" as per above), and so to them it's <em>almost</em> obvious that it's far too wasteful, they're just double checking. If you want a good response, just say this:</p>
<blockquote>
<p>"Are stags' antlers wasteful? Is the peacock's display wasteful? If you can answer these questions, I can answer yours."</p>
</blockquote>
<p>Anyway, if you agree with the previous section ("What proof of work replaces"), then we can discuss more seriously. The biggest energy consumers are often militaries (which isn't really surprising), but also bureaucracies generally if we take all government functions in toto. Comparing bitcoin's energy consumption to all the bits and pieces of infrastructure it might replace is very, very tricky (for the third time, consider that beautiful phrase "all the trust required to make it work" - trust costs energy).</p>
<p>Does it help that bitcoin's energy usage is more "fungible"? (using the term very loosely - I mean bitcoin can convert any form of energy without dependence on location, which is emphatically not the case for most energy usage in society, and energy is infamously non-portable with the exception of petroleum). I think it does, but I wouldn't overstate it. Similarly, it's true that bitcoin takes up slack such as stranded hydro. It's also true that it incentivises innovation in energy conversion technology (see recent initiatives for burning off natural gas).</p>
<p>But these arguments are supplemental rather than central - they won't and shouldn't convince a skeptic. The key point is to understand (a) that proof of work actually has a useful function (if you don't see that and think it's pointless and stupid then of course even 1MW spent on it is a disaster), and (b) that there is little value in thinking in terms of reduced energy consumption, long term, for the human race (that is ENTIRELY another essay though ...)</p>
<h3>Signals in the jungle and signals in the lab</h3>
<p>Remember our small bird in the jungle looking for a mate? Her (his? insert biological knowledge to your taste) problem was not really distinguishing a signal with a certain meaning, rather it was to find the signal at all, and to have it be "real" in the sense we just elaborated on, and not have to rely on something else. But here I want to emphasize the word <em>jungle</em> - "it's a jungle out there!" they say, whoever "they" are ... and what do they mean? Essentially that in society a lot of people are out to get whatever advantage they can, by any method they can. This is particularly true in urban environments and nowadays on the internet, where we have strayed far from Dunbar's number:</p>
<p><img src="../../../images/dunbarsnumber.jpeg" width="436" height="436"/></p>
<p>... meaning that for many, it's a sensible tradeoff to ditch personal ethics and try to screw over every stranger you come across. Basically, especially when it comes to money, we should assume we are working in an environment which is:</p>
<ul>
<li>Very adversarial</li>
<li>Very noisy</li>
</ul>
<p>This applies to things like P2P networks, to social media, to software stacks, and certainly to a blockchain purporting to hold your wealth!</p>
<p>People usually try to address these issues by reverting to trust-based models. They use systems they trust because they know the creators (those who remember 2017 ICOs will remember the hilarious noise around the faces on websites,</p>
<p><img src="../../../images/icoadvisor.png" width="436" height="436"/></p>
<p>... including sometimes the insertion of faces of celebrities who had nothing to do with the projects), or hope somehow that the system has some fallback into pre-existing systems. That's all fine and good but it essentially means just throwing up your hands and giving up, in a Bitcoin context. We are trying to build a system that can actually survive in the context of the above two bullet points, not retreat from the jungle and stay at home.</p>
<p>"In the lab", however, there is a long history of people trying to build systems that address the need for a tiebreaker mechanism (let's just call it "consensus") amongst a group, but without thinking in quite such a radical way. See for example <a href="https://en.wikipedia.org/wiki/Byzantine_fault">Byzantine Fault Tolerance</a> which is a much, much older area of study than cryptocurrencies, and addresses what happens if some members of a set, trying to maintain consistency on some state, deal with failures and adversarial behavior. It's interesting to muse on the fact that some of the earliest results in this field ended up with something vaguely along the lines of "no more than 1/3 dishonest actors"; this presumably comes out of the fact that that's the threshold at which there are 2 honest for every 1 dishonest. Remembering we are considering distributed systems, having 2 good for every 1 bad would allow for tiebreaks (apologies for the crudeness of the description - I am not very educated in this area).</p>
<p><img src="../../../images/pbft.jpeg" width="436" height="436"/></p>
<p>But notice: these lines of thinking are based around "a set of actors", but the idea of proof-of-work-in-the-jungle is to step outside even that framework: we are building a system in which <strong>there are no identities at all</strong> - you can't, and don't want to, in Bitcoin, assume that there is a specific set of entities who are in the "quorum" to make decisions. There is no decision-by-group in Bitcoin, really, because there is no group. To quote the Bitcoin <a href="https://bitcoin.org/bitcoin.pdf">whitepaper</a>,</p>
<blockquote>
<p>The network is robust in its unstructured simplicity. Nodes work all at once with little coordination. They do not need to be identified, since messages are not routed to any particular place and only need to be delivered on a best effort basis. Nodes can leave and rejoin the network at will, accepting the proof-of-work chain as proof of what happened while they were gone.</p>
</blockquote>
<p>That latter point, that there is no need for liveness, is a <em>direct</em> result of the "reification of information" property. You cannot get that by any kind of sophisticated quorum, because you're back to <em>subjective</em> decision making. Bitcoin's <em>ruleset</em> may be just "the will of the people" - and that's exactly why it has to be both very simple, and extraordinarily difficult to change - but Bitcoin's <em>history</em> is emphatically not "the will of the people", it is brought into existence by costly, and deliberately meaningless, work.</p>
<h3>Proof of Stake</h3>
<p>Attempts to reinvent Bitcoin using "proof of stake" instead of proof of work ultimately fall into the "in the lab" category above. By presupposing existing sets of actors (which can of course change, but nevertheless) you can create increasingly sophisticated quorum systems, but they rely on context, i.e. they are inherently subjective in the absence of the "reification" mechanism. This will work in certain contexts, but not in the jungle; they can work if some liveness is assumed (a bit like how <a href="https://lightning.network/">Lightning network</a> can get a bunch of extra desirable properties, but only by requiring more liveness (and relying on the underlying blockchain to keep doing its thing, of course)).
Note that the more modern proof of stake designs at least try to address the kind of reasoning seen in this essay, in particular the handicap principle - remember our "what you <em>couldn't</em> do because you did this"? In a naive proof of stake the answer to that is basically nothing, giving rise to what is known as "nothing at stake", which means a participant can choose to live in multiple realities simultaneously, at no (or very little) cost. In less naive attempts like that explained <a href="https://eth.wiki/concepts/proof-of-stake-faqs">here</a>, under the "What is the nothing at stake problem..." section, the point is addressed in some detail, with diagrams (so I strongly recommend giving it a read if you're interested), ending with:</p>
<blockquote>
<p>The intuition here is that we can replicate the economics of proof of work inside of proof of stake. In proof of work, there is also a penalty for creating a block on the wrong chain, but this penalty is implicit in the external environment: miners have to spend extra electricity and obtain or rent extra hardware. Here, we simply make the penalties explicit.</p>
</blockquote>
<p>I disagree <em>in general</em> with this viewpoint: you cannot replicate (which is really, simulate) the economics of the "reification of information" by simulation, at least not per se; because the cost remains <em>inside the simulation</em>.</p>
<p>What you can do of course, is beg the question, by assuming some form of ground truth outside the simulation; and use that as the tiebreaker.</p>
<p>If you want the more critical takes on this, I can recommend Poelstra's old write up on proof of stake <a href="https://download.wpsoftware.net/bitcoin/alts.pdf">here</a>, see Section 6.4, and also a more economically focused analysis by Sztorc <a href="https://www.truthcoin.info/blog/pow-cheapest/">here</a> - Sztorc has a radically different intellectual background to someone like me or Poelstra, and while he tends to write long (look who's talking), I think he's definitely worth reading.</p>
<p>Anyway. At best, modern attempts to make such systems are simply refinements on pre-existing mechanisms for consensus finding in groups from the '80s, as mentioned above. At worst, they will degrade in one of two directions: either towards obfuscated proof of work, where the decision making of the history is a function of how much effort is spent simulating differently favourable histories, or alternatively (more likely), towards an increasingly fixed set of owning/staking entities (see: rentier class, rich-get-richer) who get to decide disputes in their own favour. It's also notable and perhaps revealing that they presuppose the existence of a bootstrapping mechanism for distributing ownership of coins, which proof of work does not need.</p>From MAC to Wabisabi2020-11-05T00:00:00+01:002020-11-05T00:00:00+01:00Adam Gibsontag:joinmarket.me,2020-11-05:/blog/blog/from-mac-to-wabisabi/<p>new coinjoin coordination mechanism based on MACs</p><h2>From MAC to Wabisabi</h2>
<!-- vim-markdown-toc GFM -->
<ul>
<li><a href="#preamble---big-and-randomized">Preamble - big-and-randomized</a></li>
<li><a href="#wabisabi-from-the-ground-up">Wabisabi, from the ground up.</a><ul>
<li><a href="#what-the-wabisabi-paper-and-this-article-do-not-cover">What the Wabisabi paper, and this article, do not cover</a></li>
</ul>
</li>
<li><a href="#signatures-keyed-macs-and-credentials">Signatures, keyed MACs and credentials</a><ul>
<li><a href="#creating-a-mac">Creating a MAC</a></li>
<li><a href="#algebraic-macs">Algebraic MACs</a></li>
<li><a href="#security-notions-needed-for-algebraic-macs-used-for-anonymous-credentials">Security notions needed for algebraic MACs used for anonymous credentials</a><ul>
<li><a href="#algebraic-mac-attempt-1">Algebraic MAC attempt 1</a></li>
<li><a href="#algebraic-mac-attempt-2">Algebraic MAC attempt 2</a></li>
<li><a href="#algebraic-mac-attempt-3">Algebraic MAC attempt 3</a></li>
<li><a href="#algebraic-mac-attempt-4">Algebraic MAC attempt 4</a></li>
</ul>
</li>
<li><a href="#mac-ggm---a-vector-of-messages-different-security-arguments">MAC-GGM - a vector of messages; different security arguments</a></li>
</ul>
</li>
<li><a href="#key-verified-anonymous-credentials-kvac">Key-Verified Anonymous Credentials (KVAC)</a><ul>
<li><a href="#how-does-issuance-work">How does issuance work?</a><ul>
<li><a href="#without-any-blinding">Without any blinding:</a></li>
<li><a href="#side-note-what-are-these-mysterious-zero-knowledge-proofs">Side note: what are these mysterious "zero knowledge proofs"?</a></li>
<li><a href="#with-blinding-of-attributes">With blinding of attributes:</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#chase-perrin-zaverucha-2019">Chase-Perrin-Zaverucha 2019</a></li>
<li><a href="#wabisabi-credentials-on-amounts-with-splitting">Wabisabi: credentials on amounts with splitting</a><ul>
<li><a href="#range-proofs">Range proofs</a></li>
</ul>
</li>
<li><a href="#final-thoughts-on-the-security-and-functionality-proposed-in-wabisabi">Final thoughts on the security and functionality proposed in Wabisabi</a></li>
</ul>
<!-- vim-markdown-toc -->
<p><em>Thanks to nothingmuch for answering several questions about the mechanics of Wabisabi.</em>
<a name="preamble"></a></p>
<h3>Preamble - big-and-randomized</h3>
<p>First, assume we think it's valuable to have big coinjoins with random amounts for all the inputs and outputs, and probably for this one specific reason: we want to make payments <em>from</em> coinjoins and, possibly, to make payments <em>within</em> coinjoins (the latter is literally what is meant by payjoin, so that is part of this discussion, note however we'd be talking about payjoin batched together with other sub-transactions, so a lot of earlier analysis of payjoin doesn't apply).</p>
<p>Second let's <em>partially</em> address why this is, at least superficially, a bad idea, even a terrible one: previous discussion of the subset sum problem pointed out that <em>some of the time</em> (being deliberately vague about how much of the time!) a coinjoin with non-equal amounts can be easily analyzed to find the sub-transactions which are really happening, removing any privacy boost. So that's not good.</p>
<p>Then, let's mention, without writing an essay about it (though it's a fascinating topic), that there are surprising outcomes of scaling the number of inputs and outputs (or just "coins") in such a model. Due to the combinatorial nature of the subset sum problem (or more generally the "knapsack problem"), even having numbers like 50-100 on the input and output side (remember: these may be batched payments! not like separately created coinjoins, extra to payment transactions) can lead to a ridiculous combinatorial blowup making calculation of subsets near impossible. To illustrate: the set of subsets of a set is called the "power-set" and its size is \(2^N\) where \(N\) is the number of elements of the set; but the number of <em>partitions</em> of a set is given by the Bell number \(B_N\), which scales (or doesn't!) even faster than exponential (i.e. faster than \(a^N\) where \(a\) is constant; here \(a\) is effectively a function of \(N\), although it's pretty complex). \(B_2 = 2, B_{10} \simeq 116000\), while \(B_{100}\) has 116 <em>digits</em>, in decimal. So it's easy to see that even at 50 inputs and 50 outputs, the enumeration process <em>by brute force</em> is not possible.</p>
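<p>To get a feel for these numbers, here is a minimal sketch (not taken from any coinjoin codebase; the function name <code>bell</code> is just illustrative) that computes Bell numbers with the standard "Bell triangle" recurrence and compares their size with that of the power set:</p>

```python
def bell(n: int) -> int:
    """Return the n-th Bell number: the number of partitions of an n-element set."""
    row = [1]  # first row of the Bell triangle; row[0] is B_0
    for _ in range(n):
        # Each new row starts with the last entry of the previous row ...
        new_row = [row[-1]]
        # ... and every further entry is the sum of its left neighbour
        # and the entry sitting above that neighbour.
        for above in row:
            new_row.append(new_row[-1] + above)
        row = new_row
    return row[0]


for n in (2, 10, 50, 100):
    print(f"N={n}: 2^N has {len(str(2 ** n))} digits, "
          f"B_N has {len(str(bell(n)))} digits")
```

<p>This only counts the partitions a brute force search would have to wade through; it says nothing about smarter analysis, which is where the controversy lies.</p>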
<p>This point is expanded on in some detail in the <a href="https://github.com/cashshuffle/spec/blob/master/CASHFUSION.md">cashfusion writeup</a>.</p>
<p>However the point is definitely controversial, basically because brute force is not the only way to approach an attempt to deanonymize a coinjoin. A simple thing like a rounded value (0.25000000 BTC) substantially (sort of exponentially) reduces the search space.</p>
<p>Those who completely dismiss this approach based on the idea "well sure, it's worst case impossible to analyze, but not average, typical case!" should notice a really key subtlety though: the claim is not <em>only</em> that such constructions are computationally impractical to analyze - it's also that they have <strong>multiple, and in fact a huge number of mathematically valid solutions</strong>, at least when we scale up to very large numbers of utxos. Moreover <a href="https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-maurer-trustcom-coinjoin.pdf">this</a> paper ("Knapsack") from 2017 tries to construct a framework for deliberately creating such obfuscation (with the same goal - unequal sized coinjoins, allowing payment).</p>
<p>This deserves more argument and discussion, see e.g. <a href="https://www.mail-archive.com/bitcoin-dev@lists.linuxfoundation.org/msg08575.html">this</a> discussion on the bitcoin mailing list from February this year. But we are going to move on to other elements of this story.</p>
<p><a name="wabisabi-from-the-ground-up"></a></p>
<h3>Wabisabi, from the ground up.</h3>
<p>The paper is <a href="https://github.com/zkSNACKs/WabiSabi/releases/download/build-70d01424bbce06389d2f0536ba155776eb1d8344/WabiSabi.pdf">here</a> for those who don't need context. I suspect that group is quite small though!</p>
<p>With respect to the previous section, the plan for Wabisabi is <strong>not to use unconstrained random input and output sizes</strong>, as I understand it from the developers, but to use sophisticated ideas based partly on the "Knapsack" style approach mentioned above, but <strong>this is a topic for another blog post</strong>.</p>
<p>But let's say we buy into the basic idea: a large coinjoin, we'll say for simplicity, 50 inputs and 50 outputs, where there may be a complex ruleset about the values of those inputs and outputs, which we aren't specifying here. Some users will just be mixing and some may be paying someone for a specific good or service, with the output. Probably rarer, but particularly cool, will be if Alice is paying Bob but Bob also participates in the coinjoin, i.e. he is also contributing input utxos to the coinjoin, but gets more out and Alice gets less, effecting a payjoin.</p>
<p>Scenario #1 : Server as coordinator, meaning a server-defined schedule, and no privacy for users w.r.t. server</p>
<p>If we don't care if the server knows everything, each user can just securely connect and pass (set of inputs, set of outputs); they can be random amounts as per the preamble, and the server will accept if the inputs are verifiable on the blockchain, and if the total payment balances. This would be tricky for the payjoin style payments as that means interaction between the users, but in principle that could work too.</p>
<p>Note how this is hardly a new idea, even the earliest implementation SharedCoin did something similar to this (it's a long story! but let's say).</p>
<p>However this SPOF scenario seems unacceptable to anybody. The server could be keeping a record of <em>every</em> linkage, ever, of the coinjoins created in this system. Ultimately this level of centralization breaks, anyway, via external pressure or otherwise.</p>
<p>Scenario #2: Taker as coordinator, meaning taker chooses time of event, and privacy only for takers, not for makers</p>
<p>The description of what is done is exactly as above, except substitute Taker for server. The outcome is practically different: at least one user (who likely pays for it) gets a privacy guarantee. How is this different from Joinmarket today? First, it hasn't been considered seriously to use randomized amounts; second, 50 party joins have not been at all practical (until recently this was due to the low participation rate, unless you chose a narrow range of amounts; that has since increased, but the IRC message channel used is not really able to handle the traffic needed for 50 party joins, see <a href="https://joinmarket.me/blog/blog/oct-2020-update/">this</a> earlier blog post for some thoughts on that). But if you take away those issues, this scenario is <em>possible</em>.</p>
<p>But notice something - exactly what makes this new "random payments, large numbers of counterparties" paradigm attractive <strong>is the possibility of multiple payments going on at once</strong> - and that's counter to Joinmarket's original concept of "there is a guy paying for the privilege of controlling everything". More on this later.</p>
<p>Scenario #3: Current Wasabi, Chaumian coinjoin</p>
<p>I have only passing familiarity with the technical underpinnings of Wasabi as is, but essentially it is based on blinded signatures of a coinjoin output (see <a href="https://github.com/nopara73/ZeroLink/#a-simplified-protocol">Chaumian coinjoin</a> for a pretty intuitive diagrammatic explanation). This fairly simple cryptographic primitive (the blinded signature) is in itself enough, because Wasabi currently is only blinding the specific set of outputs (utxos-to-be) which all have equal size and are indistinguishable. As long as the Wasabi coordinating server is prevented from knowing those linkages, due to the blinding, then the later full construction of the transaction will not expose ownership of the equal-sized outputs ("coinjoin outputs").
On the other hand, let's not have <em>too</em> simple of a mental model of Wasabi - it's crucial in this that the users make separate network connections (effectively, have separate pseudonyms) for when they present their cleartext outputs, and when they earlier presented their inputs (and change); otherwise the cryptography would be sidestepped and the server would know all the mappings.</p>
<p>Can you get the same protection, i.e. keeping the linkages private from the server, in a big-and-randomized model, using current Wasabi blind signatures?</p>
<p>It's easy to see the problem: when the user comes along with a new identity and says "here are my outputs: 0.29511342 BTC and 0.01112222 BTC" the server has no way of knowing that these amounts correspond to anything in the inputs. If the blind signature is being used as a token to say "I am entitled to add outputs to this coinjoin", fine, but in this scenario: a token of <em>what</em>, exactly?</p>
<p>The difference is clear: in equal-output coinjoin there is only one kind of thing you could be entitled to: a single output of the prescribed amount ("denomination"); typically it's things like 0.1 BTC.</p>
<p>Here, if we were to preserve the tokenization approach, we'd have to have a more sophisticated object ... something similar to a supermarket gift card : it gives you the right to have a certain amount of stuff, restricted perhaps in time and space, but quantified. It's something that's issued to you, which you can use under the given conditions, but which does not have your name attached. I realise the analogy is a bit of a stretch, but you can see that gift cards have divisibility, which is crucial here in our big-and-randomized model. They usually also have anonymity which is clearly necessary.</p>
<p>What we need here is a homomorphic anonymous credential with attributes:</p>
<ul>
<li>
<p>homomorphic - here it means we could linearly split and combine credentials. Take a credential for 10 and turn it into two credentials for 3 and 7, for example.</p>
</li>
<li>
<p>anonymous - if the credential presented could be linked to the one issued earlier, the coordinating entity can see all the linkages in the coinjoin</p>
</li>
<li>
<p>credential - this term is used in cryptography for any of a number of schemes that give rights to holders. The rights are usually <em>with respect to</em> some centralized entity, usually holding a private key that allows them to create such credentials (modulo a nuance about who is verifying them; we'll get in to that).</p>
</li>
<li>
<p>attributes - a credential could in its simplest form be simply binary: you are allowed to do X if you have the credential, and not otherwise. But sometimes it is useful to attach metadata inside the credential (think of e.g. a signature from a server that proved you should have access, but also that you are a level 3 user not a level 1 user, by including the level in the message that was signed).
<a name="do-not-cover"></a></p>
</li>
</ul>
<h4>What the Wabisabi paper, and this article, do not cover</h4>
<p>What follows is a detailed review of the crypto constructions leading to the possibility of building a coinjoin system, with such a credential system. A full protocol however must cover other things:</p>
<ul>
<li>The rules for transaction construction</li>
<li>Valid choices of amounts for inputs and outputs</li>
</ul>
<p>These are not covered here, other than some general thoughts as outlined above.
<a name="signatures-keyed-macs-and-credentials"></a></p>
<h3>Signatures, keyed MACs and credentials</h3>
<p>Digital signatures are probably very familiar to any reader of this blog, and there is a detailed discussion of some fundamentals in <a href="https://joinmarket.me/blog/blog/liars-cheats-scammers-and-the-schnorr-signature/">this</a> post. MACs, or Message Authentication Codes, can be thought of as the symmetric crypto equivalent. In symmetric crypto, there is only a secret key, no public key, and that means there is no such thing as "public verification". The owner or owners of such a secret key can create a (probably unique; this is a nuance of the theory) "tag" on a message, which only a holder of the same key can verify.</p>
<p>On its face, such tagging might seem pointless without public verifiability, but the classic use case is for point to point communications over the public internet, in which both endpoints of the communication hold the secret key; by tagging messages in this way, integrity is assured, and the message is authenticated as coming from the intended source. Such secret keys can be pre-shared over a public communication channel using techniques like <a href="https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman">ECDH</a>.
<a name="creating-a-mac"></a></p>
<h4>Creating a MAC</h4>
<p>A simple and currently very common way of making a MAC is to use a cryptographic hash function as a PRF: just hash the key <em>and</em> message together (<a href="https://en.wikipedia.org/wiki/HMAC">HMAC</a> is a bit more complicated than this, but that's the basic idea): H(message || key).</p>
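<p>As a concrete illustration, here is a minimal sketch of tagging and verifying with HMAC-SHA256 from Python's standard library. The function names <code>make_tag</code> and <code>verify_tag</code>, and the example message, are my own illustrative choices; the point is only that tagging and verifying need the same secret key:</p>

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Create an HMAC-SHA256 tag on the message under the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag with the same key and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = secrets.token_bytes(32)          # shared by both endpoints
message = b"pay 0.1 BTC to Bob"        # illustrative message
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                  # True: untampered
print(verify_tag(key, b"pay 9.9 BTC to Bob", tag))    # False: message changed
```

<p>Notice there is no separate verification key: anyone able to check the tag is also able to create it.</p>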
<p>At first sight it may seem weird that I'm talking about this construct - how is this related to credentials?</p>
<p>The most natural way to create a credential of the type described above, is to use a signature, which simply signs the rights of the holder. That's effectively what Wasabi's original design ("Chaumian coinjoin") does, but with the crucial extra feature that the signature is <em>blind</em>, so that the credential's redemption is not linked to its creation. Early ecash designs (indeed, from David Chaum as well as others) were heavily sophisticated variants of that basic idea. Just as original Wasabi uses fixed denominations, so did those ecash designs.</p>
<p>This is where we get some interesting twists, which bring in MAC as an alternative to signatures, here.</p>
<p>First, traditionally, MACs were preferable to signatures for performance reasons: they use hash functions, not expensive crypto math operations like RSA or - less expensive but still a lot more so than hashes - elliptic curve calculations. This is less a consideration today, but still relevant. Second, the more restrictive model of the MAC w.r.t. verification does create a different effect: such MACs are repudiable, whereas digital signatures are not repudiable (if you think about it, this is the same property as transferability, which is of course a key property of signatures).</p>
<p>This plain vanilla style of MAC though (hash based), trades off functionality in favour of performance - hashes like SHA256 are intrinsically black boxy and not "algebraic". They are functions which do not allow composition; as I've had occasion to remark many times before, there is no such formula as \(H(a+b) = H(a) + H(b)\) for these traditional hash functions.
<a name="algebraic-macs"></a></p>
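<p>The contrast between "black boxy" and "algebraic" functions can be checked directly. The sketch below is only an illustration under stated assumptions: it uses a deliberately tiny multiplicative group modulo a prime (the parameters <code>P</code>, <code>Q</code>, <code>G</code> are toy values I chose, nowhere near secure) as a stand-in for an elliptic curve group, and a helper <code>H</code> that hashes integers with SHA256:</p>

```python
import hashlib

# Toy group (assumption, for illustration only): P = 2Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def H(x: int) -> int:
    """SHA256 of a 32-byte big-endian integer, read back as an integer."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(32, "big")).digest(), "big")


a, b = 123, 456

# The hash has no usable structure: H(a + b) is unrelated to H(a) + H(b).
hash_is_additive = H(a + b) in (H(a) + H(b), (H(a) + H(b)) % 2 ** 256)

# Exponentiation does: g^(a+b) = g^a * g^b (the analogue of (a+b)G = aG + bG).
group_is_additive = pow(G, a + b, P) == (pow(G, a, P) * pow(G, b, P)) % P

print(hash_is_additive, group_is_additive)
```

<p>It is exactly this kind of structure that zero knowledge proofs about tagged messages need to work with.</p>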
<h4>Algebraic MACs</h4>
<p>The <em>other</em> approach to building a MAC might be to use discrete log or elliptic curve hardness assumptions, for example in the crudest case take \(\textrm{MAC}_{k}(m) = m^{k} \bmod p\) for the discrete log case. Comparing the two approaches, Dodis et al in <a href="https://eprint.iacr.org/2012/059">Message Authentication, Revisited</a> have this to say:</p>
<blockquote>
<p>The former have the speed advantage, but cannot be reduced to simple number-theoretic hardness assumptions (such as the DDH assumption for NR-PRF), and are not friendly to efficient zero-knowledge proofs about authenticated messages and/or their tags, which are needed in some important applications, such as compact e-cash [12]. On the other hand, the latter are comparably inefficient due to their reliance on number theory.</p>
</blockquote>
<p>Here NR-PRF refers to the <a href="https://en.wikipedia.org/wiki/Naor%E2%80%93Reingold_pseudorandom_function">Naor-Reingold</a> construction for a pseudorandom function.</p>
<p>The point about zero knowledge proofs is the trump card though: in building something like an <em>anonymous credential with attributes</em>, you are perforce required to be able to make attestations, using such proofs, in zero knowledge.
<a name="security-notions"></a></p>
<h4>Security notions needed for algebraic MACs used for anonymous credentials</h4>
<p>MACs generally want to have something called UF-CMA (unforgeability under chosen message attack); something we already discussed for signatures <a href="https://joinmarket.me/blog/blog/liars-cheats-scammers-and-the-schnorr-signature/">here</a>. There are several nuances that are MAC-specific but we won't delve into too much detail (I recommend Section 6.1, 6.2 of <a href="https://toc.cryptobook.us/">Boneh and Shoup</a> for an excellent rigorous description); the bottom line is that MACs must not be forgeable by non-key holders, just like signatures.</p>
<p>For our use case (and some others), such a MAC will also need to have a kind of "hiding" property : <em>indistinguishability</em> (under chosen message attack, or IND-CMA) - the tags output should not allow an attacker to guess anything about the message being tagged.</p>
<p>So concretely how can we use simple discrete log to build a MAC? Let's use an elliptic curve group of the type we're familiar with, generator \(G\), order \(p\). We'll try the simplest versions first and see what we need to do to make it secure:
<a name="mac-1"></a></p>
<h5>Algebraic MAC attempt 1</h5>
<ul>
<li>Keygen: choose a scalar \(k\) at random</li>
<li>Tag: given a message \(m\), set the tag to \(T = mkG\).</li>
<li>Verify: not a relevant definition for such a deterministic MAC; we didn't add randomness so it's the same calculation as "Tag".</li>
</ul>
<p>Note how this "determinism" is the same for familiar existing MAC functions like <a href="https://en.wikipedia.org/wiki/HMAC">HMAC</a>. Since it's the same information needed (the secret key \(k\)) and the same calculation, the distinction is not interesting. Shortly we'll be looking at probabilistic MACs.</p>
<p>Attempt 1 clearly fails, and here's one reason why: if the attacker gets to query the algorithm and ask for any MAC it likes it can choose to ask for the MAC of the message 1. That MAC is \(kG=K\). It can then take that curve point and create forgeries on any message m' it likes: \(m'K\). Secondly this kind of deterministic MAC clearly can't have the kind of hiding property we want, since it's like a commitment without any blinding factor: if you guess the value of \(m\) correctly, you can verify your guess.
Thirdly, extend the above case of message '1' and we can see that it's non-resistant to forgery more generally: whenever you know the message that was tagged, you can take the output tag given by the signer, \(T = mkG\) and multiplicatively tweak the message to \(m_2 = a \times m\) by just outputting \(aT\) as the new tag. So this is very insecure.
<a name="mac-2"></a></p>
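<p>The two forgeries against Attempt 1 can be played out in a few lines. This is a sketch under loud assumptions: instead of an elliptic curve it uses a tiny multiplicative group modulo a prime (toy parameters <code>P</code>, <code>Q</code>, <code>G</code> of my own choosing, completely insecure), so that the "scalar multiplication" \(aX\) of the text becomes <code>pow(X, a, P)</code>. The algebra is the same:</p>

```python
import secrets

# Toy group (assumption, for illustration only): P = 2Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def mul(scalar: int, point: int) -> int:
    """'scalar * point' in the text's additive notation, done multiplicatively."""
    return pow(point, scalar % Q, P)


def tag(k: int, m: int) -> int:
    """Attempt 1: T = mkG."""
    return mul(m * k, G)


k = secrets.randbelow(Q - 1) + 1   # secret key, never shown to the attacker

# Forgery 1: ask for the tag on the message 1, which is just K = kG ...
K = tag(k, 1)
m_forged = 777
forged = mul(m_forged, K)          # ... then m'K is a valid tag on any m'
print(forged == tag(k, m_forged))

# Forgery 2: take a known pair (m, T) and tweak it to a tag on a*m.
m, a = 42, 5
T = tag(k, m)
print(mul(a, T) == tag(k, a * m))
```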
<h5>Algebraic MAC attempt 2</h5>
<ul>
<li>Keygen: choose a scalar \(k\) at random</li>
<li>Tag: choose a curve point \(U\) at random, and message \(m\), output \((U, T = mkU)\)</li>
<li>Verify: Given \((U, T)\) and message \(m\), check if \(T == mkU\)</li>
</ul>
<p>This addresses the second part of our complaint with Attempt 1, by making the MAC "probabilistic". Each new MAC is generated with a fresh random curve point (or equivalently a scalar \(u\)).</p>
<p>Unfortunately, Attempt 2 fails just as Attempt 1 did, when it comes to preventing forgeries (bearing in mind the previous sentence), because we can still tweak created tags in the same way. Perhaps slightly less obvious is that we can tweak not just \(T\) but also \(U\). (But it's important to bear in mind that our security game is also concerned with whether an attacker can do something clever with re-used values of that \(U\).)
<a name="mac-3"></a></p>
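<p>Here is the same kind of sketch for Attempt 2, again over a toy multiplicative group standing in for the elliptic curve (the parameters and helper names are illustrative assumptions, not anything from the papers). It shows that randomizing with \(U\) does not stop the multiplicative tweak, and that \(U\) itself can be tweaked too:</p>

```python
import secrets

# Toy group (assumption, for illustration only): P = 2Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def mul(scalar: int, point: int) -> int:
    """'scalar * point' in the text's additive notation, done multiplicatively."""
    return pow(point, scalar % Q, P)


def tag(k: int, m: int):
    """Attempt 2: pick a fresh random U = uG and output (U, T = mkU)."""
    U = mul(secrets.randbelow(Q - 1) + 1, G)
    return U, mul(m * k, U)


def verify(k: int, m: int, U: int, T: int) -> bool:
    return T == mul(m * k, U)


k = secrets.randbelow(Q - 1) + 1   # secret key
m, a, b = 42, 5, 9
U, T = tag(k, m)

# Tweak T only: (U, aT) verifies as a tag on a*m, a message never tagged.
forged_new_message = verify(k, a * m, U, mul(a, T))

# Tweak both: (bU, bT) is a different-looking tag that still verifies on m.
rerandomized = verify(k, m, mul(b, U), mul(b, T))

print(forged_new_message, rerandomized)
```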
<h5>Algebraic MAC attempt 3</h5>
<ul>
<li>Keygen: choose a scalar \(k\) at random</li>
<li>Tag: choose a curve point \(U\) at random, and message \(m\), output \((U, T = (m+k)U)\)</li>
<li>Verify: Given \((U, T)\) and message \(m\), check if \(T == (m+k)U\)</li>
</ul>
<p>This prevents the multiplicative tweaking which killed our first two attempts; even supposing the attacker has a given \(T\) on a given, known message \(m\), multiplying \(T\) by any constant \(a\), will create \(T^{*} = aT = (am + ak) U\) which is not a tag on any message he can state (it is a tag on \(am + (a-1)k\) but he doesn't know \(k\), so even if he knows, or guesses, \(m\), he is stuck).
However this construction still allows trivial forgery (and fundamentally for the same reason: the additive homomorphism of the group). Here, because the key is "additively separate" from the message, you can just insert new messages using addition. If you happen to know \(m\) and you want a tag on \(m_2\) instead, just make \(T_2 = T + m_2 U - m U\) (see previous note: reusing \(U\) is in-scope for our attacker).</p>
<p>So if we review these first 3 attempts, it's fairly clear what's going on; it's a paradigm we've seen before in the Schnorr protocol. If you only <em>add</em> a random secret, you allow additive forgery, while if you only <em>multiply</em> a random secret, you allow multiplicative forgery, but if we do both ...
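<p>A sketch of both observations about Attempt 3, under the same assumptions as before (a tiny, insecure multiplicative group as a stand-in for the curve; all names are illustrative): the multiplicative tweak now fails, but the additive one \(T_2 = T + m_2 U - m U\) succeeds:</p>

```python
import secrets

# Toy group (assumption, for illustration only): P = 2Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def mul(scalar: int, point: int) -> int:
    """'scalar * point' in the text's additive notation."""
    return pow(point, scalar % Q, P)


def add(X: int, Y: int) -> int:
    """'X + Y' in the text's additive notation."""
    return (X * Y) % P


def neg(X: int) -> int:
    """'-X': the inverse of a subgroup element, since X^Q = 1."""
    return pow(X, Q - 1, P)


def tag(k: int, m: int):
    """Attempt 3: (U, T = (m + k)U) for a fresh random U."""
    U = mul(secrets.randbelow(Q - 1) + 1, G)
    return U, mul(m + k, U)


def verify(k: int, m: int, U: int, T: int) -> bool:
    return T == mul(m + k, U)


k = secrets.randbelow(Q - 1) + 1   # secret key
m, m2, a = 42, 99, 5
U, T = tag(k, m)

# Multiplicative tweak: aT is not a tag on a*m any more.
mult_tweak_works = verify(k, a * m, U, mul(a, T))

# Additive tweak: T2 = T + m2*U - m*U is a valid tag on m2.
T2 = add(add(T, mul(m2, U)), neg(mul(m, U)))
add_tweak_works = verify(k, m2, U, T2)

print(mult_tweak_works, add_tweak_works)
```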
<a name="mac-4"></a></p>
<h5>Algebraic MAC attempt 4</h5>
<ul>
<li>Keygen: choose two scalars \(k_1, k_2\) at random</li>
<li>Tag: choose a curve point \(U\) at random, and message \(m\), output \((U, T = (mk_1+k_2)U)\)</li>
<li>Verify: Given \((U, T)\) and message \(m\), check if \(T == (mk_1+k_2)U\)</li>
</ul>
<p>To expand on the Schnorr analogy, it's as if one of the keys were the randomizing nonce, and the other were the private key (the analogy is not exact, to be clear). Now neither the additive nor the multiplicative tweak gives the attacker a way to forge new tags on messages that the genuine key holder never created.</p>
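<p>To round this off, here is Attempt 4 in the same toy setting (insecure parameters and illustrative names, as before). The sketch only checks that the two specific tweaks from the earlier attempts no longer produce valid tags; it is of course not a security argument. I sample \(k_1 \neq 1\) purely so the check is deterministic, since \(k_1 = 1\) would collapse this into Attempt 3:</p>

```python
import secrets

# Toy group (assumption, for illustration only): P = 2Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def mul(scalar: int, point: int) -> int:
    """'scalar * point' in the text's additive notation."""
    return pow(point, scalar % Q, P)


def add(X: int, Y: int) -> int:
    """'X + Y' in the text's additive notation."""
    return (X * Y) % P


def neg(X: int) -> int:
    """'-X': the inverse of a subgroup element, since X^Q = 1."""
    return pow(X, Q - 1, P)


def tag(k1: int, k2: int, m: int):
    """Attempt 4: (U, T = (m*k1 + k2)U) for a fresh random U."""
    U = mul(secrets.randbelow(Q - 1) + 1, G)
    return U, mul(m * k1 + k2, U)


def verify(k1: int, k2: int, m: int, U: int, T: int) -> bool:
    return T == mul(m * k1 + k2, U)


k1 = secrets.randbelow(Q - 2) + 2   # in [2, Q-1]; k1 = 1 would be Attempt 3
k2 = secrets.randbelow(Q - 1) + 1   # in [1, Q-1]
m, m2, a = 42, 99, 5
U, T = tag(k1, k2, m)

# Neither of the earlier tweaks yields a valid tag now.
mult_tweak_works = verify(k1, k2, a * m, U, mul(a, T))
T2 = add(add(T, mul(m2, U)), neg(mul(m, U)))
add_tweak_works = verify(k1, k2, m2, U, T2)

print(verify(k1, k2, m, U, T), mult_tweak_works, add_tweak_works)
```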
<p>The construction in attempt 4 is one of several elucidated by Dodis et al in their 2012 paper <a href="https://eprint.iacr.org/2012/059">"Message Authentication, Revisited"</a>. They identify exponentiation in a group of prime order as an example of a "weak PRF", and moreover, specifically a <em>key-homomorphic weak PRF</em>, and build the above construction in abstract from such a function. Then they prove by quite sophisticated arguments (see 4.3 of the full paper), that this construction has "suf-CMA" (or "suf-CMVA" with a transformation) where the "s" refers to <em>selective</em> security. This is a weaker notion of security; the idea is that we only defend against the attacker who has to choose the message he will forge on, before he gets to query the signer/tagger to see a bunch of other messages. Their proof strategy is basically to show that with clever use of linear transformations you can reduce the security argument to that of the underlying weak PRF; its randomness gives you both the unforgeability and the hiding (indistinguishability) properties that we want.
<a name="mac-ggm"></a></p>
<h4>MAC-GGM - a vector of messages; different security arguments</h4>
<p>In 2013 Chase, Meiklejohn and Zaverucha described, in <a href="https://eprint.iacr.org/2013/516">this paper</a> (which we will sometimes abbreviate to CMZ13), a small but meaningful finesse on the above construction from 2012, which they call "MAC-GGM" (they also describe MAC-DDH in the same paper, which we won't cover here):</p>
<ul>
<li>Instead of a MAC on a single message, the MAC is designed to support multiple distinct messages, and this is specifically to allow the credentials we'll describe next, to support <em>attributes</em>.</li>
<li>The paper gives an argument in the so-called "generic group model" (GGM) that this construction has the full UF-CMVA security property (the original argument for only <em>selective</em> security is not really OK in any scenario where users can query verification on tags).</li>
</ul>
<p>As is probably obvious, in this new construction, the tag is calculated by: \(T = (U, (k_1 m_1 + k_2 m_2 + \ldots + k_n m_n + k_0)U)\); it's easy to see that the "multiplicative and additive" arguments mentioned above still apply (note the presence of \(k_0\)) (although the security argument is very different, see Appendix A of the paper). This looks superficially similar to a vector Pedersen commitment of the form seen in constructions like Bulletproofs (only superficially: here also, the vector is blinded, but at the level of scalars; we don't use different base points).</p>
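<p>The vector version can be sketched in the same way. Again this is only an illustration under my own assumptions: a toy, insecure multiplicative group stands in for the curve (so \(aU\) is <code>pow(U, a, P)</code>), and the function names are hypothetical rather than anything from CMZ13.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
# Additive notation aU is written pow(U, a, P) here.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def keygen(n):
    # k[0] is the additive key k_0; k[1..n] multiply the n messages.
    return [rand_scalar() for _ in range(n + 1)]

def exponent(k, msgs):
    # k_1*m_1 + ... + k_n*m_n + k_0, modulo the group order
    return (k[0] + sum(ki * mi for ki, mi in zip(k[1:], msgs))) % Q

def tag(k, msgs):
    U = pow(G, rand_scalar(), P)
    return U, pow(U, exponent(k, msgs), P)

def verify(k, msgs, t):
    U, T = t
    return U != 1 and T == pow(U, exponent(k, msgs), P)

k = keygen(3)
attrs = [7, 11, 13]
t = tag(k, attrs)
valid = verify(k, attrs, t)
tampered = verify(k, [7, 12, 13], t)   # one attribute changed
```

<p>Changing any single attribute changes the exponent by a non-zero multiple of the corresponding key, so the tag no longer verifies.</p>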
<p>The main reason this is even interesting is how it naturally supports <strong>selective revelation</strong> - zero knowledge proofs that certain of these messages have particular values or perhaps are in a range.
Other previous MAC constructions couldn't do this in any reasonable way (although there is a big literature on achieving similar properties using (blind) signatures).
<a name="key-verified"></a></p>
<h3>Key-Verified Anonymous Credentials (KVAC)</h3>
<p>Now we have the theoretical basis, we can construct a credential system with two of the properties we want - anonymity, and attributes. And that's what the meat of the Chase et al. paper does. It describes a credential system, using MAC-GGM as a primitive. The functionality of this credential system can be boiled down to:</p>
<ul>
<li>Keygen: generate secret keys and public parameters for the protocol instance (called <em>iparams</em>, short for issuer parameters). These parameters include public commitments to the secret keys.</li>
<li>Blind Issuance: a user can request and the issuer can provide credentials on a set of attributes (the \(m_i\) in the above), where some of the attributes are allowed to be hidden from the issuer.</li>
<li>Show-Verification: a user can prove to the issuer (or, any other holder of the secret key material), in zero knowledge, that they possess a credential whose attributes satisfy a specific set of constraints.
<a name="how-does-issuance-work"></a></li>
</ul>
<h4>How does issuance work?</h4>
<p>Because we've laid the foundations, this is pretty easy to <em>describe</em>, although the concrete mathematical steps of creating the credential aren't.
<a name="without-any-blinding"></a></p>
<h5>Without any blinding:</h5>
<p>(From here the private key set of the issuer is denoted with \(x\) rather than \(k\).)</p>
<p><strong>Issuance</strong> - We issue a credential consisting of a MAC-GGM style of MAC, combined with a proof of its validity. Form, on a set of messages \(m_i\): \((U, (x_1 m_1 + x_2 m_2 + \ldots + x_n m_n + x_0)U, \pi)\) - the proof \(\pi\) exists because the credential must be accompanied by a proof that it is correctly formed with respect to the issuer parameters that were decided at the start of the protocol, but without revealing the issuer's secret key material.</p>
<p><strong>Show/Verify</strong> - this is where it gets interesting. The user does not just "present his MAC" as that would violate our intention to make the credentials anonymous. Instead, he presents <em>commitments to his MAC</em> along with a zero knowledge proof of correct formation. He presents \((U, {C_{m_i}}^{n}_{i=1}, C_{U'}, \Pi)\). Taking those elements in order:</p>
<ul>
<li>\(U\) - this is the base point of the MAC which was issued as credential, but it will have been rerandomised as \(U = aU_0\) for some \(a\). (There is a point of confusion in the paper here; in Appendix E the detailed treatment correctly notes that \(U, U'\) must be re-randomised by multiplication with a random scalar, in order to prevent trivial linkability between the Issue and Show/Verify steps, but this is not mentioned in Section 4.2).</li>
<li>\({C_{m_i}}^{n}_{i=1}\) - these are Pedersen commitments to the individual attribute messages (note - the plaintext messages can be sent instead for those messages which are not hidden/encrypted, to save communication - we will talk about hidden attributes next). The blinding value for each commitment is \(z_i\).</li>
<li>\(C_{U'}\) is a single Pedersen commitment to the second element of the tag. The blinding value is \(r\).</li>
<li>\(\Pi\) - as mentioned, we need a zero knowledge proof of correct formation - this consists of a proof that the commitments \(C_{m_i}\) and \(C_{U'}\), when combined with the secret keys that only the verifier holds, will give the same output group element \(V\) from the calculation \(x_0U + \sum_{i} x_i C_{m_i} - C_{U'} = V\) as the prover obtained from the calculation with public issuer parameters \(X_i\), i.e. \(V = \sum_{i} z_i X_i - rG\).</li>
</ul>
<p>That last point is very tricky so I'm going to expand on it. What makes this credential construction special is its requirement to hide something from both sides - the user wants to hide the attributes \(m_i\) (in general if not always) from the issuer, and the issuer of course wants to hide the secret keys from the user.</p>
<p>This is dealt with algebraically by using something similar to ECDH keys, where \(S = pqG = pQ = qP\), i.e. both sides have their own secret they keep from each other, but still create a shared secret. The variable \(V\) represents this, but to keep the following simple we'll imagine \(n=1\), i.e. that there's only one message/attribute.</p>
<p>On the user side, we are summing elements of the form \(z_i X_i = z_i x_i H\), and the blinding terms in the message commitments \(C_{m_i}\) are also \(z_i H\), so that they can be converted into part of a term \(x_i C_{m_i}\) that the issuer can verify using the secret keys. The remaining term in the commitments \(C_{m_i}\) is \(m_i U\), which the same multiplication by the secret key converts into part of \(U'\): \(x_i C_{m_i} = x_i m_i U + x_i z_i H\), and (for \(n=1\)) \(x_1 m_1 U = U' - x_0 U\). This equality is worked through in detail in the paper, but notice that basically, the group homomorphism can be used to allow the issuer to verify, using his own secret values, what the user constructed as message commitments using the user's own secret values.
<a name="side-note"></a></p>
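<p>The \(n=1\) case of this "both sides compute the same \(V\)" check can be sketched numerically. This is only the algebra, not the zero knowledge proof \(\Pi\) itself, and it rests on assumptions of mine: a toy, insecure multiplicative group replaces the curve (so \(aU\) is <code>pow(U, a, P)</code> and point addition is multiplication mod <code>P</code>), and <code>H</code> is simply a second generator whose discrete log with respect to <code>G</code> we pretend not to know.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
P, Q = 2039, 1019
G, H = 4, 9   # two generators; H's discrete log w.r.t. G is assumed unknown

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def inv(a):
    # Group inverse, i.e. the "negative" of a point in additive notation.
    return pow(a, P - 2, P)

# Issuer: secret keys x0, x1 and public issuer parameter X1 = x1*H.
x0, x1 = rand_scalar(), rand_scalar()
X1 = pow(H, x1, P)

# Issued credential on attribute m: (U0, U0' = (x0 + x1*m)U0).
m = 21
U0 = pow(G, rand_scalar(), P)
U0p = pow(U0, (x0 + x1 * m) % Q, P)

# User: re-randomise the credential, then commit to its parts.
a = rand_scalar()
U, Up = pow(U0, a, P), pow(U0p, a, P)
z, r = rand_scalar(), rand_scalar()
C_m = pow(U, m, P) * pow(H, z, P) % P             # C_m  = m*U + z*H
C_Up = Up * pow(G, r, P) % P                      # C_U' = U' + r*G
V_user = pow(X1, z, P) * inv(pow(G, r, P)) % P    # V = z*X1 - r*G

# Issuer: V = x0*U + x1*C_m - C_U', using only secret keys and commitments.
V_issuer = pow(U, x0, P) * pow(C_m, x1, P) * inv(C_Up) % P

# A commitment to a different attribute value does not give the same V.
C_bad = pow(U, m + 1, P) * pow(H, z, P) % P
V_bad = pow(U, x0, P) * pow(C_bad, x1, P) * inv(C_Up) % P
```

<p>Here <code>V_user</code> and <code>V_issuer</code> agree, while <code>V_bad</code> does not; in the real protocol the user never reveals \(z\) or \(r\) and instead proves the relation in zero knowledge.</p>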
<h5>Side note: what are these mysterious "zero knowledge proofs"?</h5>
<p>The proof systems used for these kinds of statements are all variants of the basic Schnorr protocol + Fiat-Shamir transform that I explained in great detail <a href="https://github.com/AdamISZ/from0k2bp/blob/master/from0k2bp.pdf">here</a> (Section 3), though I also strongly recommend <a href="https://toc.cryptobook.us/">Boneh and Shoup</a> Chapter 19 for more rigorous treatments. Note that very often we are using the "AND of sigma protocols" paradigm, in which multiple statements are proved concurrently; this is achieved by committing to all the statements in the first step, then including all those commitments in the hash challenge preimage before constructing the response. As well as the aforementioned links, you can see a good simple example of this paradigm in Appendix E of CMZ13, albeit there are two serious errors in the description of the verification algorithm, as I explained on stackexchange <a href="https://crypto.stackexchange.com/a/85952/14985">here</a>.
<a name="with-blinding"></a></p>
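<p>As a rough sketch of that "AND of sigma protocols" idea, the following proves knowledge of two discrete logs at once with a single Fiat-Shamir challenge. It is not the proof from CMZ13; the toy (insecure) multiplicative group, the generators <code>G</code> and <code>H</code>, the hash-to-challenge encoding and the function names are all assumptions made for illustration.</p>

```python
import hashlib
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
# Additive notation xG is written pow(G, x, P) here.
P, Q = 2039, 1019
G, H = 4, 9

def challenge(*elems):
    # Fiat-Shamir: hash the statement and ALL first-step commitments together.
    data = ",".join(str(e) for e in elems).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(x1, x2):
    # Statement: "I know x1, x2 such that X1 = x1*G AND X2 = x2*H".
    X1, X2 = pow(G, x1, P), pow(H, x2, P)
    k1, k2 = secrets.randbelow(Q), secrets.randbelow(Q)
    R1, R2 = pow(G, k1, P), pow(H, k2, P)     # commit to both statements
    e = challenge(X1, X2, R1, R2)             # one challenge covers both
    return (X1, X2), (R1, R2, (k1 + e * x1) % Q, (k2 + e * x2) % Q)

def verify(statement, proof):
    X1, X2 = statement
    R1, R2, s1, s2 = proof
    e = challenge(X1, X2, R1, R2)
    return (pow(G, s1, P) == R1 * pow(X1, e, P) % P and
            pow(H, s2, P) == R2 * pow(X2, e, P) % P)

statement, proof = prove(5, 17)
accepted = verify(statement, proof)

# Tampering with one response breaks the whole conjunction.
R1, R2, s1, s2 = proof
rejected = verify(statement, (R1, R2, (s1 + 1) % Q, s2))
```

<p>With a group this small the challenge space is tiny, so this has no real soundness; it is only meant to show the shape of commit, hash, respond.</p>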
<h5>With blinding of attributes:</h5>
<p>These attributes can be encrypted using <a href="https://en.wikipedia.org/wiki/ElGamal_encryption">El Gamal encryption</a> in such a way that a credential can still be issued without revealing (some of) them. The mechanics of El Gamal are about as simple as an encryption scheme gets:</p>
<ul>
<li>Key: a normal public/private key pair from the group, say \(P = pG\)</li>
<li>Encrypt: take as message point \(M\), create new randomness \(r\), output \((rG, rP + M)\) (remember, asymmetric encryption, so not necessarily private key holder)</li>
<li>Decrypt: take ciphertext \(c_1, c_2\) as per above and note \(p(rG) = r(pG) = rP\) (the Diffie Hellman shared secret primitive), so that \(c_2 - pc_1 = M\)</li>
</ul>
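<p>Those three steps are short enough to sketch directly. As before, this is a toy under my own assumptions: an insecure multiplicative group stands in for the curve (so \(rG\) is <code>pow(G, r, P)</code> and point addition is multiplication mod <code>P</code>), and the public key is called <code>pub</code> to avoid clashing with the modulus <code>P</code>.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def keygen():
    priv = rand_scalar()
    return priv, pow(G, priv, P)               # (private p, public pG)

def encrypt(pub, M):
    r = rand_scalar()
    # Ciphertext (c1, c2) = (rG, r*pub + M)
    return pow(G, r, P), pow(pub, r, P) * M % P

def decrypt(priv, ct):
    c1, c2 = ct
    shared = pow(c1, priv, P)                  # p(rG) = r(pG), the shared secret
    return c2 * pow(shared, P - 2, P) % P      # c2 - p*c1 = M

priv, pub = keygen()
M = pow(G, 123, P)                             # messages are group elements
recovered = decrypt(priv, encrypt(pub, M))
```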
<p>Note that this system encrypts and decrypts <em>group elements</em> \(M\) rather than scalars, \(m\). In cases where it's the latter that needs to be encrypted, sometimes this hiccup can be circumvented with a ZKP of the underlying message scalar. However the next paper, CPZ19, addresses this point (see next section) along with a lot of other things.</p>
<p>Now, how could this El Gamal scheme be used to aid getting credentials on hidden attributes?</p>
<p>(EC) El Gamal encryption has an additive homomorphism (this was noted in one of my earlier blog posts <a href="https://web.archive.org/web/20200428225915/https://joinmarket.me/blog/blog/finessing-commitments/">here</a>; "additive" here means elliptic curve point <em>addition</em>): \(E(A) + E(B) = E(A+B)\) (the notation is very poor here: encryptions have attached randomness, but anyway). Whenever you have this additive homomorphism, you also have scalar multiplication, trivially: \(aE(A) = E(aA)\). We can leverage this to pass the encryption "through" the MAC procedure.</p>
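<p>Both homomorphisms can be checked in a few lines. This is a sketch under the same kind of assumptions as elsewhere: a toy, insecure multiplicative group replaces the curve, so "adding" ciphertexts means multiplying them componentwise mod <code>P</code>, and "scalar multiplying" means raising each component to a power. The helper names <code>ct_add</code> and <code>ct_scale</code> are mine.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def encrypt(pub, M):
    r = rand_scalar()
    return pow(G, r, P), pow(pub, r, P) * M % P

def decrypt(priv, ct):
    return ct[1] * pow(pow(ct[0], priv, P), P - 2, P) % P

def ct_add(ca, cb):
    # E(A) + E(B): combine the two ciphertexts componentwise.
    return ca[0] * cb[0] % P, ca[1] * cb[1] % P

def ct_scale(a, ct):
    # a*E(A): apply the scalar to each component.
    return pow(ct[0], a, P), pow(ct[1], a, P)

priv = rand_scalar()
pub = pow(G, priv, P)
A, B = pow(G, 10, P), pow(G, 20, P)

sum_plain = decrypt(priv, ct_add(encrypt(pub, A), encrypt(pub, B)))
scaled_plain = decrypt(priv, ct_scale(7, encrypt(pub, A)))
```

<p>Here <code>sum_plain</code> is the group element "\(A+B\)" and <code>scaled_plain</code> is "\(7A\)", even though neither operation ever saw the plaintexts.</p>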
<p>The user would give the El Gamal encryption of one or more attributes \(m_i\) to the issuer to be tagged. They would give \((P, (r_1 G, m_1 G + r_1 P))\) as the two elements of the ciphertext, where we stick with just one attribute index for simplicity. The issuer would pick \(U\) here as \(bG\) where \(b\) is a random scalar (we'll see why this is needed rather than NUMS in a moment), and create an <em>encrypted</em> tag on the <em>encrypted</em> attribute message: \(E(U') = E(x_0 U + m_1 x_1 U) = E(x_0 U) + x_1 E(m_1 U) = E(x_0 U) + x_1 b E(m_1 G)\) and \(E(m_1 G)\) is exactly what the user gave them. Thus they can easily create an encryption of \(U'\) on the message \(m_1\) <em>without ever seeing \(m_1\)</em>.
This encryption must then be blinded, but that's easy, by adding extra randomness in the form of an encryption of 0. Again a ZKP will need to be attached when the issuer returns this encryption to the user, but if it is valid, the user can know that when he decrypts \(E(U')\) to \(U'\), he will have a valid credential (tag) \((U, U')\) for his attribute/message.</p>
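<p>Putting the pieces together, the blind tagging flow for a single attribute might look like the following. This is a minimal sketch, assuming a toy insecure multiplicative group in place of the curve and leaving out every zero knowledge proof; variable and function names are hypothetical.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
# Additive notation aU is pow(U, a, P); point addition is multiplication mod P.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def encrypt(pub, M):
    r = rand_scalar()
    return pow(G, r, P), pow(pub, r, P) * M % P

def decrypt(priv, ct):
    return ct[1] * pow(pow(ct[0], priv, P), P - 2, P) % P

def ct_add(ca, cb):
    return ca[0] * cb[0] % P, ca[1] * cb[1] % P

def ct_scale(a, ct):
    return pow(ct[0], a, P), pow(ct[1], a, P)

# User: key pair, and the encrypted attribute E(m1*G) sent to the issuer.
priv = rand_scalar()
pub = pow(G, priv, P)
m1 = 77
ct_attr = encrypt(pub, pow(G, m1, P))

# Issuer: secret keys x0, x1; base U = b*G with b known to the issuer.
x0, x1, b = rand_scalar(), rand_scalar(), rand_scalar()
U = pow(G, b, P)
# E(U') = E(x0*U) + x1*b*E(m1*G)
ct_tag = ct_add(encrypt(pub, pow(U, x0, P)), ct_scale(x1 * b % Q, ct_attr))
# Blind with an encryption of the identity (the "0" of additive notation).
ct_tag = ct_add(ct_tag, encrypt(pub, 1))

# User: decrypt to obtain U' and hold the credential (U, U').
U_prime = decrypt(priv, ct_tag)
expected = pow(U, (x0 + x1 * m1) % Q, P)   # what a non-blind tag would be
```

<p>The issuer only ever handles <code>ct_attr</code>, never <code>m1</code>, yet the user ends up with the same \(U'\) that a non-blind issuance would have produced.</p>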
<p><a name="CPZ19"></a></p>
<h3>Chase-Perrin-Zaverucha 2019</h3>
<p>The scope of this paper (CPZ19 for short), which is intended to provide a credential system for the Signal messenger system, is much larger, but part of it is creating a more powerful credential design (albeit an inheritor; the security proof for CPZ19 uses a reduction to CMZ13) than that found in CMZ13. These credentials support <em>both</em> scalar attributes and group elements as attributes - the latter can be appealing for general purposes of creating efficient ZKPs (or a more elementary aspect of the same thing: easier El-Gamal encryption of the type described in the previous section - indeed having attributes encrypted in this way is a fundamental part of their design).</p>
<p>The credential construction looks much more complicated as presented since it uses different <a href="https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number">NUMS</a> base points for multiple different components: the secret key elements \((x_0, x_1)\) as before for the basic idea of the Dodis et al algebraic MAC, but there are then base points for each of a vector of secret keys, one per attribute (and more, see paper for full setup). Notably the construction of the MAC tag itself looks quite different:</p>
<p>\((t, U, W + (x_0 + x_1 t)U + \sum_{i=1}^n y_i M_i)\)</p>
<p>here \(t\) and \(U\) are generated by the issuer at random, while the \(y_i\) are the aforementioned vector of secret keys corresponding to each message/attribute.</p>
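<p>The shape of that tag can be sketched as follows. Treat this with care: it is my own illustration over a toy, insecure multiplicative group, it treats \(W\) simply as a group element fixed at key generation (an assumption; the paper's setup defines it precisely), and it ignores the NUMS base points and issuer parameters entirely.</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
# Additive notation aU is pow(U, a, P); point addition is multiplication mod P.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def rand_point():
    return pow(G, rand_scalar(), P)

def keygen(n):
    # Hypothetical key layout: W (group element), x0 and x1 (scalars),
    # and one scalar y_i per attribute.
    return {"W": rand_point(), "x0": rand_scalar(), "x1": rand_scalar(),
            "y": [rand_scalar() for _ in range(n)]}

def compute_V(sk, t, U, Ms):
    # V = W + (x0 + x1*t)U + sum_i y_i*M_i
    V = sk["W"] * pow(U, (sk["x0"] + sk["x1"] * t) % Q, P) % P
    for y_i, M_i in zip(sk["y"], Ms):
        V = V * pow(M_i, y_i, P) % P
    return V

def mac(sk, Ms):
    t, U = rand_scalar(), rand_point()     # chosen at random by the issuer
    return t, U, compute_V(sk, t, U, Ms)

def verify(sk, Ms, tag):
    t, U, V = tag
    return V == compute_V(sk, t, U, Ms)

sk = keygen(2)
Ms = [rand_point(), rand_point()]          # attributes are group elements
tag = mac(sk, Ms)
valid = verify(sk, Ms, tag)
tampered = verify(sk, [Ms[0] * G % P, Ms[1]], tag)
```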
<p>While the construction is significantly more complex, the basic principle of how credentials are issued, and then show/verified, is essentially the same, and that includes the idea of encrypting credentials using El Gamal. The same construct carries over as was described under "with blinding of attributes", but the authors have a slightly different approach in mind:</p>
<p>Given an existing credential/MAC, you can quite elegantly prove that an ElGamal encryption of a specific attribute is in fact attested to by the MAC, using again a ZKP about a relationship between the MAC and the encryption. However the authors do caution:</p>
<p>"We caveat that this is only a promising direction for a new (public-key) verifiable encryption scheme, since the above basic Elgamal scheme is not CCA secure, and we have not carefully analyzed its security."</p>
<p>(here <a href="https://en.wikipedia.org/wiki/Chosen-ciphertext_attack">CCA</a> means "chosen ciphertext attack"; security under this condition is the main goal of provably secure encryption schemes).</p>
<p>At the beginning of this section I mentioned that the security argument for this flavor of algebraic MAC is based on a reduction to the case of CMZ13 above, which was proven UF-CMVA secure in the generic group model. However this reduction only produces SUF-CMVA, which is to say 'selective security' - here, we only consider an attacker who specifies the message \(m^{*}\) in advance of their message tagging and verification queries. I'm not sure if this is sufficient.</p>
<p><a name="wabisabi"></a></p>
<h3>Wabisabi: credentials on amounts with splitting</h3>
<p>Wabisabi uses basically exactly the CPZ19 construction for its credentials. The main "twist" is a simplification: only 'value' (value in BTC) attributes are needed, and they are of course integer values. These credentials will allow a coinjoin participant to follow the workflow mentioned at the start of this article:</p>
<ul>
<li>As one pseudonym, register 1 or more inputs and request N credentials for the input, with the values of each <em>credential</em> hidden, but accompanied with a zero knowledge proof that the sum of those values is as it should be (the input's value).</li>
<li>As another pseudonym, present the credentials and the intended coinjoin outputs, with a proof that the sum of the redeemed credentials tallies up to the total of the outputs.</li>
</ul>
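<p>The "sum of hidden values equals the input value" check in the first step relies on the commitments being additively homomorphic. A minimal sketch of just that algebra, assuming Pedersen-style commitments over a toy, insecure multiplicative group (the names and the group are my assumptions, and the real protocol would prove knowledge of the combined blinding factor in zero knowledge rather than computing with it in the clear):</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
P, Q = 2039, 1019
G, H = 4, 9   # two generators; H's discrete log w.r.t. G is assumed unknown

def commit(amount, blind):
    # Pedersen-style commitment amount*G + blind*H in additive notation.
    return pow(G, amount % Q, P) * pow(H, blind, P) % P

input_value = 100
amounts = [60, 30, 10]                     # hidden credential values
blinds = [secrets.randbelow(Q) for _ in amounts]
commitments = [commit(a, r) for a, r in zip(amounts, blinds)]

# "Add" all the commitments together.
product = 1
for C in commitments:
    product = product * C % P
r_sum = sum(blinds) % Q

# Removing input_value*G from the combined commitment must leave r_sum*H.
residue = product * pow(pow(G, input_value, P), P - 2, P) % P
balanced = residue == pow(H, r_sum, P)

# If the hidden amounts summed to 101 instead, the check would fail.
residue_bad = residue * G % P
unbalanced = residue_bad == pow(H, r_sum, P)
```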
<p>Serial numbers are also used as part of the credential, to prevent double spend of the same credential (remember, the credentials are specifically designed to be <em>unlinkable</em>).</p>
<p>However the paper is careful to build up to what it calls a "unified registration protocol" where it generalises the whole process of both creating and redeeming these credentials, and makes the interaction more efficient.</p>
<p><a name="range-proofs"></a></p>
<h4>Range proofs</h4>
<p>Any former student of the ideas behind <a href="https://en.bitcoin.it/wiki/Confidential_transactions">Confidential Transactions</a> will find this part obvious. Simply presenting <em>commitments</em> to integer amounts (in satoshis, say) doesn't provide the intended security: since in modular arithmetic, a very large integer is mathematically equivalent to a small negative integer, it would be easy for users to cheat the amounts they get out by requesting commitments on (effectively) negative amounts. The way round this is again a ZKP, of a particular flavor known as a range proof: you prove that the integer \(a\) is between say 1 and \(2^{32}\) or whatever suits. This can be done e.g. with <a href="https://eprint.iacr.org/2017/1066">Bulletproofs</a> but also the range proof can be embedded as another statement in the overall ZKP provided by the user.</p>
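<p>The wraparound problem is easy to see numerically. In this sketch (again a toy, insecure group of order <code>Q</code> chosen by me for illustration) a user commits to the amounts 150 and "-50", which pass a sum check against an input of 100 even though one of them is effectively negative:</p>

```python
import secrets

# Toy prime-order group (insecure): order-Q subgroup of the integers mod P.
P, Q = 2039, 1019
G, H = 4, 9

def commit(amount, blind):
    # amount*G + blind*H; the amount only matters modulo the group order Q.
    return pow(G, amount % Q, P) * pow(H, blind, P) % P

input_value = 100
claimed = [150, -50]          # sums to 100, but one "amount" is negative
blinds = [secrets.randbelow(Q) for _ in claimed]

product = 1
for a, r in zip(claimed, blinds):
    product = product * commit(a, r) % P

expected = commit(input_value, sum(blinds) % Q)
cheat_passes = product == expected

# The "negative" amount is indistinguishable from a large positive one.
wrapped = -50 % Q
```

<p>Here <code>wrapped</code> is 969, i.e. <code>Q - 50</code>; a range proof on each committed amount is what rules this out.</p>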
<p>A relevant question, though, and one worth pondering: are the range proofs actually necessary (ZmnSCPxj has raised this in review, and it occurred to me too)? As discussed in the next section, there isn't a risk of funds loss in the basic coinjoin construct, with or without this extra crypto magic of credentials. So a malicious user constructing credentials in invalid negative amounts is not going to be able to claim more money, but this does represent a DOS vector, one that is usually addressed just with the requirement of users to provide and sign off on a valid utxo.</p>
<p>However there still may be further room for thought here; the range proof could be provided as part of a blame phase of a protocol, and avoided in the happy path of correct coinjoins being presented for signing. Apparently the authors have considered this.</p>
<p><a name="final-thoughts"></a></p>
<h3>Final thoughts on the security and functionality proposed in Wabisabi</h3>
<p>This article has just been a survey of some of the technical (cryptographic) underpinnings; the paper itself is specifically more about the theoretical construct, and not a fully fleshed out system spec as would be needed for a full software instantiation.</p>
<ul>
<li>How secure is it?</li>
</ul>
<p>As the paper notes in the final section 5, we should not forget the fundamental security inherent in Coinjoin, however it is coordinated: users only sign what does not rob them of money, and a single transaction does not suffer from anything related to blockchain mechanics (delays, reorgs etc). So what risks exist will be around DOS (inconvenience, lost time) and much more importantly, privacy loss:</p>
<ul>
<li>How strong are the privacy guarantees?</li>
</ul>
<p>First, to state the obvious, there is a dependency on discrete log hardness, but that's just the basis; more exactly, there is a DDH hardness assumption (see 6.2 of CPZ19) underlying the security of this MAC construction. As mentioned in the previous bullet point, this is <em>effectively</em> only relevant to the privacy of the users w.r.t. the issuer (here the coinjoin coordinator) of the credentials, although nominally a breakage of that security (assume in the worst case, the ability to forge MACs arbitrarily) would "allow the user to forge credentials for arbitrary bitcoin amounts", but that is a DOS vector only as it creates invalid coinjoins that won't be signed.</p>
<ul>
<li>How much defence against the issuer is there, i.e. is trust in the coordinator required for privacy?</li>
</ul>
<p>This is actually a fairly tricky point. Restricting the coordinator's ability to tag (pun intended) or selectively censor is quite critical, and non trivial.</p>
<p>The splitting into multiple credentials helps; it is less easy for the malicious coordinator to figure out how to jam individual participants if they are going through multiple rounds of credential issuance and redemption. From conversations with nothingmuch it appears that quite a lot of thought is being put into this aspect of the protocol; those interested may want to read <a href="https://github.com/zkSNACKs/WabiSabi/blob/master/protocol.md">this</a> protocol spec document for the latest. Also along the same lines, the paper notes:</p>
<blockquote>
<p>A malicious coordinator may also tag users by providing them with different issuer parameters. When
registering inputs a proof of ownership must be provided. If signatures are used, by covering the issuer
parameters and a unique round identifier these proofs allow other participants to verify that everyone was
given the same parameters.</p>
</blockquote>
<p>Basically what is going on here is that there is a kind of "public" aspect to input registration; users sign the issuer parameters for the round, and then these signatures, at a certain point in the negotiation, are broadcast to all participants (with encryption), so that a malicious coordinator can be prevented from tagging users by giving them all different round issuer parameters.</p>
<p><strong>Joinmarket update for Oct 2020</strong> (Adam Gibson, 2020-10-25)</p>
<h2>About this post</h2>
<p>It seems like a good idea to start using this blog to spread a little bit more information
to users and other interested parties, about Joinmarket, in particular about how it might
change.</p>
<p>First, please note this is a <em>personal</em> blog, there is nothing "official" here (and the same
would go for anyone else's blog about Joinmarket! - this is an open source project).</p>
<p>Second, please note that for years now I have been microblogging <a href="https://x0f.org/web/accounts/1077">here</a>; so,
if you're interested to keep in touch with what I'm doing (and often, reading, or just thinking) day by day,
you're welcome to follow that account. I personally like keeping track of people over RSS with <a href="https://fraidyc.at">fraidyc.at</a>,
but whatever suits you. Just know that Joinmarket related announcements are often made there first (I don't and will not use any corporate-owned social media sites).</p>
<h3>Joinmarket status.</h3>
<p>0.7.1 of Joinmarket was released 12 days ago, and introduced <em>receiving</em> BIP78 payjoins, on the GUI and on command line.</p>
<p>In the next few days 0.7.2 will be released. It is principally a bugfix release.</p>
<p>(Although there will be one small
new feature - not-self broadcasting is finally reimplemented. You'll want to be careful about using it, especially
to start with (since it'll only work with counterparties that also have the latest release); there will of course
be advice about this in the release notes. Consider it an advanced feature, and consider using tor-only in your
Core node if the base level of privacy in broadcasting transactions isn't enough for you.)</p>
<p>The bugs fixed are things that came out of interoperability tests on BIP78.</p>
<p>Over the last few weeks I, Kristaps Kaupe and some people on other dev teams have been running a variety
of testnet, mainnet, regtest tests of Payjoin functionality between btcpayserver, Wasabi and Joinmarket.</p>
<p>We found various edge cases, like hex instead of base64 being transferred (not in spec but people were doing
it anyway), incorrectly shuffled output ordering (my bad!), combinations of parameters in the HTTP request
that <em>I</em> interpreted the BIP as saying were not allowed, but btcpayserver was sending anyway (but: not always! -
testing can be a real pain sometimes!) and a few more.</p>
<p>Remember two things about Payjoin though:</p>
<ol>
<li>It is a protocol designed to accept a failure to negotiate as a common event - <em>the payment goes through anyway, it's just not a coinjoin then</em>.</li>
<li>The most common incompatibility between wallets will be different address types. Then nothing can be done, as it would be slightly silly to do a Payjoin like that - we fall back, as per (1).</li>
</ol>
<p>So hopefully we will have some wallets that can send and receive Payjoins up and running by .. well, now actually! It is already possible and working, we are just smoothing out edge cases here.</p>
<p>If you didn't get a chance, please watch this demo video of sending and receiving payjoins between Joinmarket wallets (note: the dialog is now improved, as I comment here):</p>
<p><a href="https://video.autizmo.xyz/videos/watch/7081ae10-dce0-491e-9717-389ccc3aad0d">JM-JM Payjoin demo video</a></p>
<p>It only has 31 views, many of which were me, so I guess not many people saw it :)</p>
<p>About point (2) above, note that you'll probably need to be using a Joinmarket bech32 wallet (yes, we've had them for quite a while!) if you want to send or receive with Wasabi. So, more on that next:</p>
<h3>Joinmarket future plans (tentative!)</h3>
<h4>Bech32 in 0.8.0</h4>
<p>We have <a href="https://github.com/JoinMarket-Org/joinmarket-clientserver/pull/656">this</a> PR open from jules23 and it represents a very impactful (but happily, not large technically) change that is proposed: to switch to a "bech32 orderbook", by which we mean making changes like this</p>
<ul>
<li>The default wallet changes to native segwit (bech32, bc1.. addresses)</li>
<li>Joinmarket coinjoins (i.e. maker/taker coinjoins) are offered as <code>sw0reloffer</code>, <code>sw0absoffer</code> in the trading pit</li>
</ul>
<p>Both of these changes would not be "mandatory", just as when we switched to segwit in 2017, it was not mandatory, but would be default in the new version. The fees for coinjoins will be significantly reduced from the current "wrapped segwit" addresses, and we would gain better compatibility with Wasabi and a number of other modern wallets that default to bech32.</p>
<p>The general problem with these updates (which we've only done once before) is that they cause a "liquidity split" temporarily, as not everyone migrates to the new address type at the same time. This is unfortunate, but I feel less concerned about it than last time, as the amount of maker liquidity is <em>much</em> larger (more on that below, about IRC).
Another reason to be slightly unsure about this update is that taproot activation may be coming quite soon, but it seems unlikely that the real activation on the live network will take less than 1 year from now (does it?), so probably we should do this anyway. That's my opinion.</p>
<p>The general idea would be to make a new 0.8.0 version next after this, including this change. More testing is needed, but it's mostly ready. If you have opinions about the technical implementation of this, feel free to discuss on the above github PR thread. For more general discussion I'd suggest using #joinmarket on freenode.</p>
<h4>New message channel implementations vs IRC</h4>
<p>This part is far more speculative. We have had several discussions about message channels over the years. As early as 2016/17 I abstracted out the message channel "layer" so that IRC was just an implementation (see <code>jmclient/jmclient/message_channel.py</code>) of a few key methods. Alternative implementations have always been possible, but nobody either found time, or found a practical way, to make an alternative implementation. This issue is becoming more pressing. As a simple example, only this week we had IRC ops come to us complaining (very politely, it wasn't a disaster) that about 450 bots had suddenly shown up in our joinmarket test pit. This is in some ways less interesting than the real scalability problem: Joinmarket uses broadcast for offers, but also a sort of "anti-broadcast" mechanism: when a new Taker shows up, they ask <em>every</em> Maker for their current offers, and the Makers <em>all</em> send them at the same time, to that one Taker. So this doesn't scale very well and IRC as a messaging layer doesn't like it; this is the main reason negotiation of a Joinmarket coinjoin takes ~ 1 minute instead of 1-5 seconds (we have to deliberately throttle/slow down messages).</p>
<p>We rather badly need a more scalable messaging layer. I'd appeal for help on this, and I'd also appeal for public discussion of ideas on github (we've had such threads in the past, but nothing really happened).</p>
<p>Let's not forget that related to all that is DOS. Depending on implementation, DOS attacks can be a real problem. Chris Belcher's fidelity bond wallets were implemented within Joinmarket's code already, earlier this year, see <a href="https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/c1f34f08c52452c229319e7421bfd930f8d70a7c/docs/fidelity-bonds.md">here</a> for documentation explaining this, but implementing it as a requirement for Makers is another step, and it might be an important part of the puzzle of getting a scalable messaging layer right.</p>
<p>Getting this right won't just help Joinmarket coinjoins, but also various other systems we might want to integrate over time (SNICKER? CoinjoinXT? CoinSwap? something else?).</p>
<h3>The 445 BTC gridchain case</h3>
<p>Adam Gibson, 2020-06-15. Analysis of gridchain blockchain analysis and implications for Joinmarket usage.</p>
<p>For those time-constrained or non-technical, it may make sense to read
only the <a href="index.html#summary">Summary</a> section of this article. It goes
without saying that the details do matter, and reading the other
sections will give you a much better overall picture.</p>
<h2>Contents</h2>
<p><a href="index.html#background">Background - what is the "gridchain case"?</a></p>
<p><a href="index.html#change-peeling">Toxic change and peeling chains</a></p>
<p><a href="index.html#change-joinmarket">Change outputs in a Joinmarket context</a></p>
<p><a href="index.html#toxic-recall">The toxic recall attack</a></p>
<p><a href="index.html#size-factor">The size factor</a></p>
<p><a href="index.html#sudoku">Joinmarket sudoku</a></p>
<p><a href="index.html#maker-taker">Reminder on the maker-taker tradeoff</a></p>
<p><a href="index.html#address-reuse">Address reuse</a></p>
<p><a href="index.html#summary">Summary; lessons learned; advice to users</a></p>
<p><a href="index.html#already">Already implemented improvements</a></p>
<p><a href="index.html#still-needed">Still needed improvements</a></p>
<p><a href="index.html#recommendations">Recommendations for users</a></p>
<h2 id="background">Background - what is the "gridchain case"?</h2>
<p>This is a reflection on a case of reported theft as outlined
<a href="https://old.reddit.com/r/Bitcoin/comments/69duq9/50_bounty_for_anybody_recovering_445_btc_stolen/">here</a>
on reddit in early 2017 by user 'gridchain'.</p>
<p>What I won't do here is discuss the practical details of the case;
things like, whether it was a hack or an inside job, nor anything like
network level metadata, all of which is extremely important in an actual
criminal investigation. But here I'm only focusing on the role played
by Joinmarket specifically and blockchain level activity of the coins,
generally.</p>
<p>The reason for this blog post was
<a href="https://research.oxt.me/the-cold-case-files/1">this</a>
recent report by OXT Research - specifically by analyst
<a href="https://bitcoinhackers.org/@ErgoBTC">ErgoBTC</a>
(they require an email for signup to read the full report, otherwise you
only see the summary).</p>
<p>A short note of thanks here to ErgoBTC and LaurentMT and others
involved, since this kind of detailed analysis is badly needed; I hope
we will see more, specifically in public, over time (we cannot hope for
such from the deeply unethical blockchain analysis companies).</p>
<p>I'm <span style="text-decoration: underline;">not</span> going to assume here
that you've read that report in full, but I am going to be referring to
its main set of conclusions, and analyzing them. Obviously if you want
to properly assess my statements, it's rather difficult - you'd need
full knowledge of Joinmarket's operation <em>and</em> full details of the OXT
Research analysis - and even then, like me, you will still have some
significant uncertainties.</p>
<p>So the case starts with the claimed theft in 2 parts: 45 BTC in <a href="https://blockstream.info/tx/2f9bfc5f23b609f312faa60902022d6583136cc8e8a0aecf5213b41964963881">this
txn</a>
(note I will use blockstream.info for my tx links because I find their
presentation easiest for single txs specifically; oxt.me's
research tool is of course a vastly superior way to see a large network
of txs, which plays a crucial role in this analysis), and a
consolidation of 400 BTC in <a href="https://blockstream.info/tx/136d7c862267204c13fec539a89c7b9b44a92538567e1ebbce7fc9dd04c5a7f0">this other
txn</a>.</p>
<p>We'll assume that both of these utxos are under the control of a single
actor/thief, henceforth just <em>A</em>.</p>
<p>Setting aside the (in some ways remarkable) timing - that <em>A</em> did not
move the coins for about 2 years - let's outline roughly what happened,
and what the report tells us:</p>
<ul>
<li>The 400BTC went into joinmarket as a maker, and did a bunch (11 to
be precise) of transactions that effectively "peeled down" (more
on this later) that 400 to perhaps 335 BTC (with the difference
going into coinjoins).</li>
<li><em>A</em> then switched to a taker role for a while, focusing on higher
denominations, ranging from ~6 BTC to as high as ~58 BTC. Many of
these coinjoins had very low counterparty numbers (say 3-5 being
typical).</li>
<li>At some point some maker activity is seen again in this same
"peeling chain"; the report terms this phase as "alternating",
but it's hard to say for sure whether some particular script is
running, whether <em>A</em> is just randomly switching roles, or what.</li>
</ul>
<p>Be aware that this simplified narrative suggests to the careless reader
that one can easily trace all the coins through all the coinjoins, which
of course is not true at all - each subsequent transaction moves some
portion into a "mixed state", but (a) we'll see later that just
"moved into mixed state" is not the end of the story for some of those
coins and (b) while this narrative is misleading for Joinmarket in
general, it is not <em>as</em> misleading in this particular case.</p>
<p>The distinction between the "second" and "third" phase as listed in
those bullet points is pretty much arbitrary, but what is clearly
important is that the second phase marks a jump in average coinjoin
amount (this could be read as impatience on <em>A</em>'s part - but
that's just speculation), and this resulted in small anonymity sets in
some txs - 4 and 3 in two txs, in particular. Let's continue:</p>
<ul>
<li>Within the second regime, the OXT analysis narrows in on those small
anon set, large denomination txs - can they figure out which equal
sized output belongs to <em>A</em> here? The "toxic recall attack"
(explained below) allows them to identify one coinjoin output
unambiguously - but that goes into another coinjoin. But in a
second case it allows them to reduce the anonymity set (of the equal
sized coinjoin outputs) to 2, and they trace forwards both of those
outputs.</li>
<li>One of those 2 coinjoin outputs (<a href="https://blockstream.info/tx/2dc4e88685269795aafe7459087ab613878ce7d857dd35760eefeb9caf21371b">this
txn</a>, output index 2) pays, after several hops, into a Poloniex deposit
address in <a href="https://blockstream.info/tx/ab1e604cd959cc94b89ab02b691fe7d727d30637284e5e82908fb28b8db378f4">this
txn</a>. Although this is several hops, and although it does not deposit
all of that ~58 BTC into Poloniex (only about half of it),
nevertheless this can be (and is) treated as a significant lead.</li>
<li>So the next step was to trace back from that specific Poloniex
deposit address, which it turned out had a bunch of activity on it.
See
<a href="https://blockstream.info/address/16vBEuZD54NzqnnSStPYxFF2aktGhhuaf1">16vBEuZD54NzqnnSStPYxFF2aktGhhuaf1</a>
. Indeed several other deposits to that single address are connected
to the same Joinmarket cluster, and specifically connected to those
smaller-anon set taker-side coinjoins. In total around 270BTC is
eventually linked from <em>A</em>'s joinmarket coinjoins to that deposit
address. Even though some of those connections are ambiguous, due to
address reuse the evidence of co-ownership appears very strong.</li>
<li>Some further evidence is provided (though I am still fuzzy on the
details, largely just because of the time needed to go through it
all) linking more of the coins to final destinations, including some
from the 45BTC original chunk. The claim is that 380BTC is linked at
final destinations to the original 445BTC set. In the remainder
I'll focus on what is already seen with this 270BTC set and only
peripherally mention the rest - there is already a lot to chew on!</li>
</ul>
<h2 id="change-peeling">Toxic change and peeling chains</h2>
<p>The general idea of a "peeling chain" on the Bitcoin blockchain isn't
too hard to understand. Given 100 BTC in a single utxo, if I have to
make a monthly payment of 1 BTC and never use my wallet otherwise, then
clearly the tx sequence is (using (input1, input2..):(output1,
output2..) as a rudimentary format): ((100):(1, 99), (99):(1, 98),
(98):(1, 97)...). Ignoring fees of course. What matters here is that I
just always have a single utxo and that on the blockchain <em>my</em> utxos
<em>may</em> be linked as (100-99-98-97...) based on a <a href="https://en.bitcoin.it/wiki/Privacy#Change_address_detection">change
heuristic</a>
such as "round amount for payment". To whatever extent change
heuristics work, then to that extent ownership can be traced through
simple payments (especially and mostly if transactions have exactly two
outputs, so that the very <em>idea</em> of change, let alone a change
heuristic, applies straightforwardly).</p>
<p><img alt="peeling chain simple example" src="https://web.archive.org/web/20200713230834im_/https://joinmarket.me/static/media/uploads/.thumbnails/PeelingChain1.png/PeelingChain1-418x296.png" width="418" height="296"></p>
<p>In peeling chains, sometimes, the primary heuristic is the <strong>size</strong> of
the output. If you start with 1000 btc and you peel 0.1 btc hundreds of
times, it's obvious from the "size pattern" what the change is (and
indeed it's this case that gives rise to the name <em>peel chain</em> because
"peel" refers to taking off a <em>small</em> part of something, usually its
surface). The above diagram is similar (but not the same, exactly)
to the initial flow in the gridchain case, with one very large utxo
gradually getting peeled off.</p>
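<p>To make the size heuristic concrete, here is a minimal sketch in Python. The function names <code>peel</code> and <code>follow_largest_output</code> are purely illustrative (they come from no real tool), amounts are in whole BTC, and fees are ignored as in the example above. It builds the toy chain and then follows it the way an observer might, by always guessing that the largest output is the change.</p>

```python
def peel(start_amount, payment, n_payments):
    """Build a toy peeling chain: each tx spends one utxo and creates
    a payment output and a change output (fees ignored)."""
    txs = []
    current = start_amount
    for _ in range(n_payments):
        change = current - payment
        txs.append({"inputs": [current], "outputs": [payment, change]})
        current = change
    return txs


def follow_largest_output(txs):
    """Guess the change with the 'largest output' heuristic and return
    the chain of amounts an observer would link together."""
    linked = [txs[0]["inputs"][0]]
    for tx in txs:
        linked.append(max(tx["outputs"]))
    return linked


print(follow_largest_output(peel(100, 1, 3)))  # [100, 99, 98, 97]
```

<p>The heuristic only works here because the payment is small relative to the change; once the two outputs are of comparable size, this particular guess stops being reliable.</p>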
<p>In some cases timing may factor in; sometimes hackers will do hundreds
of such peels off a main originating utxo in a short time.</p>
<p>You can think of a peeling chain as the lowest effort ownership
obfuscation out there. Notice how literally any, even the simplest,
Bitcoin wallet, has to offer the feature required to carry this out -
just make a vanilla payment, for which there is (almost always, but not
always) a change output, back to your wallet.</p>
<p>So in Bitcoin's history, this technique has very often been seen used -
by hackers/thieves moving coins "away" from the original site of the
theft (I remember the <a href="http://www.techienews.co.uk/973470/silk-road-like-sheep-marketplace-scams-users-39k-bitcoins-worth-40-million-stolen/">case of Sheep
Market</a>
for example). Each "peel" raises additional uncertainty; the
non-change output is going somewhere, but who owns that? But the change
outputs represent a link allowing someone, in theory, to keep tracing
the activity of the original actor. Notice here how we talk about one
branch (our ((100):(1, 99), (99):(1, 98), (98):(1, 97)...) example
illustrates it); but one could keep tracing the payment outputs (the
'1's in that flow) and see if they themselves form other peel chains,
leading to a tree.</p>
<p>We mentioned a 'change heuristic' element to this - which is the
"main branch" if we're not sure which output is the change?</p>
<h3 id="change-joinmarket">Change outputs in a Joinmarket context</h3>
<p>A reader should from this point probably be familiar with the basics of
Joinmarket's design. Apart from the
<a href="https://github.com/Joinmarket-Org/joinmarket-clientserver">README</a>
and <a href="https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/master/docs/USAGE.md">usage
guide</a>
of the main Joinmarket code repo, the diagrams showing the main
Joinmarket transaction types
<a href="https://github.com/AdamISZ/JMPrivacyAnalysis/blob/master/tumbler_privacy.md#joinmarket-transaction-types">here</a>
may be useful as a refresher or a reference point for the following.</p>
<p>We have: \(N\) equal outputs and \(N\) or \(N-1\) non-equal change
outputs, where \(N-1\) happens when the taker does a "sweep",
emptying the mixdepth (= account; joinmarket wallets have 5 accounts by
default) without a change output. <span style="text-decoration: underline;">This last feature is specific to
Joinmarket, and specific to the taker role: there's no other coinjoin
out there that provides the facility to sweep an arbitrary amount of
coins out to an equal-sized output, with no
change.</span> (I am emphasizing this not
for marketing, but because it's crucial to this topic, and not widely
understood I think).</p>
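<p>As a rough illustration of that output structure, the following Python sketch separates equal-sized outputs from change and checks for the "N equal, N-1 change" pattern. It is a simplification under stated assumptions: the helper names are made up for this example, amounts are in satoshis, and the coinjoin amount is assumed to be simply the most frequently repeated output value (a change output that happens to equal the coinjoin amount would fool it).</p>

```python
from collections import Counter


def split_outputs(output_values):
    """Separate the equal-sized coinjoin outputs from the change outputs.
    Assumes the coinjoin amount is the most frequently repeated value."""
    counts = Counter(output_values)
    cj_amount, n_equal = counts.most_common(1)[0]
    change = [v for v in output_values if v != cj_amount]
    return cj_amount, n_equal, change


def looks_like_sweep(output_values):
    """True if there are N equal outputs but only N-1 change outputs."""
    _, n_equal, change = split_outputs(output_values)
    return len(change) == n_equal - 1


# Hypothetical 3-party coinjoin of 3 BTC where the taker swept (2 change outputs).
swept = [300_000_000] * 3 + [120_000_000, 45_000_000]
print(split_outputs(swept))     # (300000000, 3, [120000000, 45000000])
print(looks_like_sweep(swept))  # True
```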
<p>As an example of why it's important, here is one line from the OXT
Research article:</p>
<blockquote>
<p><em>Fees taken directly in a mix transaction result in deterministic
links ("unmixed change").</em></p>
</blockquote>
<p>This is false as an absolute statement; fees can be paid by a taker,
inside the transaction, with no unmixed change for the taker (this is
the Joinmarket 'sweep'). Deterministic links between inputs and change
outputs <em>do</em> result from change, and fees <em>do</em> create an additional flag
that can help make those linkages, in cases where there would be more
ambiguity. But a zero fee coinjoin with change outputs still has
deterministic links, usually.</p>
<p>Why does the OXT Research article heavily focus on <em>toxic unmixed
change</em> as a concept and as a key weakness of such protocols as
Joinmarket, and why do I disagree?</p>
<p>As we discussed, peeling chains offer a low quality of obfuscation, and
to unpack that: the problem is that if you have any relatively viable
change heuristic (it doesn't <em>have</em> to be large amounts as discussed),
it can let you keep knowledge of ownership of a whole chain of
transactions. That basically gives the blockchain analyst (we'll call
him <em>B</em>) a very large attack surface. He can look at <em>all</em> the information
flowing out of, or associated with, a whole chain of transactions. Any
later recombination of outputs from that "large attack surface" is
either a coinjoin or a "smoking gun" that different outward paths were
actually under the control of one owner (this comes back to that central
heuristic - common input ownership, and all the nuance around that).</p>
<p>In Joinmarket or any other coinjoin protocol that does allow change
outputs, "change heuristic" doesn't really apply, it kind of morphs
into something else: it's very obvious which outputs are change, but it
is only <em>in some cases</em> easy to disentangle which change outputs are
associated to which inputs, and that's actually what you need to know
if you want to trace via the change (as per "peeling chains"
description above). In high anonymity sets, it starts to get difficult
to do that disentangling, but more on that ("sudoku") later.</p>
<p>The analysis done in the OXT Research report smartly combines a long
peeling chain with other specific weaknesses in the way <em>A</em> acted, which
we will discuss in the next section. So all this is very valid in my
view.</p>
<p><span style="text-decoration: underline;">But I think going from the above to the conclusion "coinjoins which
have unmixed change are fundamentally inferior and not viable, compared
to coinjoins without unmixed change" is just flat out
wrong</span>. Consider yourself in the
position of <em>A</em>. You have let's say 400BTC in a single utxo. If you run
a coinjoin protocol that insists on no change always, and without a
market mechanism, you are forced to use a fixed denomination, say 0.1
BTC (an example that seems common), now across thousands of
transactions. In order to create these fixed denomination utxos you are
faced with the same problem of trying to avoid a trivial peeling chain.
By insisting on no deterministic links within the coinjoin, you simply
move the problem to an earlier step, you do not remove it.</p>
<p>Fixed denomination does not solve the problem of having an unusually
large amount to mix compared to your peers.</p>
<p>Having said that, fixed denomination with no change at all, does create
other advantages - I certainly don't mean to disparage that model!
Without going into detail here, consider that a large set or network of
all-equal-in all-equal-out coinjoins can create similar effects to a
single, much larger, coinjoin (but this is a topic for another article).</p>
<h2 id="toxic-recall">The toxic recall attack</h2>
<p>Earlier we explained that one of the steps of the OXT Research analysis
was to identify a low liquidity regime where <em>A</em> was acting as taker,
and we mentioned the "toxic recall attack" was used to reduce the
anonymity sets of the coinjoin outputs, during this, to a level low
enough that simple enumeration could find good candidates for final
destinations of those coins.</p>
<p>Embedded in this was a crucial piece of reasoning, and I think this was
both an excellent and a very important idea:</p>
<ul>
<li><strong>Joinmarket does not allow co-spending of utxos from different
accounts</strong></li>
<li>That means that if a coinjoin output <em>X</em> is spent along with a utxo
from the "peeling chain" (i.e. they are both inputs to the same
tx), then <em>X</em> is not owned by <em>A</em> (assuming correct identification
of <em>A</em>'s peeling chain)</li>
<li>Every time such an event occurs, that <em>X</em> can be crossed off the
list of coinjoin outputs that <em>A</em> might own, thus reducing the
anonymity set of that earlier coinjoin by 1.</li>
</ul>
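<p>The elimination step described in these bullets can be sketched as follows. This is only an illustration of the reasoning, not the code used in the OXT Research report: the function name, the string "outpoint" labels and the example data are all hypothetical.</p>

```python
def toxic_recall(candidates, peeling_chain_utxos, later_txs):
    """Return the coinjoin outputs that might still belong to A.

    candidates: set of outpoints (the equal-sized outputs of one coinjoin)
    peeling_chain_utxos: set of outpoints attributed to A's change chain
    later_txs: iterable of input lists, one list of outpoints per later tx
    """
    remaining = set(candidates)
    for inputs in later_txs:
        spent = set(inputs)
        # A's wallet would not co-spend a coinjoin output with its own change
        # chain, so a candidate seen alongside the chain belongs to someone else.
        if spent & peeling_chain_utxos:
            remaining -= spent
    return remaining


candidates = {"cj:0", "cj:1", "cj:2", "cj:3"}
chain = {"chain:1", "chain:2"}
later = [
    ["chain:1", "cj:2", "other:0"],  # cj:2 co-spent with the chain: cross it off
    ["cj:1", "other:5"],             # no chain utxo involved: nothing is learned
    ["chain:2", "cj:0"],             # cj:0 co-spent with the chain: cross it off
]
print(sorted(toxic_recall(candidates, chain, later)))  # ['cj:1', 'cj:3']
```

<p>In this made-up example the anonymity set of the equal-sized outputs drops from 4 to 2, purely from observing which candidates were later spent alongside the peeling chain.</p>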
<p>The reasoning is not perfectly watertight:</p>
<p>First, as the report observes: the first assumption behind it is "A
user can only run one mixing client at a time." This is clearly not
literally true, but like many things here, a good-enough guess is fine,
if it eventually leads to outcomes that further strengthen the case. And
that is definitely true here: while a smart operator probably would be
running more than one instance of Joinmarket code, it is not default
behaviour and requires both a little coding and some careful thought.
Most people would not do this.</p>
<p>(Second, nothing stops a user from making a coinjoin to an address in
the same mixdepth (at least in the current software). It's just that
(a) that is heavily discouraged and (b) it's not easy to see a good
reason why someone would <em>try</em> to do that. Still it is possible as a
mistake. But I don't think this is a reason to doubt the effectiveness
of the "toxic recall attack", just, it should be noted.)</p>
<p>So overall the bolded sentence is the most interesting - Joinmarket's
intention is to prevent co-spending outputs which would ruin the effect
of any single coinjoin - i.e. it tries (caveat: above parenthetical) to
prevent you using both a coinjoin output and the change output (or any
other utxo in the same account as the change output and the original
inputs) together. And this small element of 'rigidity' in how coins
are selected for spending is actually another 'bit' of information
that <em>B</em> can use to make deductions, at least some of the time.</p>
<p>The following diagram tries to illustrate how these conditions lead to
the possibility of the attack, to reduce the anonymity set of coinjoin
outputs:</p>
<p><img alt="Toxic recall attack illustration" src="https://web.archive.org/web/20200713230834im_/https://joinmarket.me/static/media/uploads/.thumbnails/ToxicRecall1.png/ToxicRecall1-692x490.png" width="692" height="490"></p>
<p>So in summary we see 4 really important factors leading to the attack's
viability:</p>
<ol>
<li>Joinmarket's strict account separation</li>
<li>Linkability via change - as we'll describe in the next section
"Joinmarket sudoku", this is <em>usually</em> but not always possible, so
while (1) was 99% valid this is more like 75% valid (entirely vague
figures of course).</li>
<li>Reusing the same peers in different coinjoin transactions</li>
<li>Low number of peers</li>
</ol>
<p>Of course, 3 and 4 are closely tied together; reuse of peers happened a
lot precisely because there were so few peers available for large
coinjoin sizes (to remind you, it was between 6 and 58 BTC, and the
average was around 27, and there are/were few Joinmarket peers actually
offering above say 10BTC).</p>
<h2 id="size-factor">The size factor</h2>
<p>This is a thread that's run through the above, but let's be clear
about it: in practice, typical Joinmarket coinjoins run from 0.1 to 10
BTC, which is unsurprising. There are a fair number of much smaller
transactions, many just functioning as tests, while <em>really</em> small
amounts are not very viable due to the fees paid by the taker to the
bitcoin network. Larger than 10 BTC are certainly seen, including up to
50 BTC and even beyond, but they appear to be quite rare.</p>
<p>The actions of <em>A</em> in this regard were clearly suboptimal. They started
by taking 4 x 100 BTC outputs and consolidating them into 1 output of
400 BTC. This was not helpful, if anything the opposite should have been
done.</p>
<p>Second, as a consequence, they placed the entirety of this (I'm
ignoring the 45 BTC output for now as it's not that crucial) in one
mixdepth. For smaller amounts where a user is just casually offering
coins for joining, one output is fine, and will rapidly be split up
anyway, but here this very large size <span style="text-decoration: underline;">led to most of the large-ish
joining events forming part of one long peeling
chain.</span> This part probably isn't
clear so let me illustrate. A yield generator/maker usually splits up
its coins into random chunks pretty quickly, and while as a maker they
do <strong>not</strong> get the crucial "sweep, no change" type of transaction
mentioned above, they nevertheless do get fragmentation:</p>
<p><code>Initial deposit --> After 1 tx --> After 2 txs --> After many txs</code></p>
<p><code>0: 1BTC --> 0.800 BTC --> 0.800 BTC --> 0.236 BTC</code></p>
<p><code>1: 0 BTC --> 0.205 BTC --> 0.110 BTC --> 0.001 BTC</code></p>
<p><code>2: 0 BTC --> 0.000 BTC --> 0.100 BTC --> 0.555 BTC</code></p>
<p><code>3: 0 BTC --> 0.000 BTC --> 0.000 BTC --> 0.129 BTC</code></p>
<p><code>4: 0 BTC --> 0.000 BTC --> 0.000 BTC --> 0.107 BTC</code></p>
<p>(Final total is a bit more than 1BTC due to fees; the reason it gets
jumbled, with no ordering, is: each tx moves coinjoin output to <em>next</em>
mixdepth, mod 5 (ie it wraps), but when a new tx request comes in it
might be for any arbitrary size, so the mixdepth used as <em>source</em> of
coins for that next transaction, could be any of them. This is
illustrated in the 'after 2 txs' case: the second mixdepth was chosen
as input to the second tx, not the first mixdepth).</p>
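<p>The bookkeeping behind that table can be sketched in a few lines of Python. This is a toy model, not Joinmarket's wallet code: <code>maker_coinjoin</code> is a made-up name, balances are in satoshis, and the fees earned by the maker are ignored, so the figures differ slightly from the table above.</p>

```python
def maker_coinjoin(balances, source, cj_amount):
    """Apply one maker-side coinjoin to a list of mixdepth balances (satoshis).
    The change stays in the source mixdepth; the coinjoin output goes to the
    next mixdepth, wrapping around. Fees earned are ignored."""
    if balances[source] < cj_amount:
        raise ValueError("source mixdepth cannot fund this coinjoin")
    updated = list(balances)
    updated[source] -= cj_amount
    updated[(source + 1) % len(balances)] += cj_amount
    return updated


balances = [100_000_000, 0, 0, 0, 0]                # 1 BTC deposited in mixdepth 0
balances = maker_coinjoin(balances, 0, 20_000_000)  # first request: 0.2 BTC
balances = maker_coinjoin(balances, 1, 10_000_000)  # second request uses mixdepth 1
print(balances)  # [80000000, 10000000, 10000000, 0, 0]
```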
<p>This dynamic does <strong>not</strong> remove the "peeling chain" or "toxic
change" dynamic emphasized in OXT Research's report - because every tx
done by the maker still has its change, <span style="text-decoration: underline;">precisely because as maker you
don't have the privilege of choosing the
amount</span>.</p>
<p>But it does result in more, so to speak, "parallelisation" of the mixing
activity, instead of the largest chunk all being in one long chain.</p>
<p>A question remains, if we imagine that we use much smaller amounts - can
the analyst always follow the "peeling chain of each mixdepth" (to
coin a phrase which at this point hopefully makes sense)?</p>
<p>I think actually the answer is more 'no' than you might at first
think. The next section will illustrate.</p>
<h2 id="sudoku">Joinmarket sudoku</h2>
<p>This concept including its origination is covered in some detail in my
earlier article
<a href="https://github.com/AdamISZ/JMPrivacyAnalysis/blob/master/tumbler_privacy.md#jmsudoku-coinjoin-sudoku-for-jmtxs">here</a>.
Essentially we are talking about making unambiguous linkages between
change outputs and the corresponding inputs in any given Joinmarket
coinjoin. I reproduce one transaction diagram from that article here to
help the reader keep the right idea in mind:</p>
<p><img alt="Coinjoin canonical" src="https://web.archive.org/web/20200713230834im_/https://joinmarket.me/static/media/uploads/cjmtx.svg" width="550" height="389"></p>
<p>So to effect this "sudoku" or disentangling, let's suppose you don't
have any sophistication. You're just going to iterate over every
possible subset of the inputs (they're randomly ordered, of course) and
see if it matches any particular change output (you assume that there is
exactly one change output per participant). In case it wasn't obvious,
"matches" here means "that change output, plus the coinjoin size (3
btc in the diagram above), equals the sum of the subset of inputs".</p>
<p>Now none of them will <em>actually</em> match because there are fees of two
types being paid out of (and into) the change - the bitcoin network fees
and the coinjoin fees (which add to most and subtract from one, at least
usually). So since you don't know the exact values of those fees, only
a general range, you have to include a "tolerance" parameter, which
really complicates the issue.</p>
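<p>A minimal sketch of that unsophisticated search, with the tolerance parameter included, might look like this. It is not the code from any existing tool: the function name and the example amounts (in satoshis) are invented for illustration, and the search is plain brute force over every subset.</p>

```python
from itertools import combinations


def sudoku_matches(inputs, change_outputs, cj_amount, tolerance):
    """For each change output (by index), list the subsets of input indices
    whose sum is within `tolerance` of change + cj_amount (all in satoshis)."""
    matches = {}
    indices = range(len(inputs))
    for c_idx, change in enumerate(change_outputs):
        target = change + cj_amount
        found = []
        for r in range(1, len(inputs) + 1):
            for subset in combinations(indices, r):
                total = sum(inputs[i] for i in subset)
                if abs(total - target) <= tolerance:
                    found.append(subset)
        matches[c_idx] = found
    return matches


# Hypothetical 3-party coinjoin of 3 BTC; fees shift each change by a few thousand sats.
inputs = [500_000_000, 200_000_000, 150_000_000, 310_000_000]
changes = [199_990_000, 50_002_000, 10_002_000]
print(sudoku_matches(inputs, changes, 300_000_000, 50_000))
# {0: [(0,)], 1: [(1, 2)], 2: [(3,)]}
```

<p>With these made-up numbers each change output has exactly one matching subset; widening the tolerance or adding more inputs is what starts to produce several candidate subsets per change output.</p>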
<p><a href="https://gist.github.com/AdamISZ/15223a5eab940559e5cf55e898354978">This
gist</a>
is a quick and dirty (in the sense it's barely a 'program' since I
just hardcoded the values of the transaction) example of doing such a
Joinmarket sudoku for one of the transactions in the OXT Research
analysis of flows for this case. The pythonistas out there might find of
interest particularly this code snippet for finding the "power set"
(the set of all subsets):</p>
<pre><code>from itertools import chain, combinations

def power_set(l):
    # Work on indices, not values, so inputs with equal amounts stay distinct.
    iil = range(len(l))
    return list(chain.from_iterable(combinations(iil, r) for r in range(len(iil)+1)))</code></pre>
<p>As per a very beautiful piece of mathematical reasoning, the power set
of a set of size \(N\) has \(2^{N}\) members (every member of the set is either
in, or not in, each subset - think about it!). So this matters because
it illustrates, crudely, how we have here an exponential blowup.</p>
<p>That particular transaction had 24 inputs, so the power set's
cardinality would be \(2^{24}\) - but the beginning of the analysis
is to take a subset of size 4 that you have already concluded to be linked, thus
reducing the size of the search space by a factor of 16. Now, there's a
lot more to it, but, here's what's interesting: <strong>depending on the
tolerance you choose, you will often find there are multiple sudoku
solutions</strong> if the size of the set of inputs is reasonably large (let's
say 20 and up, but it isn't possible to fix a specific number of
course). In the first couple of attempts of finding the solution for
that transaction, I found between 3 and 7 different possible ways the
inputs and outputs could connect; some of them involve the pre-grouped 4
inputs acting as taker (i.e. paying fees) and some involve them acting
as maker.</p>
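<p>The arithmetic of that reduction is easy to check. In this small Python sketch the helper name <code>search_space</code> is just illustrative; it counts the subsets left to test once a group of inputs has been set aside as already linked.</p>

```python
def search_space(n_inputs, n_pre_grouped=0):
    """Number of input subsets to test once `n_pre_grouped` inputs are
    treated as one already-linked group and set aside."""
    return 2 ** (n_inputs - n_pre_grouped)


print(search_space(24))                         # 16777216
print(search_space(24, 4))                      # 1048576
print(search_space(24) // search_space(24, 4))  # 16
```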
<p>Now, if this ambiguity isn't enough, there's another significant
source of ambiguity in these sudokus: previous equal-sized coinjoin
outputs. For example take <a href="https://blockstream.info/tx/5f8747a3837a56dd2f422d137b96b1420fd6885be6d1057f3c4dca102a3138b6?output:5">this
txn</a>:</p>
<p><img alt="un-sudoku-able tx" src="https://web.archive.org/web/20200713230834im_/https://joinmarket.me/static/media/uploads/.thumbnails/tx5f8747.png/tx5f8747-849x617.png" width="849" height="617"></p>
<p>There are 21 inputs, which is already in the "problematic" zone for
sudoku-ing, as discussed, in that it will tend to lead to multiple
possible solutions, with a reasonable tolerance parameter. But in this
case a full sudoku is fundamentally impossible: notice that inputs index
7 and 21 (counting from 0) both have amount 6.1212 . This means that any
subset that includes the first is identical to a subset that includes
the second. Those two outputs are, unsurprisingly, from the same
previous Joinmarket coinjoin (they don't have to be, though).</p>
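<p>This kind of fundamental ambiguity shows up even in a tiny example. In the following Python sketch (hypothetical amounts in satoshis, with two inputs sharing the value 6.1212 BTC, and a made-up helper name), an exact-sum search returns two equally valid subsets that differ only in which of the two identical inputs is used.</p>

```python
from itertools import combinations


def subsets_with_sum(values, target):
    """All index subsets of `values` whose sum equals `target` exactly."""
    idx = range(len(values))
    return [s for r in range(1, len(values) + 1)
            for s in combinations(idx, r)
            if sum(values[i] for i in s) == target]


# Indices 1 and 3 carry the same amount, so they are interchangeable.
inputs = [250_000_000, 612_120_000, 90_000_000, 612_120_000]
print(subsets_with_sum(inputs, 702_120_000))  # [(1, 2), (2, 3)]
```

<p>No choice of tolerance can separate those two answers, since the amounts themselves carry no information about which input is which.</p>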
<p>In any long "peeling chain" these ambiguities will degrade, perhaps
destroy, the signal over time - unless there is some very strong
watermark effect - such as huge size, which is precisely what we see
with <em>A</em>.</p>
<p>To summarize, we have these key points about the Sudoku concept for
identifying chains of ownership:</p>
<ul>
<li>As long as you don't sweep, a Joinmarket account, thus not emptied,
will keep creating this chain of ownership via change - though the
size of that linked amount dwindles over time.</li>
<li>Thus makers (who cannot sweep) have no guarantee of not having that
specific ownership trace persist, for each of their 5 accounts (but
<em>not</em> across them - the 5 accounts will not be connected on chain,
at least not in a trivial way).</li>
<li>If you use a very large size then this acts as a strong enough
watermark that such tracing is pretty much guaranteed to work (i.e.
the Sudoku works much more reliably if you put in a 400BTC utxo and
everyone else in the coinjoin only uses 10BTC at max).</li>
<li>Otherwise, and in general, such tracing is a bit unreliable, and
over a long series of transactions it becomes very unreliable (but
again - this is no kind of privacy guarantee! - we just observe that
there will be increasing uncertainty over a long chain, including
really fundamental ambiguities like the transaction above).</li>
<li>Whenever you <em>do</em> sweep, you create what I called in the previous
article a <a href="https://github.com/AdamISZ/JMPrivacyAnalysis/blob/master/tumbler_privacy.md#joinmarket-wallet-closures">"completed mixdepth
closure"</a>;
there is no change for you as taker, and so an end to that
"chain". This only exists for takers. (you can of course sweep
<em>without</em> coinjoin at all, also).</li>
</ul>
<h3 id="maker-taker">Reminder on the maker-taker tradeoff</h3>
<p>This illustrates another aspect of the more general phenomenon -
Joinmarket almost by definition exists to serve takers. They pay for
these advantages:</p>
<ul>
<li>As coordinator, they do not reveal linkages to their counterparties.
Makers must accept that the taker in each individual coinjoin <em>does</em>
know <em>their</em> linkages (the maker's), even if they're OK with that
over a long period because there are many disparate takers; that's
a weakness.</li>
<li>They choose the time when the coinjoin happens (within a minute or
so, it's done, if all goes well)</li>
<li>They choose the amount of the coinjoin, so can have a payment as a
coinjoin outpoint.</li>
<li>Corollary of the above: they can control the size of their change,
in particular, reducing it to zero via a "sweep"</li>
<li>Since they run only when they want to coinjoin, they have a smaller
time footprint for attackers (makers have an "always on hot
wallet" <em>which responds to requests rather than initiates them</em> ,
so it's more like a server than a client, which is by definition
difficult to keep properly secure).</li>
</ul>
<p>These 4+ advantages are what the Taker pays for, and it's interesting
that in practice the <em>coinjoin</em> fee has fallen to near-zero
<strong>except for larger sizes</strong>. I will point to my earlier thoughts on low
fees
<a href="https://x0f.org/web/statuses/104123055565241054">here</a>
to avoid further sidetrack.</p>
<p>Therefore the cool sounding idea "oh I have this bunch of bitcoin
sitting around, I'll just passively mix it for a while and actually get
paid to do it!" (I have noticed people <em>mostly</em> get interested in
Joinmarket from this perspective) is more limited than it seems.</p>
<h2>Address reuse</h2>
<p>This will probably be the shortest section because it's so obvious.</p>
<p>The fact that 270BTC of the 445 BTC going "into" Joinmarket ended up
at <code>16vBEuZD54NzqnnSStPYxFF2aktGhhuaf1</code> is kind of a big facepalm moment;
I don't think anyone reading this blog would have trouble understanding
that.</p>
<p>I don't dismiss or ignore that such things happen for a reason, and
that reason is mainly actions of centralized exchanges to deliberately
reduce the privacy of their customers ("KYC/AML"). Sometimes, of
course, sheer incompetence is involved. But it's the exception rather
than the rule, since even the most basic consumer wallets do not
generally reuse addresses nowadays. I'll consider these real world
factors out-of-scope of this article, although they will matter in your
practical real life decisions about keeping your privacy (consider <em>not</em>
using such exchanges).</p>
<p>What has to be said though: 270 does not equal 445 (or 400); it is not
impossible to imagine that such a set of deposits to one address may not
be traced/connected to the original deposit of 400 (+) into Joinmarket
(although it would really help if that total wasn't so very large that
there are only a few Joinmarket participants in that range anyway). And
indeed, my own examination of the evidence tells me that the connections
of each individual final deposit to
<code>16vBEuZD54NzqnnSStPYxFF2aktGhhuaf1</code> back to that original 445 is <em>not</em>
unambiguous. The problem is of course the compounding effect of
evidence, as we will discuss in the next, final section.</p>
<h2 id="summary">Summary; lessons learned; advice to users</h2>
<p>So we've looked into details, can we summarize what went wrong for <em>A</em>?
Although we don't actually know with certainty how many of the
attributions in the OXT Research report are correct, they appear to be <em>broadly
correct</em>.</p>
<ol>
<li>400 BTC is a very large amount to move through a system of perhaps
at best a couple hundred users (100 makers on the offer at once is
typical), most of whom are not operating with more than 10 BTC.</li>
<li>One large chunk of 400 is therefore a way worse idea than say 10
chunks of 40 across 10 Joinmarket wallets (just common sense really,
although starting with 400 is not in itself a disaster, it just
makes it harder, and slower). This would have been more hassle, and
more fees, but would have helped an awful lot.</li>
<li>Running passively as a maker proved too slow for <em>A</em> (this is an
assumption that the report makes and that I agree with, but not of
course a 'fact'). This is Joinmarket's failing if anything; there
are just not enough people using it, which relates to the next
point:</li>
<li>When switching to a taker mode (which in itself was a very good
idea), <em>A</em> decided to start doing much larger transaction sizes, but
found themselves unable to get more than a few counterparties in
some cases. This should have been a sign that the effect they were
looking for might not be strong enough, but it's very
understandable that they didn't grok the next point:</li>
<li>The "toxic recall attack" very heavily compounds the low anonymity
set problem mentioned above - reuse of the same counterparties in
successive transactions reduced the anonymity set from "bad" to
"disastrously low" (even down to 1 in one case).</li>
<li>Even with the above failings, all needn't really be lost; repeated
rounds are used and the output with the anonymity set of 1 mentioned above was
sent to another coinjoin anyway. The first chunk of coins identified
as sent to the Poloniex address (first to be identified, not first in
time) was in an amount of about 28 BTC via several hops, then part
of the 76 BTC in <a href="https://blockstream.info/tx/ab1e604cd959cc94b89ab02b691fe7d727d30637284e5e82908fb28b8db378f4">this
txn</a>,
and even the first hop only had a 50% likelihood assigned. So it's
a combination of (a) the address being marked as in the POLONIEX
cluster, (b) the size of the deposit and (c) the reuse allowing tracing
back to other transactions, that caused a "high-likelihood
assignment of ownership", which leads into ...</li>
<li>Address reuse as discussed in the previous section is the biggest
failing here. If all the deposits here were to different exchange
addresses, these heuristics would not have led to any clear
outcomes. A few guesses here and there would exist, but they would
remain guesses, with other possibilities also being reasonable.</li>
<li>Circling back to the beginning, notice how making educated guesses
about deposits on exchanges a few hops away from Joinmarket might
already be enough to get some decent guesses at ownership, if the
sizes are large enough compared to the rest of the Joinmarket usage.</li>
</ol>
<p>So overall the post mortem is: <strong>a combination of at least three
different things leads to a bad outcome for <em>A</em> : large (much bigger
than typical JM volume) size not split up, heavy address reuse (on a
centralized exchange) and a small anonymity set portion of the
sequence.</strong></p>
<p>This issue of "combination of factors" leading to a much worse than
expected privacy loss is explained well on the bitcoin wiki Privacy page
<a href="https://en.bitcoin.it/wiki/Privacy#Method_of_data_fusion">here</a>.</p>
<h3 id="already">Already implemented improvements</h3>
<p>When running as a taker and using the so-called <a href="https://github.com/JoinMarket-Org/joinmarket-clientserver/blob/master/docs/tumblerguide.md">tumbler
algorithm</a>
users should note that in 2019 a fairly meaningful change to the
algorithm was implemented - one part was to start each run with a sweep
transaction out of each mixdepth containing coins as the first step
(with longer randomized waits). This makes a peeling chain direct from a
deposit not possible (you can always try to guess which coinjoin output
to hop to next of course, with the concomitant difficulties).
Additionally, average requested anonymity sets are increased, which, as
an important byproduct, tends to create larger input sets which are
harder to sudoku (and more likely to have substantial ambiguity). There
are several other minor changes like rounding amounts, see <a href="https://gist.github.com/chris-belcher/7e92810f07328fdfdef2ce444aad0968">Chris
Belcher's document on
it</a>
for more details.</p>
<h3 id="still-needed">Still needed improvements</h3>
<p>Clearly the toxic recall attack concept matters - it is going to matter
more, statistically, as the anonymity set (i.e. the number of coinjoin
counterparties) is reduced, but it matters per se in any context -
reusing the same counterparties <strong><em>in a sequence of coinjoins from the
same mixdepth closure</em></strong> reduces the anonymity set. Notice there are a
couple of ways that situation could be remediated:</p>
<ol>
<li>Reduce the number of coinjoin transactions within the same mixdepth
closure - but this is not clear. If I do 1 coinjoin transaction with
10 counterparties and it's a sweep, closing the mixdepth closure,
is that better than doing 2 coinjoin transactions from it, each of
which has 6 counterparties, if there is a 10% chance of randomly
choosing the same counterparty and thus reducing the anonymity set
of the second coinjoin by 1? That is pretty profoundly unclear and
seems to just "depend". 1 transaction with 12 counterparties <em>is</em>
clearly better, but very large sets like that are very difficult to
achieve in Joinmarket today (particularly if your coinjoin amount is
large).</li>
<li>Actively try to prevent reusing the same counterparty for multiple
transactions in the same mixdepth closure (obviously this is for
takers; makers are not choosing, they are offering). Identification
of bots is problematic, so probably the best way to do this is
simply for a taker to keep track of its earlier txs (especially
within a tumbler run, say) and decide to not include makers when
they provide utxos that are recognized as in that set. This is still
a bit tricky in practice; makers don't want their utxos queried all
the time, but takers for optimal outcomes would like full
transparent vision into those utxo sets - see <a href="https://web.archive.org/web/20200713230834/https://joinmarket.me/blog/blog/poodle/">earlier discussion of
PoDLE</a>
and
<a href="https://web.archive.org/web/20200713230834/https://joinmarket.me/blog/blog/racing-against-snoopers-in-joinmarket-02/">here</a>
on this blog for the tricky points around this.</li>
</ol>
<p>(2) is an example of ideas that were discussed by Joinmarket
developers years ago, but never really went anywhere. Takers probably
<em>should</em> expand the query power given them by the PoDLE tokens to have a
larger set of options to choose from, to gauge the "quality" of what
their counterparties propose as join inputs, but it's a delicate
balancing act, as mentioned.</p>
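<p>To put rough numbers on the trade-off in (1), here is a small sketch.
It assumes, purely for illustration, that a taker picks its
counterparties uniformly at random from a fixed pool of makers, which is
<em>not</em> how Joinmarket's order choosing actually works (fees and
amounts matter); the pool size of 60 is likewise a made-up figure.</p>

```python
from math import comb

def overlap_probability(pool: int, k: int) -> float:
    """Chance that two independent uniform draws of k makers out of
    `pool` share at least one maker (hypergeometric complement)."""
    return 1 - comb(pool - k, k) / comb(pool, k)

def expected_overlap(pool: int, k: int) -> float:
    """Expected number of makers that appear in both draws."""
    return k * k / pool

# Hypothetical scenario: 60 makers online, two coinjoins of 6 makers each.
pool_size, per_join = 60, 6
p_reuse = overlap_probability(pool_size, per_join)
mean_reuse = expected_overlap(pool_size, per_join)
print(f"P(at least one repeated maker) = {p_reuse:.3f}")
print(f"expected repeated makers       = {mean_reuse:.2f}")
```

<p>Under these assumptions the reuse probability climbs quickly as the
pool shrinks, which is one way to see why large coinjoin amounts (fewer
eligible makers) make the problem worse.</p>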
<h3 id="recommendations">Recommendations for users</h3>
<p>For the final section, some practical advice. Joinmarket can be a
powerful tool - but it's unfortunately not very easy to understand what
you <em>should</em> do, precisely because there is a lot of flexibility.</p>
<ol>
<li>The taker role, and in particular the tumbler role, are designed to
be used to actively improve your privacy. We explain above that it
gives certain advantages over the maker role. So: <strong>use it!</strong> - with
at least the default settings for counterparty numbers and
transaction numbers, and long waits - in fact, increase these
factors above the defaults. Note that tumbles can be safely
restarted, so do it for a day, shut it down and then restart it a
few days later - that's fine. See the docs for more on that. Be
sensitive to bitcoin network fees - these transactions are very
large so they'll be more palatable at times when the network is
clearing 1-5 sats/byte. However ...</li>
<li>... mixing roles definitely has advantages. The more people mix
roles the more unsafe it is to make deductions about which coinjoin
output belonged to the taker, after it gets spent (consider what you
can deduce about a coinjoin output which is then spent via an
ordinary wallet, say to a t-shirt merchant).</li>
<li>The maker role isn't useless for privacy, it's rather best to
think of it as (a) limited and (b) taking a long time to have an
effect. It's most suitable if your threat model is "I don't want
a clear history of my coins over the long term". It also costs
nothing monetarily and brings in some very small income if your size
is large (if small, it's likely not worth mentioning) - but in that
case, take your security seriously.</li>
<li>Consider sizing when acting as a taker. We as a project should
perhaps create more transparency around this, but you can gauge from
your success in arranging big size coinjoins: if you can't easily
find 6+ counterparties to do a coinjoin at a particular size, it may
not be a good idea to rely on the outcomes, as you may be mixing in
too small of a crowd (whether that's at 10 BTC or 20 BTC or 50+ BTC
just depends on market condition).</li>
<li>Make good use of (a) the accounts (mixdepths) feature, (b) the coin
freeze feature and (c) the sweep feature (taker only). These three
things allow you to better isolate coins going to different
destinations - your cold wallet, your mobile spending wallet, an
exchange etc etc. Accounts let you have the assurance that coins in
one aren't linked with coins in another; you can't accidentally
co-spend them. The freeze feature (see the "Coins" tab on Qt) lets
you spend individual utxos, where that's important to you for some
reason, without connection to others. And the sweep feature lets you
make a coinjoin without any change, breaking a link to future
transactions.</li>
</ol>
<p>We soon (in 0.7.0; the code is basically already done) hope to have more
helpful features, in particular Payjoin as defined in BIP 78, along with
very basic PSBT support.</p>
<p>Schnorrless Scriptless Scripts. 2020-04-15T00:00:00+02:00. Adam Gibson. tag:joinmarket.me,2020-04-15:/blog/blog/schnorrless-scriptless-scripts/</p>
<p>a new ECDSA single-signer adaptor signature construction.</p><h3>Schnorrless Scriptless Scripts</h3>
<h2>Introduction</h2>
<p>The weekend of April 4th-5th 2020 we had a remote "Lightning
Hacksprint" organized by the ever-excellent Fulmo. One challenge was
related to "Payment Points" (see
<a href="https://wiki.fulmo.org/index.php?title=Challenges#Point_Time_Locked_Contracts_.28PTLC.29">here</a>;
see lots more info about the hacksprint at that wiki) and was based
around a new innovation recently seen in the world of adaptor
signatures. Work was led by Nadav Kohen of Suredbits and Jonas Nick of
Blockstream; the latter's API for the tech described below can be seen
currently as a PR to the secp256k1 project
<a href="https://github.com/jonasnick/secp256k1/pull/14">here</a>.
The output from Suredbits was a demo, as shown
<a href="https://www.youtube.com/watch?v=w9o4v7Idjno&feature=youtu.be">here</a>
on their YouTube channel, of a PTLC (point time locked contract, see their
<a href="https://suredbits.com/payment-points-part-1/">blog</a>
for more details on that).</p>
<p>I will not focus here on either the proof of concept code, nor the
potential applications of this tech (which are actually many, not only
LN, but also Discreet Log Contracts, various designs of tumbler and
others), but entirely on the cryptography.</p>
<h2>What you can do with Schnorr adaptors</h2>
<p>Previous blog posts have covered in some detail the concept of adaptor
signatures, how they are simply realizable using the Schnorr signature
primitive. Also noted here and elsewhere is that there are techniques to
create the same effect using ECDSA signatures, but involving considerable
additional crypto machinery (Paillier homomorphic encryption and certain
zero knowledge (range) proofs). This technique is laid out in
<a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/attachments/20180426/fe978423/attachment-0001.pdf">this</a>
brief note, fleshed out fully in
<a href="https://eprint.iacr.org/2017/552">this</a>
cryptographic construction from Lindell, paired with
<a href="https://eprint.iacr.org/2018/472">this</a>
paper on multihop locks (which represents a very important theoretical
step forward for Lightning channel construction). The problem with that
"tech stack" is the complexity in the Lindell construction, as
mentioned.</p>
<p>A recent
<a href="https://github.com/LLFourn/one-time-VES/blob/master/main.pdf">paper</a>
by Lloyd Fournier represents a very interesting step forward, at least
in a certain direction: it allows "single signer" ECDSA adaptor
signatures. The scare quotes in the previous sentence represent the fact
that the use of such adaptor signatures would not literally be single
signer - it would be in the context of Bitcoin's <code>OP_CHECKMULTISIG</code>,
most typically 2 of 2 multisig, so the same as the current
implementation of the Lightning network, in which a contract is enforced
by having funds controlled by both parties in the contract. Here, what
is envisaged is not a cooperative process to construct a single
signature (aggregated), but each party can individually create adaptor
signatures with signing keys they completely control. That this is
possible was a big surprise to me, and I think others will be unclear on
it too, hence this blog post after a week or so of study on my part.</p>
<p>Let's remember that the Schnorr adaptor signature construction is:</p>
<p>\(\sigma'(T, m, x) = k + H(kG+T||xG||m)x\)</p>
<p>where \(k\) is the nonce, \(x\) is the private (signing) key and
\(T\) is the 'adaptor point' or just adaptor. The left-hand-side
parentheses are important: notice that <strong>you don't need the discrete
log of the point T to construct the adaptor signature</strong>. But you <em>do</em>
need the signing key \(x\). Or wait .. do you?</p>
<p>As I explained last year
<a href="https://x0f.org/web/statuses/102897691888130818">here</a>
it's technically not the case: you can construct an adaptor signature
for signing pubkey \(P\) for which you don't know \(x\) s.t.
\(P=xG\), with a fly in the ointment: you won't be able to predict
the adaptor \(T\) or know its discrete log either (this makes it
un-dangerous, but still an important insight; I was calling this
"forgeability" but more on that later).</p>
<p>How you ask? To summarize the mastodon post:</p>
<p>\(\stackrel{\$}{\leftarrow} q, Q=qG, \quad \mathrm{assume}\quad
R+T =Q\)</p>
<p>\(\Rightarrow \sigma' G= R + H(Q||P||m)P\)</p>
<p>\(\stackrel{\$}{\leftarrow} \sigma' \quad \implies R = \sigma'G -
H(Q||P||m)P \implies T = Q-R\)</p>
<p>Thus anyone can publish an adaptor signature \((T, \sigma')\) on any
message \(m\) for any pubkey \(P\) at any time. It <em>really</em> isn't a
signature.</p>
<p>And equally obvious is that this does not allow the "forger" to
complete the adaptor into a full signature (\(\sigma = \sigma' +
t\)) - because if he could, this would be a way to forge arbitrary
Schnorr signatures!</p>
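<p>For the sceptical, a toy sketch of both the honest adaptor and the
back-solved one. Everything here is an assumption made for
illustration: a naive, non-constant-time secp256k1 implementation,
SHA256 over concatenated uncompressed coordinates as the hash, and tiny
hard-coded secrets. It is not code from any wallet or library.</p>

```python
import hashlib
import secrets

# secp256k1: field prime p, group order n, generator G
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Affine point addition; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % p == 0:
        return None
    if A == B:
        lam = 3 * A[0] * A[0] * pow(2 * A[1], -1, p) % p
    else:
        lam = (B[1] - A[1]) * pow((B[0] - A[0]) % p, -1, p) % p
    x = (lam * lam - A[0] - B[0]) % p
    return (x, (lam * (A[0] - x) - A[1]) % p)

def mul(k, A):
    """Double-and-add scalar multiplication (not constant time)."""
    k, out = k % n, None
    while k:
        if k % 2:
            out = add(out, A)
        A, k = add(A, A), k // 2
    return out

def neg(A):
    return (A[0], -A[1] % p)

def H(Q, P, m):
    """Hash (nonce point || pubkey || message) to a scalar mod n."""
    data = b"".join(c.to_bytes(32, "big") for c in Q + P) + m
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def adaptor_sign(x, k, T, m):
    """sigma' = k + H(kG+T || xG || m) x : needs x, but only the point T."""
    return (k + H(add(mul(k, G), T), mul(x, G), m) * x) % n

def adaptor_verify(P, T, R, m, s_adapt):
    """Check sigma' G == R + H(R+T || P || m) P."""
    return mul(s_adapt, G) == add(R, mul(H(add(R, T), P, m), P))

def schnorr_verify(P, Q, m, s):
    """Ordinary Schnorr check with nonce point Q."""
    return mul(s, G) == add(Q, mul(H(Q, P, m), P))

def forge_adaptor(P, m):
    """Back-solve an adaptor for P without its private key."""
    Q = mul(secrets.randbelow(n - 1) + 1, G)
    s_adapt = secrets.randbelow(n - 1) + 1
    R = add(mul(s_adapt, G), neg(mul(H(Q, P, m), P)))
    T = add(Q, neg(R))  # the forger never learns the discrete log of T
    return T, R, s_adapt

x, k, t, m = 11, 22, 33, b"example"  # toy secrets, illustration only
P, T = mul(x, G), mul(t, G)
s_adapt = adaptor_sign(x, k, T, m)
honest_ok = adaptor_verify(P, T, mul(k, G), m, s_adapt)
completed_ok = schnorr_verify(P, add(mul(k, G), T), m, (s_adapt + t) % n)
T_f, R_f, s_f = forge_adaptor(P, m)
forged_ok = adaptor_verify(P, T_f, R_f, m, s_f)
```

<p>All three checks should pass, but note what the forger is missing:
there is no \(t\) to add to <code>s_f</code>, so the forged adaptor can
never be completed into a Schnorr signature.</p>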
<p>With the caveat in the above little mathematical vignette aside, we note
that the bolded phrase above is the crucial point: adaptors can be
created by non-secret owners, for secret owners to complete.</p>
<h2>Adaptors in ECDSA with less wizardry</h2>
<p>I was alerted to this trick via <a href="https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-November/002316.html">this mailing list
post</a>
and the work of the Suredbits guys, in particular Nadav Kohen, who blogs
on payment points, DLCs and related topics
<a href="https://suredbits.com/payment-points-part-1/">here</a>.
The idea can be summarised as "tweak the nonce multiplicatively instead
of linearly". Take the following notation for the base (complete) ECDSA
signature:</p>
<p>\(\sigma = k^{-1}\left(\mathbb{H}(m) + R_{\mathrm{x}}x\right)
\)</p>
<p>Here we're using the most common, if sometimes confusing notation. As
usual \(k\) is the nonce (generated deterministically usually),
\(R=kG\), \(m\) is the message and \(x\) is the private signing
key whose public key by convention is \(P\). Meanwhile
\(R_{\mathrm{x}}\) indicates the x-coordinate of the curve point
\(R\), with the usual caveats about the difference between the curve
order and the order of the finite field from which the coordinates are
drawn (feel free to ignore that last part if it's not your thing!).</p>
<p>Now clearly you cannot just add a secret value \(t\) to the nonce and
expect the signature \(\sigma\) to be shifted by some simple factor.
Multiplication looks to make more sense, since after all the nonce is a
multiplicative factor on the RHS. But it's not so simple, because the
nonce-<em>point</em> appears as the term \(R_{\mathrm{x}}\) inside the
multiplied factor. The clever idea is how to get around this problem. We
start by defining a sort-of "pre-tweaked" nonce:</p>
<p>\(R' = kG\)</p>
<p>and then the real nonce that will be used will be multiplied by the
adaptor secret \(t\):</p>
<p>\(R = kT = ktG\)</p>
<p>Then the adaptor signature will be published as:</p>
<p>\(\sigma' = k^{-1}\left(\mathbb{H}(m) + R_{\mathrm{x}}x\right)
\)</p>
<p>... which may look strange as here the RHS is identical to what we
previously had for the <em>complete</em> signature \(\sigma\). The
difference of course is that here, the terms \(k\) and \(R\) don't
match up; \(R\) has private key \(kt\) not \(k\). And hence we can
easily see that:</p>
<p>\(\sigma = t^{-1} \sigma'\)</p>
<p><em>will</em> be a valid signature, whose nonce is \(kt\).</p>
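<p>A quick numerical check of that claim, as a sketch under some
assumptions of mine: a naive secp256k1 implementation, SHA256 reduced
mod the group order as \(\mathbb{H}\), and small fixed secrets so the
run is reproducible. A real signer would of course use random nonces and
a vetted library.</p>

```python
import hashlib

# secp256k1: field prime p, group order n, generator G
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Affine point addition; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % p == 0:
        return None
    if A == B:
        lam = 3 * A[0] * A[0] * pow(2 * A[1], -1, p) % p
    else:
        lam = (B[1] - A[1]) * pow((B[0] - A[0]) % p, -1, p) % p
    x = (lam * lam - A[0] - B[0]) % p
    return (x, (lam * (A[0] - x) - A[1]) % p)

def mul(k, A):
    """Double-and-add scalar multiplication (not constant time)."""
    k, out = k % n, None
    while k:
        if k % 2:
            out = add(out, A)
        A, k = add(A, A), k // 2
    return out

def hm(m):
    """Message hash as a scalar mod n."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def ecdsa_verify(P, m, r, s):
    """Standard ECDSA: x-coordinate of (H(m)/s) G + (r/s) P must equal r."""
    w = pow(s, -1, n)
    X = add(mul(hm(m) * w, G), mul(r * w, P))
    return X is not None and X[0] % n == r

def ecdsa_adaptor(x, k, T, m):
    """sigma' = k^-1 (H(m) + R_x x), where R = kT rather than kG."""
    r = mul(k, T)[0] % n
    return r, pow(k, -1, n) * (hm(m) + r * x) % n

x, k, t, m = 11, 22, 33, b"example"  # toy secrets, illustration only
P, T = mul(x, G), mul(t, G)
r, s_adapt = ecdsa_adaptor(x, k, T, m)
s = pow(t, -1, n) * s_adapt % n      # sigma = t^-1 sigma'

adaptor_is_plain_sig = ecdsa_verify(P, m, r, s_adapt)
completed_is_valid = ecdsa_verify(P, m, r, s)
```

<p>The adaptor on its own should fail ordinary ECDSA verification (its
\(k\) and \(R\) don't match up), while the completed value should pass,
since it is just an ECDSA signature with nonce \(kt\).</p>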
<p>However, we do not operate in a world without adversaries, so to be sure
of the statement "if I get given the discrete log of \(T\), I will be
able to construct a fully valid \(\sigma\)", we need a proof of that
claim. This is the key innovation, because this can be done <em>very</em>
simply with a proof-of-discrete-log, or a "PoDLE" as was described in
one of the first <a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/poodle/">blog
posts</a>
here. To prove that \(R'/G = R/T = k\), where we somewhat abuse / to
mean "elliptic curve discrete log", you just create an AND of two
\(\Sigma\)-protocols, using the same commitment (i.e., nonce), let's
call it \(k_2\) and output a schnorr style response \(s = k_2 +
ek\), where the hash e covers both points \(k_2 G\ ,\ k_2 T\) as
has been explained in the just-mentioned PoDLE blog post and also in a
bit more generality in the <a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/ring-signatures/">post on ring
signatures</a>.</p>
<p>It's thus intuitive, though not entirely obvious, that an "adaptor
signature" in this context is really a combination of the same idea as
in Schnorr, but with additionally a PoDLE tacked-on:</p>
<p>Input:</p>
<p>an adaptor point \(T\), a message \(m\), a signing key \(x\)</p>
<p>Output:</p>
<p>adaptor signature \((\sigma', R, R')\), adaptor signature PoDLE:
\((s, e)\)</p>
<p>Verification for non-owner of adaptor secret \(t\):</p>
<p>1. Verify the PoDLE - proves that \(R, R'\) have same (unknown)
discrete log w.r.t. \(T, G\) respectively.</p>
<p>2. Verify \(\sigma' R' \stackrel{?}{=} \mathbb{H}(m)G +
R_{\mathrm{x}} P\)</p>
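<p>Putting the input, output and two verification steps together, a
sketch might look like the following. Assumptions to flag: the curve
code is a naive secp256k1 toy, SHA256 stands in for both hashes, and the
PoDLE challenge here hashes the statement points \(R', R, T\) in
addition to the two commitments, which is my choice for the sketch
rather than something specified above. Function names are made up.</p>

```python
import hashlib
import secrets

# secp256k1: field prime p, group order n, generator G
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Affine point addition; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    if A[0] == B[0] and (A[1] + B[1]) % p == 0:
        return None
    if A == B:
        lam = 3 * A[0] * A[0] * pow(2 * A[1], -1, p) % p
    else:
        lam = (B[1] - A[1]) * pow((B[0] - A[0]) % p, -1, p) % p
    x = (lam * lam - A[0] - B[0]) % p
    return (x, (lam * (A[0] - x) - A[1]) % p)

def mul(k, A):
    """Double-and-add scalar multiplication (not constant time)."""
    k, out = k % n, None
    while k:
        if k % 2:
            out = add(out, A)
        A, k = add(A, A), k // 2
    return out

def neg(A):
    return (A[0], -A[1] % p)

def hm(m):
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def hpoints(*pts):
    """Challenge hash over a list of curve points."""
    data = b"".join(c.to_bytes(32, "big") for pt in pts for c in pt)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def dleq_prove(k, T):
    """Prove R' = kG and R = kT share the discrete log k; returns (s, e)."""
    k2 = secrets.randbelow(n - 1) + 1
    e = hpoints(mul(k2, G), mul(k2, T), mul(k, G), mul(k, T), T)
    return (k2 + e * k) % n, e

def dleq_verify(R_pre, R, T, s, e):
    """Recompute k2*G = sG - eR' and k2*T = sT - eR, then the challenge."""
    A = add(mul(s, G), neg(mul(e, R_pre)))
    B = add(mul(s, T), neg(mul(e, R)))
    return e == hpoints(A, B, R_pre, R, T)

def adaptor_create(x, T, m):
    """Output (sigma', R, R') plus the PoDLE (s, e)."""
    k = secrets.randbelow(n - 1) + 1
    R_pre, R = mul(k, G), mul(k, T)
    s_adapt = pow(k, -1, n) * (hm(m) + (R[0] % n) * x) % n
    return s_adapt, R, R_pre, dleq_prove(k, T)

def adaptor_verify(P, T, m, s_adapt, R, R_pre, proof):
    """Step 1: PoDLE. Step 2: sigma' R' == H(m) G + R_x P."""
    lhs = mul(s_adapt, R_pre)
    rhs = add(mul(hm(m), G), mul(R[0] % n, P))
    return dleq_verify(R_pre, R, T, *proof) and lhs == rhs

x, t, m = 11, 33, b"example"  # toy secrets, illustration only
P, T = mul(x, G), mul(t, G)
s_adapt, R, R_pre, proof = adaptor_create(x, T, m)
accepted = adaptor_verify(P, T, m, s_adapt, R, R_pre, proof)
```

<p>Note that the verifier only ever touches \(T\), never \(t\); that is
the whole point of step 1.</p>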
<h2>Swapping ECDSA coins with this method</h2>
<p>Fundamentally, if not exclusively, adaptor signatures as originally
conceived, and still here, allow the swap of a coin for a secret (in
that broadcast of a spending transaction necessarily implies broadcast
of a signature which can be combined with a pre-existing adaptor
signature to reveal a secret), and the crudest example of how that can
be used is the coinswap or atomic swap, see
<a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/coinswaps/">these</a>
<a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">previous</a>
blog posts for a lot of detail on pre-existing schemes to do this, both
with and without the Schnorr signature primitive that was previously
thought to be near-required to do adaptor signatures.</p>
<p>The ECDSA scheme above can be used in a slightly different way than I
had originally described for Schnorr adaptor signatures, but it appears
that was partly just oversight on my part: the technique described below
<em>can</em> be used with Schnorr too. So the advantage here is principally
that we can do it right now.</p>
<p>1. Alice spends into a 2/2 A1 B1 after first negotiating a timelocked
refund transaction with Bob, so she doesn't risk losing funds.</p>
<p>2. Bob does the same, spending into a 2/2 A2 B2 after negotiating a
timelocked refund transaction with Alice, so he also doesn't risk losing
funds, but his timelock is closer.</p>
<p>3. Alice creates an adaptor \(\sigma_{1}^{'}\) spending with key
A1 to Bob's destination and adaptor point \(T\) for which she knows
discrete log \(t\).</p>
<p>4. Bob verifies \(\sigma_{1}^{'}\) and the associated data
mentioned above, including crucially the PoDLE provided.</p>
<p>5. Bob creates an adaptor \(\sigma_{2}^{'}\) spending with key B2
to Alice's destination and adaptor point \(T\) for which he does
<strong>not</strong> know the \(t\).</p>
<p>6. Alice can now safely complete the adaptor she receives: \(\sigma_2
= t^{-1}\sigma_{2}^{'}\) and co-sign with A2 and broadcast,
receiving her funds.</p>
<p>7. Bob can see on the blockchain (or communicated directly for
convenience): \(t = \sigma_{2}^{'}\sigma_{2}^{-1}\) and use it
to complete: \(\sigma_{1} = t^{-1}\sigma_{1}^{'}\), and co-sign
with B1 and broadcast, receiving his funds.</p>
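<p>The scalar arithmetic in steps 6 and 7 can be sketched in a few
lines. This deliberately leaves out everything else (transactions,
timelocks, the PoDLE checks, signature encoding and any low-s
normalisation of the broadcast value), and the adaptor values are
arbitrary stand-ins rather than real signatures.</p>

```python
# secp256k1 group order
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def complete(s_adapt, t):
    """Decrypt an adaptor: sigma = t^-1 * sigma' mod n."""
    return pow(t, -1, n) * s_adapt % n

def recover_secret(s_adapt, s):
    """Recover t = sigma' * sigma^-1 mod n from an adaptor and its completion."""
    return s_adapt * pow(s, -1, n) % n

# Toy stand-ins for the adaptors exchanged in steps 3 and 5.
t = 0x1234567                    # known only to Alice at the start
s1_adapt = 0xAAAA1111            # Alice's adaptor, held by Bob
s2_adapt = 0xBBBB2222            # Bob's adaptor, held by Alice

s2 = complete(s2_adapt, t)               # step 6: Alice completes and broadcasts
t_seen = recover_secret(s2_adapt, s2)    # step 7: Bob reads s2 and extracts t
s1 = complete(s1_adapt, t_seen)          # ... then completes his own claim
```

<p>The asymmetry is worth noticing: Alice needs nothing from the
blockchain to claim, while Bob's claim only becomes possible once
Alice's signature is public.</p>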
<h3>Comparisons to other coinswaps:</h3>
<p>This requires 2/2 P1 P2 type scriptPubKeys; these can be p2sh multisig
or p2wsh multisig using, as mentioned, <code>OP_CHECKMULTISIG</code>. Notice that
in a future Taproot/Schnorr world, this will still be possible, using
the linear style adaptor signatures previously described. However in
that case a musig-style combination of keys will almost certainly be
preferred, as it will create transaction styles that look
indistinguishable from single or any other script types. For now, the
system above does share one very valuable anonymity set: the set of
Lightning channel opens/closes, but doesn't share an anonymity set with
the full set of general single-owner ECDSA coins (which includes both
legacy and segwit).</p>
<p>For now, this method has the principal advantage that the only failure
mode is the timelocked backout, which can be a transaction that looks
entirely normal - having a non-zero <code>nLockTime</code> somewhere around the
current block is actually very normal, while the atomic enforcement part
is, just like Schnorr adaptors, entirely invisible. So apart from the
smaller anonymity set (2-2, so mostly LN), it has excellent privacy
properties.</p>
<h2>Reframing adaptors \(\rightarrow\) otVES</h2>
<p>The aforementioned
<a href="https://github.com/LLFourn/one-time-VES/blob/master/main.pdf">paper</a>
of 2019 by Lloyd Fournier is titled "<em>One Time Verifiably Encrypted
Signatures A.K.A. Adaptor Signatures</em>" - at first this new name
(henceforth otVES) seemed a bit strange, but after reading the paper I
came away pretty convinced. Not only is the conceptual framework very
clean, but it also links back to earlier work on the general concept of
Verifiably Encrypted Signatures. Most particularly the work of the same
guys that brought us BLS signatures from bilinear pairing crypto, in
<a href="http://crypto.stanford.edu/~dabo/papers/aggreg.pdf">this
paper</a>
(namely, Boneh, Lynn, Shacham but also Gentry of FHE fame). The context
considered there was wildly different, as Fournier helpfully explains:
this earlier work imagined that Alice and Bob wanted to fairly exchange
signatures that might be useful as authorization for some purpose. To
achieve that goal, they imagined a trusted third party acting between
them, and that an encrypted-to-third-party-adjudicator but still
<em>verifiable</em> signature could serve as the first step of a fair protocol,
assuming honesty of that third party. However what makes the Bitcoin
use-case special is that signatures <strong>are useable if and only if
broadcast</strong>. All of this coinswap/HTLC/second layer stuff relies on
that property. In this scenario, having not only a VES but an otVES is
exactly desirable.</p>
<p>Why is one-time desirable here? It's a little obscure. For those
familiar with cryptography 101 it'll make sense to think about the <a href="https://en.wikipedia.org/wiki/One-time_pad">one
time
pad</a>.
The absolutely most basic concept of encryption (which also happens to
be perfectly secure, when considered in the most <a href="https://en.wikipedia.org/wiki/Spherical_cow">spherical
cow</a>
kind of way): take a plaintext \(p\) and a key \(k\), bitstrings of
the exact same length. Then make the ciphertext \(c\):</p>
<p>\(c = p \oplus k\)</p>
<p>and the thing about this that makes it perfect is exactly also something
that can be considered a "bug": the symmetry of the \(\oplus\)
(xor) operation is such that, given both the plaintext and the
ciphertext, the key can be derived: \(k = c \oplus p\). So any
broadcast of \(p\), after an earlier transfer of \(c\) (to Bob,
let's say), means that the secret key is revealed.</p>
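<p>In code, the "bug" is a one-liner. A minimal sketch (the message is
arbitrary, and the key comes from Python's <code>secrets</code>
module):</p>

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise xor of two equal-length byte strings."""
    return bytes(i ^ j for i, j in zip(a, b))

plaintext = b"attack at dawn"
key = secrets.token_bytes(len(plaintext))   # one-time key, same length as p

ciphertext = xor(plaintext, key)            # c = p xor k, handed to Bob early
recovered_key = xor(ciphertext, plaintext)  # k = c xor p, once p is broadcast
```

<p>Anyone holding <code>ciphertext</code> learns <code>key</code> the
moment <code>plaintext</code> becomes public.</p>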
<p>The same is true in our adaptor signature or VES scenario: the adaptor
signature \(\sigma'\) is an "encrypted signature", and is
verifiable using the verification algorithm already discussed, by anyone
who has that encrypted signature and the adaptor "public key" which we
called \(T\). Notice how this is analogous to <em>public</em> key encryption,
in that you only need a public key to encrypt; but also notice that the
one-time pad is <em>secret key</em> encryption, which is why the plaintext and
ciphertext are enough to reveal the key (note: more developed secret key
algorithms than OTP handle this problem). This is some kind of hybrid of
those cases. Once the "plaintext" signature \(\sigma\) is revealed,
the holder of the "encrypted" signature \(\sigma'\) can derive the
private key: \(t\).</p>
<p>So hopefully this makes clear why "one-time-ness" is not so much in
itself desirable, as what is implied by it: that the "private key"
(the <em>encryption</em> key, not the <em>signing</em> key, note!) is revealed on one
usage.</p>
<h2>Security properties - deniability, forgeability, validity, recoverability ...</h2>
<p>At a high level, what security properties do we want from these
"encrypted signatures"? I think there's a strong argument to focus
on two properties:</p>
<ul>
<li>Handing over such encrypted signatures should not leak any
information to any adversary, including the recipient (it may or may
not be needed to keep the transfer private, that is not considered
in the model).</li>
<li>Given an encrypted signature for a message and key, I should be able
to convince myself that when the plaintext signature is revealed, I
will get the secret key \(t\), or, complementarily: when the secret
key \(t\) is revealed, I should be able to recover the plaintext
signature.</li>
</ul>
<p>We'll deal with both of these points in the following subsections.</p>
<h3>Deniability</h3>
<p>The Schnorr version of the otVES is deniable in the specific sense that
given an unencrypted signature, a corresponding encrypted signature for
any chosen key (\(t\)) can be claimed, as was explained
<a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">here</a>
("Deniability" subsection). For anyone familiar with the basic
construction of zero knowledge proofs, this will be immediately
recognized as being the definition of a "Simulator", and therefore
proves that such an adaptor signature/encrypted signature leaks zero
information to recipients.</p>
<p>It is interesting to observe that the same trick does <strong>not</strong> work with
the ECDSA variant explained above:</p>
<p>Given \(\sigma, R\) satisfying \(\sigma R = \mathbb{H}(m)G +
R_{\mathrm{x}}P\) for the verifying pubkey \(P\), you can try to
assert that \(k = tk_2\) but <strong>you have no way to generate a PoDLE for
\(R, R'\) if you don't know k</strong> - this means that such a
"retrofitted" encrypted signature (which by definition <em>includes</em> the
PoDLE) is not possible for a party not knowing the original secret
nonce, and thus the simulator argument (the argument that an external
observer <em>not knowing the secret</em> can create fake transcripts with a
distribution indistinguishable from the real transcripts) is not
available, hence we cannot claim that such encrypted signatures are
fully zero knowledge. More on this shortly.</p>
<h3>Forgeability</h3>
<p>I am abusing terms here, because unforgeability is the central property
of a valid signature scheme, but here let's talk about the forgeability
of an <em>encrypted</em> signature, so perhaps "adaptor forgeability". Here I
mean the ability to create arbitrary encrypted signatures <em>without</em> the
signing key. This was demonstrated as possible for Schnorr in the first
section of this blog post (noting the obvious caveat!). For ECDSA, we
hit the same snag as for 'Deniability'. Without possessing the signing
key \(x\), you want to make the verification \(\sigma' R' =
\mathbb{H}(m)G + R_{\mathrm{x}}P\) pass for some \(R, R', T, R =
tR'\) such that you can prove DLOG equivalence w.r.t. \(G, T\). You
can make the verification equation itself pass by "back-solving" the
same way as for Schnorr:</p>
<p>\(\stackrel{\$}{\leftarrow} k^{*}, R=k^{*}G, \quad Q =
\mathbb{H}(m)G + R_{\mathrm{x}}P\)</p>
<p>\(\stackrel{\$}{\leftarrow} \sigma', \quad \Rightarrow \sigma'
R' = Q \Rightarrow R' = (\sigma')^{-1}Q\)</p>
<p>But since this process did <em>not</em> allow you to deduce the scalar \(q\)
s.t. \(Q = qG\), it did not allow you to deduce the corresponding
scalar for \(R'\). Thus you can output a set \(\sigma', R, R'\)
but you cannot also know, and thus prove equivalence of, the discrete
logs of \(R\) and \(R'\).</p>
<p>The previous two sections demonstrate clearly that the otVES
construction for ECDSA is fundamentally different from that for Schnorr
in that it requires providing, and proving a relationship between two
nonces, and this also impacts quite significantly the security arguments
that follow.</p>
<h3>Validity, Recoverability</h3>
<p>These are aspects of the same thing, so grouped together, and they talk
about the most central and unique property for an otVES scheme, but
fortunately it is almost tautological to see that they hold for these
schemes.</p>
<p>The concern it addresses: what if Alice gave Bob an encrypted signature
to a key \(T\) but it turned out that when decrypted with the
corresponding key \(t\), a valid signature wasn't actually revealed.
That this is impossible is called <strong>validity</strong>. The flip side is
<strong>recoverability</strong>: if Alice gave Bob an encrypted signature and then
published the corresponding decrypted signature ("plaintext"), the
secret key for the encryption (\(t\)) must be revealed.</p>
<p>The Schnorr case illustrates the point clearly, see Lemma 4.1 in
Fournier's paper; \(\sigma' = \sigma -t\) in our notation and we
can see by the definition of Schnorr signature verification that this
must hold, given there cannot be another \(t' \ne t\) s.t. \(t'G =
T\) (there is a one-one mapping between scalars mod n and group
points). Recoverability is also unconditionally true in the same way.</p>
<p>For the ECDSA case, it is nearly the same, except: we rely on the PoDLE
between \(R, R'\), which has the same properties itself as a Schnorr
signature, and so the properties hold conditional on the inability to
break ECDLP (because that would allow Schnorr forgery, and thus PoDLE
forgery).</p>
<p>Note how an ECDLP break can obviously destroy the usefulness of all these
schemes, in particular the underlying signature schemes, but even that
does not alter the fact that the Schnorr encrypted signature is valid
and recoverable (though it becomes a mere technicality in that case).</p>
<h3>EUF-CMA for otVES using Schnorr</h3>
<p>EUF-CMA was discussed in the previous blogs on the Schnorr signature and
on ring signatures; in brief, it is a technical term for "this signature
scheme is secure in that signatures cannot be forged by
non-secret-key-owners under this specific set of (fairly general)
assumptions".</p>
<p>Proving this for the Schnorr otVES turns out to be a fairly standard
handle-cranking exercise. This is essentially what I have focused on in
previous work as "proving soundness by running an extractor",
including patching up the random oracle. See the above linked post on
the Schnorr signature for more detail.</p>
<p>Note that unforgeability referred to here <strong>is not the same as "adaptor
forgeability" discussed above</strong>. Here we are specifically trying to
prove that access to such encrypted signatures does not help the
adversary in his pre-existing goal of forging <em>real</em> signatures.</p>
<p>So the handle-cranking simply involves adding an "encrypted signature
oracle" to the attacker's toolchest. EUF-CMA[VES] basically refers
to the inability to create signatures on new messages even when you have
access to arbitrary encrypted signatures, as well as arbitrary earlier
<em>complete</em> signatures, again, on different messages.</p>
<p>As Fournier points out here:</p>
<blockquote>
<p><em>EUF-CMA[VES] says nothing about the unforgeability of signature
encryptions. In fact, an adversary who can produce valid VES
ciphertexts without the secret signing key is perfectly compatible. Of
course, they will never be able to forge a VES ciphertext under a
particular encryption key. If they could do that, then they could
trivially forge an encrypted signature under a key for which they know
the decryption key and decrypt it.</em></p>
</blockquote>
<p>... which is the reason for my (I hope not too confusing) earlier
section on "adaptor forgeability". It <em>is</em> actually possible, for
Schnorr, but not ECDSA, to do what is mentioned in the second sentence
above.</p>
<h3>EUF-CMA[VES] for ECDSA</h3>
<p>Here is the most technical, but the most important and difficult point
about all this. In producing an encrypted ECDSA signature you output:</p>
<p>\((\sigma', R, R', m, P), \quad \textrm{DLEQ}(R, R')\)</p>
<p>(while \(m, P\) may be implicit of course), and this means you output
one piece of information in addition to the signature: that two nonce
points are related in a specific way. It turns out that this can be
expressed differently as the Diffie Hellman key of the key pair \((P,
T)\) (or, in Fournier's parlance, the signing key and the encryption
key). That DH key would be \(tP = xT = xtG\). Here's how; starting
from the verification equation for a published encrypted signature,
using the notation that we've used so far:</p>
<p>\(s'R' = \mathbb{H}(m)G + R_{\mathrm{x}}P\)</p>
<p>isolate the public key P (this is basically "pubkey recovery"):</p>
<p>\(P = R_{\mathrm{x}}^{-1}\left(s'R' - \mathbb{H}(m)G\right)\)</p>
<p>\(\Rightarrow tP = R_{\mathrm{x}}^{-1}\left(s'tR' -
\mathbb{H}(m)tG\right)\)</p>
<p>\(\Rightarrow xT = tP = R_{\mathrm{x}}^{-1}\left(s'R -
\mathbb{H}(m)T\right)\)</p>
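<p>To make the algebra concrete, here is a minimal sketch that checks the last equation numerically on secp256k1, using a bare-bones pure Python group law (not constant-time, not production-grade, and needing Python 3.8+ for <code>pow(a, -1, m)</code>). The helper names <code>add</code>, <code>mul</code>, <code>neg</code> and <code>H</code>, the secret values and the message are all assumptions made purely for illustration; none of them come from Fournier's paper or from any library.</p>

```python
import hashlib

# secp256k1: y^2 = x^3 + 7 over F_p, with a group of prime order n.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Affine point addition; None stands for the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    if A[0] == B[0] and (A[1] + B[1]) % p == 0:
        return None
    if A == B:
        lam = 3 * A[0] * A[0] * pow(2 * A[1] % p, -1, p) % p
    else:
        lam = (B[1] - A[1]) * pow((B[0] - A[0]) % p, -1, p) % p
    x3 = (lam * lam - A[0] - B[0]) % p
    return (x3, (lam * (A[0] - x3) - A[1]) % p)

def mul(k, A):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, A)
        A = add(A, A)
        k >>= 1
    return result

def neg(A):
    return (A[0], -A[1] % p)

def H(m):
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

# Signer side (illustrative secrets): signing key x, encryption secret t, nonce k.
x, t, k = 123456789, 987654321, 555555555
P, T = mul(x, G), mul(t, G)
R_prime = mul(k, G)            # R' = kG
R = mul(t, R_prime)            # R = tR' = kT
Rx = R[0] % n
h = H(b"hypothetical message")
s_enc = pow(k, -1, n) * (h + Rx * x) % n   # s', the encrypted signature scalar

# Verifier side: uses only the public values s_enc, R, T and the message hash.
dh_key = mul(pow(Rx, -1, n), add(mul(s_enc, R), neg(mul(h, T))))
```

<p>Under these assumptions <code>dh_key</code> comes out equal to both \(xT\) and \(tP\), even though the last line touches none of \(x\), \(t\) or \(k\).</p>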
<p>Notice how we - a verifier, possessing neither the nonce \(k\) nor
the secret \(t\) - were able to deduce <em>this</em> DH key because we knew
the DH key of the key pair \((R', T)\) - it's \(R\), which we were
explicitly given. So this, in some sense "breaks" the <a href="https://en.wikipedia.org/wiki/Computational_Diffie%E2%80%93Hellman_assumption">CDH
assumption</a>:
that given only points on the curve \(A=aG, B=bG\) you should not be
able to calculate the third point \(abG\) (but "breaks" - because
actually we were given a related DH key to start with).</p>
<p>Fournier addresses this point in two ways. First, he argues that
the requirement that the CDH problem be hard is not part of the protocols
for which this scheme is useful and that keys are by design one-time-use
in these applications. The more important point though, is that an
attempt is made to show the scheme secure <strong>if the CDH problem is
easy</strong>. A classic example of backwards cryptography logic ;)</p>
<p>The framework for this is non-trivial, and it is exactly the framework
developed by Fersch et al that was discussed in the section on ECDSA in
<a href="https://web.archive.org/web/20200803123741/https://joinmarket.me/blog/blog/liars-cheats-scammers-and-the-schnorr-signature/">this</a>
earlier blog post (subsection "What about ECDSA?"). I have not studied
this framework in any detail, only cursorily, and would encourage anyone
interested to at least watch the linked video of Fersch's talk on it,
which was quite interesting. With the addition of the assumption "CDH
is easy", Fournier claims that ECDSA can be said to have this
EUF-CMA[VES] security guarantee, which is intended to prove,
basically, that <strong>the leak of the DH key is the only leak of information
and that the scheme is secure against forgery</strong>. I can't claim to be
able to validate this; I can only say the argument appears plausible.</p>Avoiding Wagnerian Tragedies2019-12-15T00:00:00+01:002019-12-15T00:00:00+01:00Adam Gibsontag:joinmarket.me,2019-12-15:/blog/blog/avoiding-wagnerian-tragedies/<p>Wagner's attack</p><h3>Avoiding Wagnerian tragedies</h3>
<p><em>This blog post is all about
<a href="https://people.eecs.berkeley.edu/~daw/papers/genbday.html">this</a>
paper by David Wagner from 2002.</em></p>
<p><em>It is a personal investigation; long, mainly because I wanted to answer
a lot of questions for myself about it. If you are similarly motivated
to understand the algorithm, this may provide useful guideposts. But
there are no guarantees of accuracy.</em></p>
<p>_________________________________________________________________________</p>
<p>At the Berlin Lightning Conference, Jonas Nick gave a short talk (slides
<a href="https://nickler.ninja/slides/2019-tlc.pdf">here</a>)
that included a topic that had been on my "TODO list" for some
considerable time - the so-called Wagner attack. The talk was concise
and well thought out, and for me it made a lot of sense, but I suspect a
lot of the audience lost the key point, as indeed was evidenced by the
only audience question at the end, which was something along the lines
of "but doesn't the birthday attack mean you can only find a hash
collision in \(\sqrt{N}\) time, where \(N\) is the size of the hash
output?" - the questioner had, quite understandably, misunderstood
exactly what the attack does, and remembered what he (and most people
who take an interest in these things) saw as the key security property
that protects how SHA2 and similar are used in cryptocurrency.</p>
<p>So .. should you care? If so, why? I think the main value of this
<em>practically</em>, if, as likely, you're reading this from the perspective
of Bitcoin, is that it matters to various non-vanilla signing protocols:
it can matter to blind signatures, and multisignatures, and very likely
a whole slew of different complex contracts that might be based on such
things. And unfortunately, it is <strong>not intuitive</strong>, so it would be very
easy to miss it and leave a security hole.</p>
<p>My goal in this blog post will be to try to provide some intuition as to
what the hell Wagner's attack is, and why it could be dangerous.</p>
<h2>The Birthday Attack .. or Paradox ... (or just Party?)</h2>
<p>Just as the famous <a href="https://en.wikipedia.org/wiki/Twin_paradox">Twin
Paradox</a>
is not actually a paradox, nor is the perhaps even more famous <a href="https://en.wikipedia.org/wiki/Birthday_problem">Birthday
Paradox</a>.
The result shown in both of these thought experiments (and actual
experiments - the former <em>has</em> actually been done with atomic clocks and
small fractions of \(c\)) is just surprising, that's all. It violates
some simple intuitions we have. Here it is, stated simply in words:</p>
<p>Given a set of 23 people (such as children in a classroom), it is a
<strong>better than 50-50 chance</strong> that at least some pair of them will share
the exact same birthday.</p>
<p>The simple argument is: the probability of at least one such pair
existing is \(1 - \) the probability \(p\) of there being <em>no</em> such
pair, which is the case exactly, and only, when <em>every child has a
different birthday.</em> Now we can easily see that \(p = 0\) when there
are 366 children (ignore leap years), and \(p=\frac{364}{365}\) when
there are only 2 children. The case for \(N\) children would be \(p =
\frac{364 \times 363 \times \ldots \times (365-N+1)}{ 365 \times 365 \times
\ldots \times 365}\) (with \(N-1\) factors top and bottom), where we're using the fact that probabilities
multiply when we want the AND of different events. This \(1 - p\),
where \(p\) is a function of \(N\), just happens to be \(\simeq
0.5\) when \(N=23\), hence the result.</p>
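<p>The product above is easy to evaluate directly. A minimal sketch (the function name <code>p_shared_birthday</code> is just an illustrative choice, and leap years are ignored as in the text):</p>

```python
def p_shared_birthday(num_children, days=365):
    """Probability that at least two of num_children share a birthday."""
    p_all_different = 1.0
    for i in range(num_children):
        # The (i+1)-th child must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different

p23 = p_shared_birthday(23)   # roughly 0.507
p22 = p_shared_birthday(22)   # still below one half
```

<p>The first factor in the loop is \(\frac{365}{365} = 1\), so this is the same product as in the formula, and with 366 children a factor of zero appears, giving a certain match.</p>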
<h3>Why intuitions about birthdays are (slightly) wrong.</h3>
<p>The 23 datapoint does surprise people, usually, but it doesn't shock
them. It just seems low. Why does it seem low? Is it because when we
hear the problem statement, we naturally think in more specific terms:
usually, when I am trying to make a match of two things, I am trying to
make a match from <em>one specific thing</em> against some other set of
comparable things. In case of birthdays, we might look for someone with
the same birthday as <em>us</em>, which is a very different problem to finding
<em>any pairwise match</em>, as here.</p>
<p>Also, it's probabilistic, and people don't have good intuitions about
probability generally.</p>
<p>But let's delve a little deeper than that. We're going to need to, to
understand the meat of this blogpost, i.e. Wagner's algorithm.</p>
<p>To <em>very roughly</em> quantify why there's a bit more of a chance of success
in getting a match, than you'd expect, imagine a square grid. Every new
child we add to the list adds another row <em>and</em> column; because this
is a <em>square</em>, this is a <strong>quadratic</strong> function, or effect, or scaling.</p>
<p><img src="https://web.archive.org/web/20200428222140im_/https://joinmarket.me/static/media/uploads/.thumbnails/simplesquare5.png/simplesquare5-300x367.png" width="300" height="367" alt="Simple illustration of search space for birthday matches" /></p>
<p><em>(Pictured above: simple example assuming only 3 children. The blue
stars represent possible matches; there are 3 choose 2 for 3 children,
i.e. 3. The lines illustrate that this is the same as 3x3/2 - 3/2. The
bottom left squares are redundant, and those on the diagonal don't
apply.)</em></p>
<p>If the set of children is \(\{L\}\), and we denote the size of the
set (number of elements) as \(|L|\), then we can see that the size
of the <a href="https://en.wikipedia.org/wiki/Cartesian_product"><strong>Cartesian
product</strong></a>
of the set with itself, is \(|L|^{2}\). In the problem
statement - getting a single match - we only need one of the elements of
this set to be a match. But let's qualify/correct a <em>little</em> bit so our
toy example is a little bit better defined. If Alice matches Carol on
the top row, she'll also match in the first column (A = C means also C =
A). Further the squares on the main diagonal don't count, A=A is not a
solution to the problem. So for a set \(\{L\}\), if we want the
number of chances of a 'hit' to be about the same as the number of
possible values (the 'sample space' - which for birthdays has size 365),
then we have this very rough approximation:</p>
<p>\(\frac{|L|^{2}}{2} - \frac{|L|}{2} \simeq 365\)</p>
<p>Notice this is a very artificial equation: there's no guarantee that
anything magical happens exactly when the size of the sample space of
each event (the 365) is equal to the number of 'events' (pairs of
children, in this case, that might have the same birthday). But it does
give us the right order of magnitude of <span
style="text-decoration: underline;">roughly how many children would be
needed for the probability to get at least one match in the set to be
'appreciable'</span> . Clearly if \(|L|\) was <em>much</em> bigger than the
positive solution to the above quadratic equation, the probability is
going to become overwhelming; eventually once it reaches 365 we must
have a solution, by the pigeonhole principle, and the probability will
be very close to 1 way before that. And indeed the positive solution is
\(\simeq 28\), which is around the same as the exact answer 23, if
our exact question is how large the set should be to get a 50%
probability.</p>
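<p>For anyone who wants to check the \(\simeq 28\) figure, here is a small sketch solving the rough quadratic above for its positive root (the function name is an illustrative assumption, and this is the back-of-envelope estimate, not the exact 50% calculation):</p>

```python
import math

def appreciable_set_size(sample_space):
    """Positive root of L^2/2 - L/2 = sample_space."""
    return (1 + math.sqrt(1 + 8 * sample_space)) / 2

birthday_estimate = appreciable_set_size(365)   # about 27.5
```

<p>For large sample spaces the root is close to \(\sqrt{2 \times \textrm{sample space}}\), which is the square-root scaling discussed next.</p>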
<p>So while, as belaboured above, the calculation is rough and
artificial, it conveys the key scaling information - <strong>the chance of
success scales with the square of the size of the set, because we are
comparing the set with itself</strong>.</p>
<h3>The birthday attack on hash functions</h3>
<p>This line of thinking is commonly applied to the problem of finding
<a href="https://en.wikipedia.org/wiki/Collision_(computer_science)">collisions in hash
functions</a>.</p>
<p>Suppose you had a hash function whose digests were of length 20 bytes
(SHA1 was of this type). This is 160 bits of 'entropy' - if you assume
it's doing a good job of producing unpredictably random output. However,
as a reminder, there is more than one attack against a hash function
that cryptographers worry about - finding a preimage for an output,
finding <em>another</em> preimage, and the third one relevant to our discussion
- just finding <strong>any</strong> collision, i.e. finding any two preimages giving
the same output hash. For this, the above "birthday paradox" scenario
applies exactly: we have a sample space of \(2^{160}\) possibilities,
and we're going to select a set \(\{L\}\) from it, with the
intention of finding at least one pair in the set with the same output
value. The mathematics is identical and we need something like
\(|L|^{2}\ \simeq 2^{160}\), or in other words, the size of the
set we'd have to generate to get a good chance of a collision in the
hash function, is \(\sqrt{2^{160}}=2^{80}\). Hence a common, if
approximate, statement, is that <span
style="text-decoration: underline;">hash functions have security against
collision of only half the bits of the output</span>. So here, SHA1
could crudely be considered as having 80 bits of security against
collisions ... unfortunately, this statement ignores the fact that
collisions in SHA1 have already been
<a href="https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html">found</a>.
This blog post is, however, about non-broken cryptographic constructs;
collisions are supposed to not be possible to find other than by brute
force search, so that's a side story here.</p>
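<p>The square-root scaling is easy to see experimentally if we shrink the hash. The sketch below is an illustration only: it truncates SHA256 to 3 bytes (a made-up toy hash, 24 bits) and hashes the strings "0", "1", "2", ... until two of them collide, which should take on the order of \(2^{12}\) attempts rather than \(2^{24}\). The function names are assumptions for this example.</p>

```python
import hashlib

def truncated_hash(data, nbytes=3):
    """Toy hash: the first nbytes of SHA256 (24 bits by default)."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes=3):
    """Hash successive counters until any two inputs share a digest."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        digest = truncated_hash(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, counter + 1
        seen[digest] = msg
        counter += 1

m1, m2, tries = find_collision()
```

<p>Note that the search never targets a particular digest; it accepts <em>any</em> pair, which is exactly why the cost is governed by half the bits of the output.</p>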
<h2>Wagner's algorithm</h2>
<p><a href="https://people.eecs.berkeley.edu/~daw/papers/genbday.html">Wagner's
paper</a>,
"A Generalised Birthday Problem", considers the question of what happens
if we don't just want a single match between items in a list like this,
but if we want, instead, <em>a relation between a set of items</em>. The
relation considered in particular is applying bitwise XOR, hereafter
\(\oplus\), e.g. :</p>
<p>\(a \oplus b \oplus c \oplus d = 0 \quad (1)\)</p>
<p>(this equation is only "e.g.", because we are not restricted to 4; any
number more than 2 is considered by the paper, and for a number \(k\)
of items, this is referred to as the <span
style="text-decoration: underline;">\(k\)-sum problem</span>, but for
now we'll keep it simple and stick to 4).</p>
<p>First, let's not forget the obvious: this is a trivial problem if we
just <strong>choose</strong> \(a, b, c, d\); just choose the first 3 at random and
make the last one fit. What Wagner's algorithm is addressing is the
scenario where the numbers are all drawn from a uniformly random
distribution (this observation also applies to the children's birthdays;
we are not <em>choosing</em> them but getting random ones), but we can generate
as many such randoms as is appropriate.</p>
<p>Next observation: this "generalised" problem is intuitively likely to be
easier than the original problem of finding only <em>one</em> pairwise match -
you can think of the original birthday problem of a match being the same
as: \(a \oplus b = 0\) (this means a perfect match between \(a\)
and \(b\)). There, we could think of ourselves as being constrained in
"roughly" one variable (imagine that \(a\) is fixed and you are
hunting for \(b\), with the caveat of course that it's crucial to the
argument of square-root scaling that that is <em>not</em> the correct problem
statement!). If we extend to 4 items holding a relation, as above in
\((1)\), then we have "roughly" three degrees of freedom to work with.
It'll always tend to be easier to find solutions to puzzles when you
have more pieces available to play with.</p>
<p>However, the meat of the paper is to explain just how much this problem
is easier than the original (pairwise) birthday problem to solve, and to
give an explicit algorithm for how to do so. Just like with the number
23, it is a bit surprising how effective this algorithm is.</p>
<h3>The algorithm</h3>
<p>To set the stage: suppose we are considering hash functions (so we'll
forget about birthdays now), and the values \(a, b, c, d\) in
\((1)\) are outputs of a hash function. Let's go with SHA256 for now,
so they will all be bit strings of length 256.</p>
<p>We can generate an arbitrary number of them by just generating random
inputs (one particularly convenient way: start with random \(x\),
calculate \(y = \mathbb{H}(x)\), then calculate \(\mathbb{H}(y)
\ldots \); this 'deterministic random' approach, which should still
give a completely random sequence if the hash function is well behaved,
can be very useful in many search algorithms, see e.g. Chapter 3 and 14
of
<a href="https://www.math.auckland.ac.nz/~sgal018/crypto-book/crypto-book.html">Galbraith</a>).
As in earlier sections, we can call this list of such values
\(\{L\}\).</p>
<p>Wagner's suggested approach is to break the problem up, in two ways:
first, take the list of items in \(L\) and split it into 4 (or
\(k\)) sublists \(L_1 , L_2, L_3, L_4\). Second, we will take the
lists in pairs and then apply the birthday problem to each pair, but
with a twist: we'll only insist on a <strong>partial match</strong>, not a full
match.</p>
<p><em>(Historical note: this idea of using a subset of values satisfying a
simple, easily verified criterion is also seen in discrete log finding algorithms
as well as hash collision finding algorithms, and is often known as
"distinguished points"; the idea seems to go back as far as the early
80s and is due to Rivest according to 14.2.4 of
<a href="https://www.math.auckland.ac.nz/~sgal018/crypto-book/crypto-book.html">Galbraith</a>.
(Of note is that it's intriguingly analogous to Back's or Dwork's proof
of work computation idea).)</em></p>
<p>The following diagram to illustrate the idea is taken directly from the
Wagner paper:</p>
<p><img src="https://web.archive.org/web/20200428222140im_/https://joinmarket.me/static/media/uploads/.thumbnails/wagnerpic1.png/wagnerpic1-527x472.png" width="527" height="472" alt="Wagner algorithm schematic from paper" /></p>
<p>The \(\bowtie\) symbol may not be familiar: it is here intended to
represent a
<a href="https://www1.udel.edu/evelyn/SQL-Class2/SQLclass2_Join.html">join</a>
operation; the non-subscripted variant at the top is what may be called
an 'inner join' (just, find matches between the two sets), whereas
\(\bowtie_{l}\) represents the novel part: here, we search not for
full matches, but only matches in the lowest \(l\) bits of the hash
values, and we store as output the \(\oplus\) of the pair (more on
this in a bit). A concrete example:</p>
<p>\(L_1 = \{\textrm{0xaabbcc}, \textrm{0x112804}, \textrm{0x1a1dee}
\ldots \}, \quad L_2 = \{\textrm{0x8799cc}, \textrm{0x54ea3a},
\textrm{0x76332f} \ldots \}\)</p>
<p>Here we're showing toy hash outputs of 3 bytes (instead of 32), written
in hexadecimal for ease of reading. We're going to use list lengths of
\(2^{l}\) (which will be justified later; we could have picked any
length). If \(l\) were 8 (and the lists length 256 therefore), then
we're searching for matches on the lowest 8 bits of the values, and we
have:</p>
<p>\(L_1 \bowtie_{l} L_2 = \{(\textrm{0xaabbcc} \oplus
\textrm{0x8799cc} = \textrm{0x2d2200}) \ldots \}\)</p>
<p>... plus any other matches if the lists are longer, so that the output
on doing the low-l-bit-join on these two lists of items, produced at
least this single item, which is the \(\oplus\) of the "partial
match", and perforce it will always have its lowest-\(l\) bits as zero
(because of the properties of \(\oplus\)).</p>
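<p>As a sanity check of the toy example, here is a minimal sketch of the \(\bowtie_{l}\) operation, applied to the three values shown from each list. The function name <code>join_low_bits</code> is an illustrative choice, not something from Wagner's paper.</p>

```python
def join_low_bits(list_a, list_b, l):
    """Return a ^ b for every pair (a, b) agreeing on their lowest l bits."""
    mask = (1 << l) - 1
    by_low_bits = {}
    for a in list_a:
        by_low_bits.setdefault(a & mask, []).append(a)
    return [a ^ b for b in list_b for a in by_low_bits.get(b & mask, [])]

L1 = [0xaabbcc, 0x112804, 0x1a1dee]
L2 = [0x8799cc, 0x54ea3a, 0x76332f]
joined = join_low_bits(L1, L2, 8)   # [0x2d2200]
```

<p>Indexing one list by its low bits first means the join costs roughly the sum of the list lengths plus the number of matches, rather than the product of the lengths.</p>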
<p>Having done this first step for \(L_1 , L_2\) we then do exactly the
same for \(L_3 , L_4\) (remember - we took an original large random
list and split it into 4 (equal-sized) sub-lists).</p>
<p>That leaves us with two lists that'll look something like this:</p>
<p>\(L_1 \bowtie_{l} L_2 = \{\textrm{0x2d2200}, \textrm{0xab3100},
\textrm{0x50a200}, \ldots\}\)</p>
<p>... and the same for \(L_3 , L_4\). Wagner's idea is now to <strong>solve
the original birthday problem directly on this pair of lists</strong> - this is
the simple \(\bowtie\) operator - and he knows it will be easier
precisely because he has reduced the number of bits to be attacked (in
this case, by 8, from 24 to 16). To repeat, this <em>isn't</em> a way to solve
the original birthday problem (which we restated as \(a \oplus b = 0
\)), but it <em>is</em> a way to solve the generalised problem of \(a \oplus
b \oplus c \oplus d = 0\).</p>
<p>To give concrete completeness to the above fictitious examples, we can
imagine:</p>
<p>\(L_3 \bowtie_{l} L_4 = \{\textrm{0x2da900}, \textrm{0x896f00},
\textrm{0x50a200}, \ldots\}\)</p>
<p>So we've found this one positive result of the join operation (ignoring
others from a longer list): \(\textrm{0x50a200}\). What can we deduce
from that?</p>
<h3>From partial solutions to an overall solution</h3>
<p>The reason the above steps make any sense in unison is because of these
key properties of the \(\oplus\) operation:</p>
<ul>
<li>Associativity: \(a \oplus (b \oplus c) = (a \oplus b) \oplus
c\)</li>
<li>\(a = b \Rightarrow a \oplus b = 0 \)</li>
<li>The above two imply: \( a \oplus b = c \oplus d \Rightarrow a
\oplus b \oplus c \oplus d = 0\)</li>
</ul>
<p>I hope it's clear that the third of the above is the reason why finding:</p>
<p>\((L_1 \bowtie_{l} L_2) \bowtie (L_3 \bowtie_{l} L_4)\)</p>
<p>... means exactly finding sets of 4 values matching \(a \oplus b
\oplus c \oplus d = 0\).</p>
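<p>Putting the pieces together, the whole 4-list procedure can be sketched as below. This is a toy under stated assumptions: the "hash" is SHA256 truncated to 24 bits, the four lists are filled by hashing made-up labels, and the lists are deliberately oversized (\(2^{10}\) entries rather than \(2^{8}\)) so that a solution is found with overwhelming probability instead of only about once on average. All names are illustrative.</p>

```python
import hashlib

N_BITS, L_BITS, LIST_SIZE = 24, 8, 2 ** 10

def toy_hash(label, i):
    """24-bit toy hash: the first 3 bytes of SHA256 of a label and counter."""
    digest = hashlib.sha256(f"{label}:{i}".encode()).digest()
    return int.from_bytes(digest[:N_BITS // 8], "big")

def join_low_bits(list_a, list_b, l):
    """Partial join: map a ^ b -> (a, b) for pairs agreeing on the low l bits."""
    mask = (1 << l) - 1
    by_low_bits = {}
    for a in list_a:
        by_low_bits.setdefault(a & mask, []).append(a)
    out = {}
    for b in list_b:
        for a in by_low_bits.get(b & mask, []):
            out[a ^ b] = (a, b)
    return out

def wagner_4_list(l1, l2, l3, l4, l):
    """Find (a, b, c, d), one from each list, with a ^ b ^ c ^ d == 0."""
    l12 = join_low_bits(l1, l2, l)
    l34 = join_low_bits(l3, l4, l)
    # Full join at the root: a ^ b == c ^ d on all bits.
    for value, (a, b) in l12.items():
        if value in l34:
            c, d = l34[value]
            return a, b, c, d
    return None

lists = [[toy_hash(label, i) for i in range(LIST_SIZE)]
         for label in ("L1", "L2", "L3", "L4")]
solution = wagner_4_list(*lists, L_BITS)
```

<p>Only pairs that already agree on their low 8 bits ever reach the final comparison, which is where the saving over a naive search of all 4-tuples comes from.</p>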
<h2>Efficiency of the algorithm</h2>
<p>Here's why the above idea even matters: it means that finding such
multi-value matches can be <strong>much</strong> faster than finding pairwise
matches. Wagner goes through the reasoning as follows to give an
approximate feel for how much faster:</p>
<p>First, we can observe that it's likely that the efficiency of following
the above algorithm will depend on the value \(l\). Second, because
it's hard to get it in abstract, let's stick to our concrete toy example
where the hash function has only three bytes in the output (so 24 bits),
and \(l=8\).</p>
<p>The chance of a match on <em>any one pair</em> of elements from \(L_1 ,
L_2\) respectively is about \(2^{-l}\) (they have to match in
\(l\) bits and each bit is a coin flip); the number of possible
matches is ~ \(|L_1| \times |L_2|\). But given that we
arbitrarily chose the length of the lists as \(2^{8}\) - then we
expect the number of matches in \(L_1 \bowtie_{l} L_2\) to be
around \((2^{8} \times 2 ^{8} \times 2^{-8}) = 2 ^{8}\). At first it
may sound strange to say we expect so many matches but consider a
smaller example and it's obvious: if there are 10 possible values, and
we have <span style="text-decoration: underline;">two</span> lists of 10
items, then there are 100 possible matches and a probability 1/10 for
each one (roughly), so we again expect 10 matches.</p>
<p>To complete the analysis we only have to judge how many matches there
are likely to be between the output of \((L_1 \bowtie_{l} L_2)\)
and that of \((L_3 \bowtie_{l} L_4)\). As shown in our toy
example, all of those values have their lowest \(l\) bits zero; a full
solution of \(a \oplus b \oplus c \oplus d = 0\) will therefore be
obtained if the remaining bits of the \(\oplus\) of pairs of items
from the two lists are also zero (keep this deduction I just slid in
there, in mind! It will be crucial!); the probability of that for one
pair is clearly \(2^{-(n-l)}\) which in our toy case is
\(2^{-(24-8)}\), and since each of the lists is length
\(2^{l}=2^{8}\), we have finally that the expected number of solutions
from the whole process is around \(|L_{12}| \times |L_{34}|
\times 2^{-(24-8)} = 2^{8 + 8 - (24-8)} = 1\). This was not an
accident; we deliberately chose the lengths of the lists to make it so.
If we call this length \(2^{k}\), and generalise back to \(l\) bits
for the first partial match step, and \(n\) bits for the hash function
output, then we have an expected number of solutions of \(2^{2k}
\times 2^{-(n-l)}\). Clearly we have room for maneuver in what values
we choose here, but if we choose both \(l\) and \(k = f(l)\) so as
to make the expected number of matches around 1, then we can choose
\(k=l\) and \(l = \frac{n}{3}\), as the reader can easily verify.</p>
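<p>Spelled out, asking for about one expected solution means:</p>
<p>\(2^{2k} \times 2^{-(n-l)} = 1 \quad \Rightarrow \quad 2k + l = n\)</p>
<p>The first-level join of two lists of length \(2^{k}\) yields around \(2^{2k-l}\) items, so taking \(k=l\) keeps the lists the same length from one level to the next; substituting gives \(3l = n\), i.e. \(l = \frac{n}{3}\). In the toy case \(n=24\) this is \(l=k=8\), matching the numbers used above.</p>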
<p>Note that that choice \(l=n/3\) and \(k=l\) (or, in words, have the
4 sublists of length \(2^{l}\), and have \(l\) be one third of the
size of the hash output) is in fact not arbitrary: it is what
optimises our space and time usage. We discuss how this
generalises to more than 4 items in the next section, but for 4, this
means that we need space to store lists of \(\simeq
2^{\frac{n}{3}}\).</p>
<p>Compare this with the already-explained well-known scaling of the
original birthday problem: the time-space usage is of the order of
\(2^{\frac{n}{2}}\) for the same definition of \(n\). This
difference is big: consider, if a hash function had a 150 bit output
(let's forget that that's not a whole number of bytes!), then the
birthday problem is 'defended' by about 75 bits, whereas the 4-list
"generalised birthday problem" here is defended by only 50 bits (which
isn't a reasonable level of defence, at all, with modern hardware).</p>
<h3>Bigger \(k\)-sum problems and bigger trees.</h3>
<p>Clearly while the 4-sum problem illustrated above is already quite
powerful, it will be even more powerful if we can realise instances of
the problem statement with more lists. If we stick with powers of 2 for
simplicity, then, in the case of \(k=256\), we will be able to
construct a larger, complete binary tree with depth 8, combining pairs
of lists just as above and passing to the next level up the tree. At
each step, the number of bits matched increases until we search for full
matches (birthday) right at the top or root of the tree.</p>
<p><strong>This results in overall a time/space usage for these algorithms of
roughly \(O(2^{\frac{n}{1+\log_{2}k}})\). So while for our earlier
\(k=4\) we had \(O(2^{\frac{n}{3}})\), for \(k=256\) we have
\(O(2^{\frac{n}{9}})\), i.e. the attack could be very powerful
indeed!</strong></p>
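<p>To get a feel for the numbers, this small sketch just evaluates the exponent \(\frac{n}{1+\log_{2}k}\) for a few values of \(k\) (the function name is an illustrative assumption; constant and polynomial factors are ignored, as in the rough statement above):</p>

```python
import math

def wagner_exponent(n_bits, k):
    """Rough log2 of the time/space cost of the k-list algorithm."""
    return n_bits / (1 + math.log2(k))

birthday_150 = wagner_exponent(150, 2)     # 75.0, the ordinary birthday bound
four_list_150 = wagner_exponent(150, 4)    # 50.0
costs_sha256 = {k: round(wagner_exponent(256, k), 2) for k in (2, 4, 16, 256)}
```

<p>With \(k=2\) the formula reduces to the familiar \(\frac{n}{2}\), and for a 256-bit hash with \(k=256\) the exponent drops to about 28.44.</p>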
<p>If you're still a bit bewildered as to how it might be possible to so
drastically reduce the difficulty of finding matches, just by
constructing a tree, note that it's part of a broader theme in much of
mathematics: consider what is sometimes called the triangle inequality:</p>
<p>\(|a| + |b| \ge |a+b|\)</p>
<p>and in cases where a homomorphism applies, i.e. \(f(a+b) = f(a) +
f(b)\), it can sometimes be the case that the ability to shift from one
to the other - from "process each object individually" to "process the
combined object" allows one to collapse down the computational
difficulty of a problem. And that's what's happening here - the fact
that one can process <em>parts</em> of these objects individually - i.e., find
matches on <em>subsets</em> of the bits of the random numbers, and then combine
those linearly, gives a better outcome (performance wise) than if one
were to try to find total matches all at once.</p>
<p>This is just a very vague musing though; feel free to ignore it :)</p>
<h2>Generalising the algorithm</h2>
<p>First let's briefly mention the important but fairly simple point: you
can generalise from \(a \oplus b \oplus c \oplus d = 0\) to \(a
\oplus b \oplus c \oplus d = c'\) for some non-zero target \(c'\); just
replace one of the lists, e.g. \(L_4\) with a corresponding list
where all terms are xor-ed with the value \(c'\), so that the final
result of xor-ing the 4 terms found by the above algorithm will now be
\(c'\) instead of zero.</p>
<p>Also let's note that we ended up finding solutions only from a small
set: those for which there was a match in the final \(l\) bits of
pairs of elements. This restriction can be changed from a match to an
offset in the bit values, but it's only of minor interest here.</p>
<p>A far more important question though, which we will expand upon in the
next section: can we generalise from groups with the
\(\oplus\)-operation to groups with addition? Solving, say:</p>
<p>\(a+b+c+d=0\ \textrm{mod}\ 2^{n}\)</p>
<p>(it's a little easier mod \(2^{n}\) than for arbitrary sized additive
groups, but that's a detail, explained in the paper).</p>
<p>The answer is yes, but it's worth taking a moment to consider why:</p>
<p>We need to slightly alter the algorithm to make it fit the properties of
addition: to replicate the property \(a \oplus b = 0\) we replace
\(b\) with \(-b\), and we do this in both the two "layers" of the
algorithm for the 4 list case (see paper for details). Now what's
crucial is that, in doing this, we preserve the property that <strong>a match
in the lowest \(l\) bits in the first step is retained after
combination in the second step</strong> (the way Wagner puts it is: "The reason
this works is that \(a \equiv b \ \textrm{mod} 2^{l}\) implies
\((a+c \ \textrm{mod}2^{n}) \equiv (b+c\ \textrm{mod}2^{n})\
(\textrm{mod}2^{l})\): the carry bit propagates in only one
direction."; in other words the match is not 'polluted' by the way in
which addition differs from xor, namely the carry of bits. This the
reader can, and probably should, verify for themselves with toy examples
of numbers written as bitstrings, using e.g. \(l=2, n=4\) or similar).</p>
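<p>For readers who would rather let a machine do that verification, here is a minimal exhaustive check of the quoted statement for small parameters. The function name is an illustrative assumption; the check simply tries every \(a, b, c\) below \(2^{n}\).</p>

```python
def low_bits_survive_addition(n, l):
    """Check: a = b (mod 2^l) implies (a+c mod 2^n) = (b+c mod 2^n) (mod 2^l)."""
    mod_n, mod_l = 1 << n, 1 << l
    for a in range(mod_n):
        # Every b sharing its lowest l bits with a.
        for b in range(a % mod_l, mod_n, mod_l):
            for c in range(mod_n):
                if (a + c) % mod_n % mod_l != (b + c) % mod_n % mod_l:
                    return False
    return True

holds = low_bits_survive_addition(n=4, l=2)

# One concrete instance: a=17 and b=41 agree mod 4; add c=9 and reduce mod 16.
example = ((17 + 9) % 16 % 4, (41 + 9) % 16 % 4)   # (2, 2)
```

<p>The property holds because \(2^{l}\) divides \(2^{n}\): reducing modulo \(2^{n}\) only discards high bits, and carries never travel downwards into the low \(l\) bits.</p>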
<p>Because of the carry of bits (or digits) when we add, this isn't
perfectly obvious, but in the \(\oplus\) case it really is: what
makes the algorithm work is the preservation of a distinguishing
property after multiple applications of the operation, to reduce a large
set into a smaller one.</p>
<h3>Does it work for all groups?</h3>
<p>Since the above algorithm seems to be kind of generic, it's natural to
start wondering (and worrying!) that it may apply also to other
apparently hard collision problems. In particular, couldn't you do
something similar with elliptic curve points?</p>
<p>The main point of this blog post, apart from just trying to explain the
Wagner algorithm, was to answer this question in the negative. As we'll
see shortly, there is a concise academic argument that the answer
<em>should</em> be no, but I want to give some insight as to <em>why</em> it's no,
that is, why you cannot use this approach to find sets of scalars which,
when passed through the randomising function of elliptic curve scalar
multiplication to produce points on the curve, result in a sum to a
provided point, and thus solve the ECDLP.</p>
<h3>Wei Dai's argument</h3>
<p>Before we begin, an amusing piece of trivia: the long version of
Wagner's paper cites both Wei Dai and Adam Back, in a curious similarity
to ... another well known paper that came out 6 years later :)</p>
<p>What is cited as coming from private correspondence with Wei Dai is the
following logic, which superficially appears fairly trivial. But it's
nonetheless crucial. It's a <strong>reduction argument</strong> of the type we
discussed in some considerable detail in the last two blog posts (on
signatures):</p>
<blockquote>
<p>If the \(k\)-sum problem can be solved on any cyclic group \(G\)
in time \(t\), then the discrete logarithm problem on that group can
also be solved in time \(O(t)\).</p>
</blockquote>
<p>The words are carefully chosen here. Note that both \((\mathbb{Z}_n ,
+)\) and, for a prime \(p\), \((\mathbb{Z}_p^{*} , \times )\) are cyclic groups, of order
\(n\) and \(p-1\) respectively. In the former, we have already explained that the \(k\)-sum
problem can be solved efficiently; so this is really only an important
statement about the multiplicative group, not the additive group.</p>
<p>And that makes sense, because the "discrete logarithm problem" (defined
in the broadest possible way) is only hard in the multiplicative group
(and even then, only if \(n\) has large/very large prime factors, or
ideally just is a prime) and not in the additive group. To illustrate:
take the group \(G = (\mathbb{Z}_{11} , +)\), and define a
'generator' element 3 (any non-zero element works as a generator if n is prime);
if I were to ask you for the 'discrete log' of 7 in this group, it would
really mean finding \(x \in G\) such that \(3x = 7\) which is
really just the problem of finding \(x = 7 \times 3^{-1} \
\textrm{mod} 11\), which is a trivial problem (see: the <a href="https://en.m.wikipedia.org/wiki/Extended_Euclidean_algorithm">Extended
Euclidean
Algorithm</a>),
even if you replace 11 with a very large prime. It's for this reason
that it would be a terribly naive error to try to do cryptography on an
additive group of integers; basically, division, being the additive
analog of logarithms for multiplication, is trivially easy.</p>
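<p>To underline how trivial this is, here is the whole "attack" as a sketch (the function name is illustrative; <code>pow(g, -1, p)</code> is Python's built-in modular inverse, available from Python 3.8):</p>

```python
def additive_dlog(g, y, p):
    """Solve g * x = y (mod p): the 'discrete log' of y in (Z_p, +)."""
    return y * pow(g, -1, p) % p

x_small = additive_dlog(3, 7, 11)   # 7 * 3^{-1} = 7 * 4 = 28 = 6 (mod 11)

# The same one-liner with a 127-bit prime modulus (2^127 - 1 is prime).
big_p = 2 ** 127 - 1
g_big, secret = 3, 123456789123456789123456789
y_big = g_big * secret % big_p
x_big = additive_dlog(g_big, y_big, big_p)
```

<p>The size of the prime makes no practical difference, because computing a modular inverse costs only a number of steps proportional to the bit length.</p>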
<p>But Wei Dai's argument goes a bit further than that concrete reasoning,
because he's saying the "if-then" (which can also be reversed, by the
way - see the paper, "Theorem 3") can be applied to any, arbitrary
groups - and that includes elliptic curve groups. If the DLP is hard in
that group, the \(k\)-sum problem can't be solved easily, and vice
versa. The argument is something like (we use \(\cdot\) specifically
to indicate <em>any</em> group operation):</p>
<p>If you can find a solution to:</p>
<p>\(x_1 \cdot x_2 \cdot \ldots x_k = y\)</p>
<p>..using an efficient \(k\)-sum problem algorithm applied to uniformly
randomly generated \(x_i\)s, each produced as \(x_i = g^{w_i}\) from a
random exponent \(w_i\) that you know (\(g\) being the group's generator),
and if the dlog of \(y\) in this group is \(\theta\), i.e.
\(y=g^{\theta}\), then you can use that solution to find
\(\theta\), since \(g^{w_1 + w_2 + \ldots + w_k} = g^{\theta}\):</p>
<p>\(w_1 + w_2 + \ldots + w_k = \theta\) (modulo the group order)</p>
<p>Thus, we have, essentially, a <span
style="text-decoration: underline;">reduction of the discrete logarithm
problem to the k-sum problem</span>.</p>
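<p>A toy illustration of that reduction, as a hedged sketch only: I use the tiny group \((\mathbb{Z}_{29}, \times)\) with generator 3, \(k=2\), and a brute-force search standing in for the hypothetical "efficient \(k\)-sum algorithm" (the function name <code>toy_k_sum_solver</code> and all the parameters are my own inventions). The point is only that a solution in group elements with known exponents hands you the discrete log of the target.</p>

```python
from itertools import product

p, g = 29, 3                  # 3 generates the multiplicative group mod 29
order = p - 1                 # that group has 28 elements
theta_secret = 19             # pretend the solver does not know this
y = pow(g, theta_secret, p)   # the target element y = g^theta

# k lists of group elements x = g^w, each stored alongside its known exponent w.
# (A real attack would use random samples; the group is so small we list everything.)
k = 2
lists = [[(w, pow(g, w, p)) for w in range(order)] for _ in range(k)]

def toy_k_sum_solver(lists, target):
    """Brute-force stand-in for an efficient k-sum algorithm."""
    for combo in product(*lists):
        acc = 1
        for _, x in combo:
            acc = (acc * x) % p
        if acc == target:
            return combo
    return None

solution = toy_k_sum_solver(lists, y)
theta = sum(w for w, _ in solution) % order   # w_1 + ... + w_k = theta (mod 28)
print(theta)                                  # 19
```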
<h3>But why doesn't the algorithm work for DLP hard groups?</h3>
<p>We've already seen the key point in "Generalising the algorithm" above,
so if you skipped the last part of that section, do read it!</p>
<p>To reiterate, notice that the main description of solving this problem
with groups using \(\oplus\) or just addition required finding
partial matches and then preserving the features of partial matches
through repeated operations. It's precisely this that does not work in a
multiplicative group.</p>
<p>Here's a concrete example of doing that, with an additive group of the
simplest type, where we are working modulo a power of 2, let's say
\(n=4\) and \(l=2\) so we are examining the lowest 2 bits, in
numbers of 4 bits (i.e. modulo 16):</p>
<p>Take \(a=17, \ b=41\) which are both 1 mod 4. Now we apply an offset
value \(c=9\) (can be anything). We find:</p>
<p>\((a+c)_{16} = 26_{16}=10,\quad (b+c)_{16}=50_{16} = 2\)</p>
<p>and both the answers (10 and 2) are 2 mod 4, which verifies the point:
equality in the lowest order bits can be preserved when adding members.
This is what allows Wagner's trick to work.</p>
<p>If we talk about multiplication, though, particularly in a group
modulo a prime, we find we don't get these properties preserved; in such a
group, multiplication has a strong <strong>scrambling effect</strong>. We'll take one
concrete example: \((\mathbb{Z}_{29}, \times)\). If I start with
a number, here 3, and just keep multiplying by it (this is basically how
'generators' work), we get this sequence:</p>
<p>\(3,9,27,23,11,4,12,7,21,5,15,16,19,28,26,20,2,6,18,25,17,22,8,24,14,13,10,1,3,\ldots
\)</p>
<p>(e.g. the 4th element is 23 because 27 times 3 mod 29 = 23).</p>
<p>The pattern repeats after 28 steps as expected (the multiplicative
group modulo 29 has 28 elements); but within the sequence
we have an apparently random ordering. This is a direct consequence of the
fact that the numbers 3 and 29 have no common factors; there's nowhere
they can "line up".</p>
<p>To illustrate further, consider what happens with addition instead:
still working modulo 29, let's see what happens if we add a number to
itself repeatedly (note I chose 25 to be a slightly less obvious case -
but it's still obvious enough!):</p>
<p>\(25,21,17,13,9,5,1,26,22,18,14,10,6,2,27,23,19,15,11,7,3,28,24,20,16,12,8,4,0,
\ldots \)</p>
<p>Note that you're seeing it dropping by 4 each time because \(25 \equiv
-4\) in mod 29. There is always such a simple pattern in these
sequences in additive groups, and that's why division is trivial while
discrete logarithm is not.</p>
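<p>If you want to regenerate both sequences yourself, a couple of lines of Python will do it (a small sketch; the list names are just for illustration):</p>

```python
p = 29

# Repeated multiplication by 3: 3^1, 3^2, ..., 3^28 (mod 29)
powers_of_3 = [pow(3, i, p) for i in range(1, 29)]

# Repeated addition of 25: 25*1, 25*2, ..., 25*29 (mod 29)
multiples_of_25 = [(25 * i) % p for i in range(1, 30)]

print(powers_of_3)       # starts 3, 9, 27, 23, 11, ... and ends with 1
print(multiples_of_25)   # starts 25, 21, 17, 13, ... and ends with 0

# The additive sequence moves by a constant step of -4 (mod 29) every time;
# the multiplicative one has no such constant step.
steps = {(b - a) % p for a, b in zip(multiples_of_25, multiples_of_25[1:])}
print(steps)             # {25}, i.e. always -4 mod 29
```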
<p>So, as a consequence of this scrambling effect, we also find that
Wagner's observation about adding integers and then taking modulo
\(l\) no longer works, in multiplicative groups, at least in general.
Again, a concrete example using \((\mathbb{Z}_{29}, \times)\):</p>
<p>Let \(a=17,\ b=13\); both integers modulo 29. We'll, as before, check
the value modulo 4, both before and after applying an offset: they are
both 1 modulo 4. Let the offset we're going to apply to both be 9. But
this time we're not going to <em>add</em> 9 but multiply by it, because that is
the group operation now; we get:</p>
<p>\((17\times 9)_{29} = 153_{29} = 8_{29} \quad \rightarrow 0_{4}
\)</p>
<p>but:</p>
<p>\((13\times 9)_{29} = 117_{29} = 1_{29} \quad \rightarrow 1_{4}
\)</p>
<p>and, so unlike in the additive group case, we failed (at least for this
example, and this group - I haven't <span
style="text-decoration: underline;">proved</span> anything!) to preserve
the two low order bits (or the value mod 4, equivalently).</p>
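<p>Both worked examples (the additive one modulo 16 and the multiplicative one modulo 29) can be checked with a short Python sketch; the helper name <code>low_bits_agree</code> is mine:</p>

```python
def low_bits_agree(a, b, bits=2):
    """True if a and b have the same lowest `bits` bits (same value mod 2^bits)."""
    return a % 2**bits == b % 2**bits

# Additive case, working modulo 16: a and b agree mod 4, and still agree after adding c.
a, b, c = 17, 41, 9
add_before = low_bits_agree(a, b)
add_after = low_bits_agree((a + c) % 16, (b + c) % 16)   # 10 and 2, both 2 mod 4

# Multiplicative case, working modulo 29: a and b agree mod 4, but not after multiplying by c.
a, b, c = 17, 13, 9
mul_before = low_bits_agree(a, b)
mul_after = low_bits_agree((a * c) % 29, (b * c) % 29)   # 8 and 1

print(add_before, add_after)   # True True
print(mul_before, mul_after)   # True False
```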
<p>In summary, as far as the current state of mathematics goes, it is
believed that there is not a way to do such a property preservation
"through" multiplication - but specifically this statement only applies
in groups where the discrete log is <em>actually</em> hard.</p>
<p>All of the above cross-applies to elliptic curves: like in
multiplicative groups (certain of them), the DLP is hard because the
group operator is essentially a 'scrambler', so the preservation of
properties, that Wagner requires, doesn't work.</p>
<h2>Applications to real systems</h2>
<h3>The OR of sigma protocols.</h3>
<p>This is a topic that was covered in an earlier <a href="https://joinmarket.me/blog/blog/ring-signatures/">blog
post</a>,
so I will not give the outline here - but you'll need that context to
understand the following. But we see here a fascinating implication of
Wagner's idea to these protocols. Recall that the verification uses the
following equation:</p>
<p>\(e_1 \oplus e_2 \ldots \oplus e_k = e\)</p>
<p>... look familiar at all? This of course is <em>exactly</em> the \(k\)-sum
problem that Wagner attacks! Therefore a dishonest prover has a much
better chance of fooling a verifier (by providing a valid set of
\(e_i\)-s) than one might expect naively if one hadn't thought about
this algorithm. Fortunately, there is a huge caveat: <strong>this attack
cannot be carried out if the protocol has special soundness</strong>. Special
soundness is a technical term meaning that if an extractor can generate
two validating transcripts, it can extract the witness. In this case,
the Wagner algorithm could not be performed <em>without already knowing the
secret/witness </em>(details: the attack would be to generate huge lists of
transcripts \(R, e, s\) (notation as per previous blogs), where \(e,
s\) are varied, keeping \(R\) fixed - but that's exactly how an
extractor works) - so in that sense it wouldn't be an attack at all.
However, not all zero knowledge protocols do have the special soundness
property. So while this is very in the weeds and I am not able to
illustrate further, it is certainly an interesting observation, and the
discussion in the full version of the Wagner paper is worth a read.</p>
<h3>Musig</h3>
<p>Obviously Wagner did not discuss this one :) This will be a very high
level summary of the issue in the context of
<a href="https://eprint.iacr.org/2018/068">Musig</a>,
the newly proposed scheme for constructing multisignatures via
aggregated Schnorr signatures. Read the Musig paper for more detail.</p>
<p>Recall that the naive aggregation of Schnorr signatures is insecure in
the multisig context due to what can be loosely called "related key
attacks" or "key subtraction attacks":</p>
<p>\(P_1 = x_1 G\quad P_2 =x_2G\)</p>
<p>\(s_1 = k_1 + ex_1\ ,\ s_2 = k_2 + ex_2\quad
s_{\textrm{agg}} = k_1 + k_2 + e(x_1+x_2)\)</p>
<p>fails in the multisig context of user-generated keys due to attacker
choosing:</p>
<p>\(P_2 = P^{*}_2 - P_1\quad P^{*}_2 = x^{*}_2 G\)</p>
<p>and then the attacker is able to construct a valid signature without
knowledge of \(x_1\).</p>
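<p>Here is a hedged sketch of that key subtraction attack in Python. To keep it self-contained I am <em>not</em> using an elliptic curve: the stand-in is a tiny Schnorr group (the order-1019 subgroup of the integers modulo 2039), where "point addition" is multiplication mod \(p\) and \(xG\) is \(g^x\). All function names and numbers are illustrative assumptions, the nonce is fixed rather than random, and nothing here is secure.</p>

```python
import hashlib

# Toy stand-in for the curve: the order-q subgroup of the integers mod p = 2q + 1.
p, q, g = 2039, 1019, 4

def base_mul(x):            # plays the role of x*G
    return pow(g, x % q, p)

def point_add(P, Q):        # plays the role of P + Q
    return (P * Q) % p

def point_neg(P):           # plays the role of -P
    return pow(P, -1, p)

def scalar_mul(e, P):       # plays the role of e*P
    return pow(P, e % q, p)

def challenge(P, R, m):     # H(P || R || m), reduced mod q
    digest = hashlib.sha256(f"{P}|{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def verify(P, R, s, m):     # checks s*G == R + H(P||R||m)*P
    return base_mul(s) == point_add(R, scalar_mul(challenge(P, R, m), P))

x1 = 123                    # honest victim's private key; the attacker never uses it
P1 = base_mul(x1)

x2_star = 456               # the key the attacker really knows
P2 = point_add(base_mul(x2_star), point_neg(P1))   # announced key: P2 = P2* - P1

P_agg = point_add(P1, P2)   # naive aggregate key; P1 has cancelled out
k = 789                     # attacker's nonce (fixed here only for reproducibility)
R = base_mul(k)
m = "pay everything to the attacker"
s = (k + challenge(P_agg, R, m) * x2_star) % q     # signed with x2_star alone

print(P_agg == base_mul(x2_star), verify(P_agg, R, s, m))   # True True
```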
<p>The paper explains that a naive fix for this problem <span
style="text-decoration: underline;">is actually susceptible to Wagner's
attack!</span></p>
<p>If you write each key as \(P^{*}_{i} = \mathbb{H}(P_i)P_i\), in
words, you (scalar) multiply each key by its hash, then you still know
the private key (just also multiply it by the same hash value), and you
might think you have removed the key subtraction attack, because an
attacker wants to create \(P_2\) such that it's the difference
between a key he knows and \(P_1\); but he can't know the hash value
before he computes it, so he will never be able to arrange for
\(\mathbb{H}(P_2)P_2\) to be a non-random value. This same logic is
seen in many places, e.g. in the fixing of public keys inside a basic
Schnorr signature challenge. But here, it's not enough, because there
are more degrees of freedom:</p>
<p>Suppose the attacker controls all \(n-1\) keys \(P_i\) except for the
first, \(P_1\), which the honest victim provides. Then the attacker's
goal is to make signing work without the honest victim's participation.
Now the aggregate key in this naive form of Musig is:</p>
<p>\(P_{agg} = \sum\limits_{i=1}^{n} \mathbb{H}(P_i)P_i\)</p>
<p>So the attacker's goal is to find all the other keys as offsets to the
first key such that the first key is removed from the equation. He sets:</p>
<p>\(P_i = P_1 + y_iG \quad \forall i \in 2\ldots n\)</p>
<p>i.e the \(y_i\) values are just linear tweaks. Then let's see what
the aggregated key looks like in this naive version of Musig:</p>
<p>\(P_{agg} = \mathbb{H}(P_1)P_1 + \sum\limits_{i=2}^{n}
\mathbb{H}(P_1 + y_i G)(P_1 + y_i G) \)</p>
<p>\(P_{agg} = \mathbb{H}(P_1)P_1 + \sum\limits_{i=2}^{n}
\mathbb{H}(P_1 + y_i G)(P_1) + \sum\limits_{i=2}^{n}
\mathbb{H}(P_1 + y_i G)(y_i G)\)</p>
<p>Now, note that there are three terms and <strong>the last term is an
aggregated key which the attacker controls entirely</strong>. Consequently, if
the attacker can arrange for the first and second terms to cancel out,
he will succeed in signing without the victim's assent. Luckily that's
exactly an instance of Wagner's \(k\)-sum problem!:</p>
<p>\(\sum\limits_{i=2}^{n} \mathbb{H}(P_1 + y_i G) =
-\mathbb{H}(P_1) \)</p>
<p>Notice crucially that we've reduced this to an equation in <strong>integers</strong>
not elliptic curve points, as per the long discussions above about Wei
Dai's observation. This will be soluble for arbitrarily chosen
\(y_i\)-s, and increasingly so (more than one might naively expect!)
as the value of \(n\) increases. The attack requires the attacker to
control some subset of keys (in this simple illustration, \(n-1\)
keys, but it can actually be fewer), but since the whole point is to
remove trust of other key-owners, this is certainly enough to reject
this construction.</p>
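<p>To see the cancellation happen, here is a toy-sized sketch with \(n=3\). Loud caveats: the "curve" is again a tiny Schnorr group (order 1019, modulo 2039), the hash is SHA256 reduced modulo the group order, and the search for the \(y_i\) is a plain table lookup that is only feasible because the group is tiny; it is <em>not</em> Wagner's tree algorithm, which is what you would need at real sizes. All names are my own.</p>

```python
import hashlib

# Toy stand-in for the curve: the order-q subgroup of the integers mod p = 2q + 1.
p, q, g = 2039, 1019, 4

def base_mul(x):            # plays the role of x*G
    return pow(g, x % q, p)

def point_add(P, Q):        # plays the role of P + Q
    return (P * Q) % p

def scalar_mul(e, P):       # plays the role of e*P
    return pow(P, e % q, p)

def H(P):                   # hash of a single key, reduced mod q
    return int.from_bytes(hashlib.sha256(str(P).encode()).digest(), "big") % q

x1 = 321                    # victim's private key; used only to create P1
P1 = base_mul(x1)
h1 = H(P1)

# Index every candidate key P1 + y*G by its hash value.
y_by_hash = {}
for y in range(q):
    y_by_hash.setdefault(H(point_add(P1, base_mul(y))), y)

# Find y2, y3 with H(P1 + y2*G) + H(P1 + y3*G) = -H(P1) (mod q).
found = None
for y2 in range(q):
    h2 = H(point_add(P1, base_mul(y2)))
    needed = (-h1 - h2) % q
    if needed in y_by_hash:
        found = (y2, y_by_hash[needed])
        break

y2, y3 = found
P2, P3 = point_add(P1, base_mul(y2)), point_add(P1, base_mul(y3))
h2, h3 = H(P2), H(P3)

# Naive aggregate key: sum of H(P_i)*P_i over all three keys.
P_agg = point_add(scalar_mul(h1, P1),
                  point_add(scalar_mul(h2, P2), scalar_mul(h3, P3)))

# The attacker knows the private key of P_agg without ever learning x1.
x_attacker = (h2 * y2 + h3 * y3) % q
print(P_agg == base_mul(x_attacker))
```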
<p>The solution is nearly obvious, if unfortunately it makes the equation a
little more complicated: <strong>fix the entire keyset, not just your own key,
in the hash</strong> (notice an echo here to the discussion of ring signatures
in an earlier blog post). By doing so, you cannot separate out the
dependence in \(P_1\) and thus cancel it out. So replace
\(\mathbb{H}(P_1)P_1\) with \(\mathbb{H}(P_1, P_2, \ldots ,
P_n)P_1\). The authors of the musig construct tend to use the term
'delinearization' specifically to describe this.</p>
<h3>Other examples</h3>
<p>In fact, probably the most striking example of how Wagner's attack may
have implications for the security of real systems, is the attack he
describes against Schnorr blind signatures. But it is unfortunately also
the most complicated, so I will just briefly mention here that he shows
that a certain kind of such blind signatures can be forged given a
number \(k\) of parallel interactions with a signing oracle (which is
often a realised thing in systems that actually use blind signatures;
they are often used as kind of tokens/certificates), using the
corresponding \(k\)-sum problem.</p>
<p>He shows that certain specialised hash constructions (which may well be
outdated now, nearly 20 years later) have weaknesses exposed by this
kind of attack.</p>
<p>Curiously, he discusses the case of KCDSA, a Korean variant of DSA,
pointing out that it's possible to collide signatures (specifically the
\(s\) in an \(r, s\) pair), in the sense of having two different
messages with the same signature. A similar concept w.r.t. ECDSA can be
found in <a href="https://link.springer.com/content/pdf/10.1007%2F3-540-45708-9_7.pdf">this
paper</a>
- there it exploits a simple symmetry of the algorithm, but requires
that the public/private key pair be created as part of the 'stunt'.
Wagner on the other hand shows his algorithm can be used to find
"collisions" of this type in the KCDSA algorithm, but without the
restriction of having to create a key pair specially for the purpose
(i.e. it works for an existing key).</p>
<p>Several other possible applications are listed in the long version of
the paper.</p>Multiparty S62019-04-15T00:00:00+02:002019-04-15T00:00:00+02:00Adam Gibsontag:joinmarket.me,2019-04-15:/blog/blog/multiparty-s6/<p>multiparty symmetrical Schnorr signature scriptless script shuffle</p><h3>Multiparty S6</h3>
<h2>The multiparty symmetrical Schnorr signature scriptless script shuffle</h2>
<p>This blog is in the category of "a new-ish idea about privacy tech";
like similar previous ones (e.g.:
<a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/coinjoinxt/">CoinJoinXT</a>)
it is little more than an idea; in this case I believe it is correct,
but (a) I could be wrong and there could be a flaw in the thinking and
(b) it's not entirely clear how practically realistic it will be. What I
do hope, however, is that the kernel of this idea is useful, perhaps in
Layer 2 tech or in something I haven't even thought about.</p>
<h2>The Goal</h2>
<p>As with similar writeups, I feel it's important that the reader has some
idea what the goal is. Here is the goal I <em>mostly</em> had in mind when
thinking this through:</p>
<ul>
<li>11 (or 6, or 24...) anonymous users coordinate (on a lobby server,
with a Joinmarket-style incentive, on a p2p network somehow -
whatever). They each have 1 BTC utxos (put off denomination
questions for later) and they want a very meaningful privacy
increase.</li>
<li>Instead of doing a CoinJoin which is obvious or a whole set of
CoinSwaps (see
<a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/coinswaps/">earlier</a>
<a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">posts</a>)
which could get complicated for 11 people, they want to kind of
"permute" or "shuffle" all their utxos.</li>
<li>It's a year or two from now and a Schnorr soft fork has gone through
in Bitcoin mainchain; they're going to use the scriptless script
primitive (see
<a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">here</a>
or Poelstra and Nick's writeup
<a href="https://github.com/apoelstra/scriptless-scripts/blob/master/md/atomic-swap.md">here</a>,
or the following sections for more on this), to achieve the goal via
multisig outputs that look like other outputs.</li>
<li>They do effectively a "multiparty swap" or "shuffle" to achieve this
goal. Each of the 11 participants funds a single prepared
destination address, which is (though not seen because Schnorr) an
11 of 11 multisig. Before they do so, they get hold of a presigned
(by everyone) backout transaction to get their coins back if
something goes wrong.</li>
<li>They decide a shuffle/permutation: e.g. Alice is paying Bob 1,
Charlie is paying Edward etc etc. ... we're talking here about a
member of the set of permutations of 11 objects. Obviously the idea
is that everyone pays in 1, everyone gets back 1. They prepare
transactions for these payments.</li>
<li>Once everything is set up they pass around <strong>adaptor signatures</strong>
which create the atomicity effect we want - when any 1 of the 11
transactions goes through, all of them will go through.</li>
<li>In a certain order, that we'll discuss, they can now pass real
(Schnorr) signatures (note that even though "real" they are still
"partial" signatures - they're 1 of 11 needed in the multisig) on
the transactions such that one member of the group has a full set
and can broadcast the transaction paying themselves. Everyone else
sees this on the blockchain, and combining the signatures in this
published transaction, with the earlier adaptor signatures, has
enough information to broadcast the other transaction which pays
themself.</li>
</ul>
<p>Let's consider the advantages of doing this:</p>
<ul>
<li>Shared with a 2 of 2 CoinSwap: there is no linkage on the blockchain
between the 11 transactions. Effectively, Alice has swapped her coin
history with Bob, Charlie with Edward etc..</li>
<li>Big difference from the above: we can create, like a multiparty
coinjoin, the highly desirable scenario that <span
style="text-decoration: underline;">individual participants do not
know the linkages</span> between inputs for transactions other than
their own. As we know, there are various designs of CoinJoin
metaprotocols that allow this to different extents, but if CoinSwap
is restricted to 2 of 2 this is impossible (no cryptographic
trickery prevents the deduction "if it's not mine, it's yours!").</li>
<li>Biggest difference from CoinJoin is that CoinSwap transactions
(whether 2-2 or 11-11) can look like ordinary payments on the
blockchain, although there's meaningful wiggle room in how exactly
they will look. If we manage to combine this with even slight
variations in size of individual payments, and probably a little
timing de-correlation too, the task of a blockchain analyst in
identifying these is near impossible (<strong>notice that this hugely
desirable steganographic feature is shared with PayJoin and
CoinJoinXT, previous blog posts</strong> - notice though, that it depends
on Schnorr for indistinguishable multisig <em>and</em> for adaptor
signatures, unless ECDSA-n-party computation is a thing, which I
doubt is currently a thing for more than 2 parties, but see e.g.
<a href="https://eprint.iacr.org/2018/987.pdf">this</a>
for recent research in this area.).</li>
</ul>
<h3>Illustrations comparing CoinJoin, CoinSwap and multiparty S6:</h3>
<p><img src="../../../images/screenshot_from_2019-01-18_15-00-33-813x436.png" width="813" height="436" alt="Typical coinjoin" /></p>
<p><em>Typical CoinJoin transaction - it's very obvious because of equal
output amounts; the histories of coins are not disconnected, but fused</em></p>
<p><em><img src="../../../images/realcoinswap2-792x65.png" width="792" height="65" /></em></p>
<p><em><img src="../../../images/realcoinswap1-799x67.png" width="799" height="67" /></em></p>
<p><em>Typical 2-party CoinSwap transactions; they are entirely separate on
the blockchain, with different timestamps they could be extremely
difficult to find.</em></p>
<p><em><img src="../../../images/s6basic-1053x745.png" width="1053" height="745" /></em></p>
<p><em>A </em><em>very</em><em> simplified multiparty S6 as envisaged: note that Oscar to
Peter shows on a diagonal a simple transaction of the type used in
CoinSwap; in fact there is one such transaction for every red arrow;
i.e. each red arrow represents a payment from one of the group of 11 to
another, in a random permutation. All of these transactions will be
atomic; either they will all happen or none will. But none will be
linked on the blockchain.</em></p>
<h2>Schnorr and adaptor signatures</h2>
<p>Achieving the goals above is crucially dependent on the concept of an
adaptor signature as developed by Andrew Poelstra (see some detailed
descriptions as mentioned
<a href="https://github.com/apoelstra/scriptless-scripts/blob/master/md/atomic-swap.md">here</a>)
in his work on "scriptless scripts". A large part of the <a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">earlier blog
post</a>
on the topic of the scriptless script based swap, was explaining this
concept. I want to write an explanation which is easier to understand. I
will try :)</p>
<p>A basic Schnorr signature on a message \(m\) using a public key
\(P\) whose private key is \(x\), looks like this:</p>
<p>\(\sigma = k + \mathbb{H}(P||R||m) x \quad, R = kG \quad
\textrm{Publish: }\ (R,\sigma)\)</p>
<p>\(k\) is called the nonce, and \(R\) is the nonce point (point on
the curve corresponding). We shorten the hash function
\(\mathbb{H}(\ldots)\) to just \(e\), often.</p>
<p>Schnorr signatures are linear in the keys, in that:</p>
<p>\(\sigma_1 + \sigma_2 = (k_1 + k_2) + e (x_1+x_2)\)</p>
<p><strong>Combining signatures in this way is unsafe in many contexts, in
particular multisignature in Bitcoin. See the <a href="https://eprint.iacr.org/2018/068">paper on
Musig</a>
and <a href="https://github.com/ElementsProject/secp256k1-zkp/blob/secp256k1-zkp/src/modules/musig/musig.md">this
summary</a>
for the details on how the weakness (basically, potential of key
subtraction) is addressed in detail, using interactivity between the
parties cooperating to create the aggregated Schnorr signature. <span
style="text-decoration: underline;">As long as this is properly
addressed, though, the linearity property is retained</span>.
</strong></p>
<p><strong>Let me emphasise that the rest of this post will ignore the correct
construction of keys and nonce points for safe Schnorr multisig; we will
just talk about Alice, Bob and Charlie adding keys together and adding
signatures together; the difference is crucial in practice but I believe
does not alter any of the concepts being outlined.</strong></p>
<h3>Partial Signatures</h3>
<p>With the above bolded caveats in mind, it'll be important for the
following to understand the idea of a "partial signature" in a Schnorr
multisig context. What we're doing is to create a single signature
\(\sigma\) which represents, say, a 2 of 2 multisignature. Say it's
Alice and Bob (A, B). Then Alice would produce this <strong>partial
signature</strong>:</p>
<p>\(\sigma_A = k_A + \mathbb{H}(P_A + P_B || R_A + R_B || m)
x_A\)</p>
<p>Notice how it's not a valid signature according to the Schnorr
definition because the nonce \(k_A\) does not correspond to the nonce
point \(R_A + R_B\) <em>and</em> because the private key does not
correspond to the public key \(P_A+P_B\).</p>
<p>However when Bob adds his partial signature:</p>
<p>\(\sigma_B = k_B + \mathbb{H}(P_A + P_B || R_A + R_B || m)
x_B\)</p>
<p>... to Alice's, the sum of the two <em>is</em> a valid signature on the
message, with the sum of the keys.</p>
<p>We will make use of this shortly.</p>
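<p>A minimal sketch of those two partial signatures adding up to a valid one. As flagged in the bolded caveats, this ignores Musig entirely; it also swaps the elliptic curve for a tiny Schnorr group (order 1019, modulo 2039) so that it runs with nothing but the standard library. Names, keys and (fixed, non-random) nonces are illustrative assumptions only.</p>

```python
import hashlib

# Toy stand-in for the curve: the order-q subgroup of the integers mod p = 2q + 1.
p, q, g = 2039, 1019, 4

def base_mul(x):            # plays the role of x*G
    return pow(g, x % q, p)

def point_add(P, Q):        # plays the role of P + Q
    return (P * Q) % p

def scalar_mul(e, P):       # plays the role of e*P
    return pow(P, e % q, p)

def challenge(P, R, m):     # H(P || R || m), reduced mod q
    digest = hashlib.sha256(f"{P}|{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def verify(P, R, s, m):     # checks s*G == R + H(P||R||m)*P
    return base_mul(s) == point_add(R, scalar_mul(challenge(P, R, m), P))

xA, xB = 111, 222           # private keys of Alice and Bob
kA, kB = 333, 444           # their nonces (must be random in real life)
PA, PB = base_mul(xA), base_mul(xB)
RA, RB = base_mul(kA), base_mul(kB)

P, R = point_add(PA, PB), point_add(RA, RB)    # P_A + P_B and R_A + R_B
m = "2-of-2 spend"
e = challenge(P, R, m)      # both parties hash the *combined* key and nonce point

sigma_A = (kA + e * xA) % q                    # Alice's partial signature
sigma_B = (kB + e * xB) % q                    # Bob's partial signature

# Each partial is consistent with its own R and P under the shared e ...
partial_A_ok = base_mul(sigma_A) == point_add(RA, scalar_mul(e, PA))
# ... and only the sum is a valid Schnorr signature for the combined key.
combined_ok = verify(P, R, (sigma_A + sigma_B) % q, m)
print(partial_A_ok, combined_ok)               # True True
```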
<h3>Adaptor Signatures</h3>
<p>A creator of a signature can hide a verifiable secret value in a
signature, using simple addition. They can then pass across the
signature <em>without</em> the secret value, making it not valid, but
verifiable as "a signature with the secret not included". This is what
Poelstra means by his concept "adaptor signature". It looks like this:</p>
<p>\(\sigma' = k + \mathbb{H}(P||R+T||m) x \quad R=kG,\ T=tG \)</p>
<p>(from now on, note that ' indicates adaptor signatures). To repeat, it's
not a valid signature, but: <em>it can be verified that adding the discrete
log of \(T\)</em> <em>to \(\sigma'\) will yield a valid signature on the
message m and the public key \(P\)</em><strong><em>.</em></strong></p>
<p>Refer back to the <a href="https://web.archive.org/web/20200429002041/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">earlier blog
post</a>
if you want to check the mathematical details on that.</p>
<p>The alert reader will notice how similar the "adaptor signature" and
"partial signature" concepts are - it's almost the same mathematical
change, but with a very different purpose/application, as we expand on
below:</p>
<h3>Atomicity for two parties</h3>
<p>This trick is already cool - if I ever pass you the secret value
\(t\), you'll be able to form a full valid signature. But with an
additional nuance it's possible to make this a two-way promise, so we
have the outline of an atomicity property, which can be very powerful.
If the contents of the thing being signed, including the nonce points
\(R\) and the public keys \(P\) are fixed in advance, we could
create a situation where the promise works both ways:</p>
<p>\(P, R, m\) are known \(\therefore \sigma = k + t +
\mathbb{H}(P||R+T||m) x\) is fixed. If the adaptor is shared by
the owner of the private key \(x\), i.e. if he passes to a
counterparty the value \(\sigma' = k + \mathbb{H}(P||R+T||m)
x\), then either direction of publication reveals the other:</p>
<ul>
<li>If the full signature \(\sigma\) is revealed, the secret is
revealed as \(t = \sigma - \sigma'\)</li>
<li>If the secret \(t\) is revealed, the full signature is revealed:
\(\sigma = t + \sigma'\)</li>
</ul>
<p>This atomicity is the basis of the scriptless script atomic swap
published by Poelstra and explained in my earlier post.</p>
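<p>The adaptor check and the two directions of revelation can be sketched as follows. Same assumptions as any toy of this kind: a tiny Schnorr group (order 1019, modulo 2039) stands in for the curve, the values of \(x, k, t\) are fixed small numbers chosen for illustration, and the function names are mine.</p>

```python
import hashlib

# Toy stand-in for the curve: the order-q subgroup of the integers mod p = 2q + 1.
p, q, g = 2039, 1019, 4

def base_mul(x):            # plays the role of x*G
    return pow(g, x % q, p)

def point_add(P, Q):        # plays the role of P + Q
    return (P * Q) % p

def scalar_mul(e, P):       # plays the role of e*P
    return pow(P, e % q, p)

def challenge(P, R, m):     # H(P || R || m), reduced mod q
    digest = hashlib.sha256(f"{P}|{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def verify(P, R, s, m):     # checks s*G == R + H(P||R||m)*P
    return base_mul(s) == point_add(R, scalar_mul(challenge(P, R, m), P))

x, k, t = 77, 555, 999      # private key, nonce, adaptor secret
P, R, T = base_mul(x), base_mul(k), base_mul(t)
m = "swap transaction"

e = challenge(P, point_add(R, T), m)    # note the hash commits to R + T
adaptor = (k + e * x) % q               # sigma' = k + H(P||R+T||m) x

# The counterparty's check on the adaptor: sigma'*G == R + e*P
adaptor_ok = base_mul(adaptor) == point_add(R, scalar_mul(e, P))

# Direction 1: learning t completes the signature (nonce point is R + T).
sigma = (adaptor + t) % q
full_ok = verify(P, point_add(R, T), sigma, m)

# Direction 2: seeing the full signature reveals t.
recovered_t = (sigma - adaptor) % q
print(adaptor_ok, full_ok, recovered_t == t)    # True True True
```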
<h3>Atomicity for N parties</h3>
<p><strong>This is the novel idea in this post.</strong></p>
<p>Suppose we have fixed \(\Sigma P\), \(\Sigma R\) and a single
message \(m\). In other words several participants signing together
the same message (Bitcoin transaction). This is the scenario for Schnorr
aggregated multisig, modulo the complexity of Musig which as explained
above I'm deliberately ignoring. Without adaptors, each party will have
to produce a <strong>partial signature</strong> as already described, and then they
can all be added together to create a fully valid Schnorr signature.</p>
<p>Now suppose each of 3 parties (Alice, Bob, Charlie) makes an adaptor
signature for their partial signature:</p>
<p>\(\sigma_A' = k_A + \mathbb{H}(\Sigma P || \Sigma R + \Sigma T
|| m) x_A\)</p>
<p>Little explanatory note on this: each party will have to share their
public \(T\) values (which, remember are curve points corresponding to
the <em>adaptor secrets</em> \(t\)), so they will all know how to correctly
calculate the hash preimage by "combining" (here just adding, but with
musig it's more complicated) their public keys, and then linearly adding
in all their \(T\) public values to the corresponding \(R\) nonce
points as for a normal Schnorr signature.</p>
<p>Similarly for Bob, Charlie:</p>
<p>\(\sigma_B' = k_B + \mathbb{H}(\Sigma P || \Sigma R + \Sigma T
|| m) x_B\)</p>
<p>\(\sigma_C' = k_C + \mathbb{H}(\Sigma P || \Sigma R + \Sigma T
|| m) x_C\)</p>
<p>These can then be shared (all with all) and are verifiable in the same
way as previously, e.g.:</p>
<p>\(\sigma_A' G \stackrel{?}{=} R_A + \mathbb{H}(\Sigma P ||
\Sigma R + \Sigma T ||m) P_A \)</p>
<p>But, it seems to get a bit confusing when you ask what happens if one
party reveals either a full \(\sigma\) value, or a secret \(t\).</p>
<p>For example, what if Alice reveals her full partial signature (yes I
meant that!) \(\sigma_A =k_A + t_A + \mathbb{H}(...) x_A\)?</p>
<p>One partial signature on its own is not enough for Bob or Charlie to do
anything. If Alice <em>and</em> Bob do this, and pass these partials to
Charlie, then he can complete and publish. But we want atomicity. What
we want is:</p>
<ul>
<li>If the complete transaction signature is ever published, all parties
can learn the adaptor secrets \(t\).</li>
<li>If any or all parties learn the adaptor secrets \(t\) they can
publish the complete transaction.</li>
</ul>
<p>It's clear why that's the desire: that would mean you could make
multiple different transactions, sharing the same set of adaptor
secrets, and have it so that <span
style="text-decoration: underline;">if one transaction gets published
all the others do!</span></p>
<p><strong>But wait!</strong> Something was horribly obfuscated in those bullet points.
"learn the adaptor secrets"? All of them, or which ones?</p>
<p>That this is crucial is easily seen by considering the following
"attack":</p>
<p>Suppose Alice, Bob, Charlie make three transactions, each of which pays
out of a 3-of-3 Schnorr multisig, between them. The idea would be (as
you've probably gathered from the build-up) that if any 1 transaction,
say the first one paying Bob, gets broadcast, then both Alice and
Charlie could broadcast the other 2 transactions, paying each of them,
because they "learnt the adaptor secrets". But: if say Alice kicks off,
reveals her adaptor secret \(t_A\), then couldn't Bob and Charlie
collude? They could take the partial signature of Alice:</p>
<p>\(\sigma_A = k_A + t_A + \mathbb{H}(\ldots)x_A\)</p>
<p>and then between themselves share and construct their "joint" partial
signature:</p>
<p>\(\sigma_{BC} = k_B + k_C + t_B + t_C
+\mathbb{H}(\ldots)(x_B+x_C)\)</p>
<p>then add this to \(\sigma_A\). They could do this for the two
transactions paying <em>them</em> and publish them to the blockchain. It may
seem at first glance that this is a problem, because in doing so they
haven't revealed their <em>individual</em> adaptor secrets \(t_B, t_C\) but
have instead revealed their sum \(t_B+t_C\).</p>
<p>However this is not a problem! One way of looking at it is <strong>adaptor
signatures are just as linear as proper Schnorr signatures</strong>. They are
thus aggregatable. From Alice's point of view, although she is taking
part in a 3-of-3 Schnorr multisig, she may just as well be participating
in a 2-of-2 with a single party, if Bob and Charlie choose to collude
and combine in that way. What will Alice see on the blockchain?</p>
<p>\(\sigma = k_A + k_B + k_C + t_A + t_B + t_C +
\mathbb{H}(\ldots)(x_A+x_B+x_C)\)</p>
<p>But she already got adaptor signatures from Bob and Charlie, so she can
remove them:</p>
<p>\(\sigma - \sigma_A - \sigma_B' - \sigma_C' = t_B + t_C\).</p>
<p>Now possessing the value of the <em>sum</em> \(t_B+t_C\), she can add this
to pre-existing adaptor signatures for the transaction paying <em>her</em> and
get the complete multisignature on those!</p>
<p>Unfairly linear signatures for the win!</p>
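<p>Here is a sketch of that whole collusion scenario, under heavy simplifying assumptions: a tiny Schnorr group (order 1019, modulo 2039) instead of a curve, no Musig, fixed toy nonces, and the same three keys used for both transactions (in the real design each destination would have its own key set). All names are illustrative. Alice sees only the published signature on the transaction paying Bob, plus the adaptors she already holds, and from that completes the transaction paying her.</p>

```python
import hashlib
from functools import reduce

# Toy stand-in for the curve: the order-q subgroup of the integers mod p = 2q + 1.
p, q, g = 2039, 1019, 4

def base_mul(x):            # plays the role of x*G
    return pow(g, x % q, p)

def point_add(P, Q):        # plays the role of P + Q
    return (P * Q) % p

def point_sum(points):      # sum of several points
    return reduce(point_add, points)

def scalar_mul(e, P):       # plays the role of e*P
    return pow(P, e % q, p)

def challenge(P, R, m):     # H(P || R || m), reduced mod q
    digest = hashlib.sha256(f"{P}|{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def verify(P, R, s, m):     # checks s*G == R + H(P||R||m)*P
    return base_mul(s) == point_add(R, scalar_mul(challenge(P, R, m), P))

x = {"A": 11, "B": 22, "C": 33}        # private keys
t = {"A": 101, "B": 202, "C": 303}     # adaptor secrets, shared across transactions
P_sum = point_sum(base_mul(v) for v in x.values())
T_sum = point_sum(base_mul(v) for v in t.values())

def adaptors_for(m, nonces):
    """Return (sum of nonce points, each party's adaptor signature) for message m."""
    R_sum = point_sum(base_mul(k) for k in nonces.values())
    e = challenge(P_sum, point_add(R_sum, T_sum), m)
    return R_sum, {who: (nonces[who] + e * x[who]) % q for who in x}

R1, ad1 = adaptors_for("tx paying Bob", {"A": 5, "B": 6, "C": 7})
R2, ad2 = adaptors_for("tx paying Alice", {"A": 8, "B": 9, "C": 10})

# Bob and Charlie collude and publish the complete signature on the first transaction.
sigma1 = (sum(ad1.values()) + sum(t.values())) % q
published_ok = verify(P_sum, point_add(R1, T_sum), sigma1, "tx paying Bob")

# Alice strips her own full partial and the two adaptors she was given earlier.
sigma_A = (ad1["A"] + t["A"]) % q
t_BC = (sigma1 - sigma_A - ad1["B"] - ad1["C"]) % q      # = t_B + t_C

# She only learns the sum, but the sum is all she needs for the second transaction.
sigma2 = (sum(ad2.values()) + t["A"] + t_BC) % q
alice_ok = verify(P_sum, point_add(R2, T_sum), sigma2, "tx paying Alice")
print(published_ok, t_BC, alice_ok)                      # True 505 True
```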
<h2>Protocol Outline</h2>
<p>We've now covered the meat of the concepts; so this instantiation phase
will either be easy to follow, or, if you found the above slightly
opaque, will hopefully make it clearer by being more concrete.</p>
<p>We're now ready to outline this "multiparty SSSSSS" design :) We'll
break it into three phases.</p>
<p>Because the 11-party case would be too tedious, we'll stick to the
three-party case of Alice, Bob and Charlie as above.</p>
<p>Getting clear on notation: \(D_x\) will be destination addresses,
transactions will be \(\tau_x\), adaptor secrets will be \(t_x\),
corresponding curve points: \(T_x\). Signatures will be
\(\sigma_x\) and adaptor signatures will be marked with a ', so
\(\sigma_{x}'\). The subscripts will almost always be one of \(A,
B, C\) for Alice, Bob, Charlie.</p>
<h2>Phase 1 - Setup</h2>
<p>We first negotiate three destination addresses: \(D_A, D_B, D_C\).
Here the subscripts denote the payer <strong>into</strong> the address. So after the
end of the setup the first will contain a coin paid by Alice, the second
by Bob and the third by Charlie. The preparation of these addresses/keys
will of course be done with Musig but to reiterate, we are ignoring the
complexity there.</p>
<p>The three parties then all provide signatures on backout transactions
such that each party gets their money back after a timeout. See the
section "Timing Controls" for more details on this.</p>
<p>Once backouts are presigned, all parties pay into the destinations as
above and wait for confirms.</p>
<p>Parties will agree in advance on the "shuffle"/permutation of coins,
i.e. who will be paying who; this is related to Timing Control, so
again, see that section. The exact negotiation protocol to decide the
permutation is left open here. Once agreed, we know that we are going to
be arranging three transactions paying out of \(D_A, D_B, D_C\),
we'll call these \(\tau_{AB}, \tau_{BC}, \tau_{CA}\)
respectively, where the second subscript indicates who receives the
coin.</p>
<h2>Phase 2 - Adaptors</h2>
<p>Each participant chooses randomly their adaptor secret \(t_A, t_B,
t_C\) and then shares \(T_A, T_B, T_C\) with all others
(<em>technical note: this might need to happen in setup phase</em>). They then
also <strong>all</strong> provide adaptor signatures on <strong>all</strong> of three transactions
\(\tau_{AB}, \tau_{BC}, \tau_{CA}\) to each other. Note that
there is no risk in doing so; the adaptor signatures are useless without
receiving the adaptor secrets. Now each party must make two checks on
each received adaptor signature:</p>
<ul>
<li>That it correctly matches the intended transaction \(\tau\) and
the set of agreed keys in the setup</li>
<li>Crucially that <em>each</em> of the adaptor signatures from any
counterparty correctly matches the same adaptor secret point
\(T\).</li>
</ul>
<p>("Crucial" of course because without using the same adaptor secrets, we
don't get our desired atomicity across a set of transactions).</p>
<p>To be more concrete, here are the actions of Bob:</p>
<ol>
<li>Generate a single adaptor secret randomly: \(t_B
\stackrel{\$}{\leftarrow} \mathbb{Z}_N\)</li>
<li>Broadcast \(T_B = t_B G\) to Alice, Charlie</li>
<li>Having agreed on all three payout transactions \(\tau_{AB},
\tau_{BC}, \tau_{CA}\), generate three adaptor signatures:
\(\sigma_{B\tau_{AB}}' = k_{B\tau_{AB}} +
\mathbb{H}(\Sigma P_{AB} || \Sigma R_{AB} + \Sigma T ||
m_{\tau_{AB}}) x_{B\tau_{AB}}\), \(\sigma_{B\tau_{BC}}'
= k_{B\tau_{BC}} + \mathbb{H}(\Sigma P_{BC} || \Sigma
R_{BC} + \Sigma T || m_{\tau_{BC}}) x_{B\tau_{BC}}\),
\(\sigma_{B\tau_{CA}}' = k_{B\tau_{CA}} +
\mathbb{H}(\Sigma P_{CA} || \Sigma R_{CA} + \Sigma T ||
m_{\tau_{CA}}) x_{B\tau_{CA}}\).</li>
<li>Broadcast these adaptors to Alice, Charlie.</li>
<li>Receive the 2 x 3 = 6 corresponding adaptors from Alice and Charlie.
Verify each one (note the above bullet points).</li>
</ol>
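<p>As a rough illustration of the adaptor step, here is a minimal Python sketch for a single transaction. It rests on several assumptions that are not part of the protocol description: the curve is secp256k1 with naive pure-Python arithmetic, \(\Sigma P\) is a plain sum of keys (the MuSig aggregation we are ignoring is left out entirely), all three parties' secrets live in one process, and the helper names (<code>H</code>, <code>point_sum</code>, <code>verify_adaptor</code>) are invented for the example.</p>

```python
import hashlib
import secrets

# secp256k1 parameters; toy arithmetic, not constant time, illustration only.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Add two curve points; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication k*P (k is reduced mod n)."""
    k %= n
    acc = None
    while k:
        if k % 2:
            acc = add(acc, P)
        P = add(P, P)
        k //= 2
    return acc

def H(*parts):
    """Hash a mix of byte strings and curve points to a scalar mod n."""
    data = b""
    for part in parts:
        if isinstance(part, bytes):
            data += part
        else:
            data += part[0].to_bytes(32, "big") + part[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def point_sum(points):
    acc = None
    for Q in points:
        acc = add(acc, Q)
    return acc

parties = ["A", "B", "C"]
x = {i: secrets.randbelow(n - 1) + 1 for i in parties}   # signing keys
k = {i: secrets.randbelow(n - 1) + 1 for i in parties}   # nonces
t = {i: secrets.randbelow(n - 1) + 1 for i in parties}   # adaptor secrets
P = {i: mul(x[i], G) for i in parties}
R = {i: mul(k[i], G) for i in parties}
T = {i: mul(t[i], G) for i in parties}

P_agg = point_sum(P.values())                             # Sigma P (plain sum: assumption)
R_full = add(point_sum(R.values()), point_sum(T.values()))  # Sigma R + Sigma T
e = H(P_agg, R_full, b"tau_AB")                           # H(Sigma P || Sigma R + Sigma T || m)

# Each party's adaptor signature: sigma_i' = k_i + e * x_i
adaptor = {i: (k[i] + e * x[i]) % n for i in parties}

def verify_adaptor(i):
    """Check sigma_i' * G == R_i + e * P_i, where e already commits to Sigma T."""
    return mul(adaptor[i], G) == add(R[i], mul(e, P[i]))

# Only once every t_i is added does the sum become a valid signature.
sigma = sum(adaptor[i] + t[i] for i in parties) % n
valid = mul(sigma, G) == add(R_full, mul(e, P_agg))
```

<p>In the sketch the adaptors on their own do not sum to a valid signature; the missing piece is exactly \(t_A + t_B + t_C\), which is why handing them over carries no risk.</p>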
<p>Assuming all parties accept the adaptor signatures, we are ready to
proceed to the last phase. If any communication or protocol failure
occurs, all parties must fall back to the backout transactions presigned
in the Setup phase.</p>
<p>(<em>Technical note: it is not necessary for all parties to share all
adaptors, in general, but for simplicity we use that model, since it
hurts nothing I believe</em>).</p>
<h2>Phase 3 - Execution</h2>
<p>The order of events in this final execution phase is important for
safety, but we defer that to the next section "Timing Controls". Here
we'll just show how events will proceed if everything goes correctly, in
a randomly chosen order.</p>
<ul>
<li>Alice and Charlie send full partial(! i.e. not adaptor) signatures
on \(\tau_{AB}\) to Bob, i.e. the transaction that pays Bob. So
they send \(\sigma_{A\tau_{AB}}\) and
\(\sigma_{C\tau_{AB}}\), respectively.</li>
<li>Bob can add these to his own full partial signature on the same
transaction, constructing: \(\sigma_{\tau_{AB}}\) and using
this to broadcast \(\tau_{AB}\) to the network, receiving his
coin.</li>
<li>Alice will read \(\sigma_{C\tau_{AB}} + \sigma_{B\tau_{AB}}
= \sigma_{\tau_{AB}} - \sigma_{A\tau_{AB}}\) from this
broadcast signature and from this deduce the value of \(t_B+t_C =
\sigma_{C\tau_{AB}} + \sigma_{B\tau_{AB}} -
\sigma_{C\tau_{AB}}' - \sigma_{B\tau_{AB}}'\).</li>
<li>Alice can add this aggregated adaptor secret \(t_B+t_C\) to the
pre-existing adaptors \(\sigma_{B\tau_{CA}}' +
\sigma_{C\tau_{CA}}'\) to get \(\sigma_{B\tau_{CA}} +
\sigma_{C\tau_{CA}}\), which she can then add to
\(\sigma_{A\tau_{CA}}\) to get a fully valid
\(\sigma_{\tau_{CA}}\) and broadcast this to the network to
receive her 1 coin.</li>
<li>Charlie can do exactly the same as Alice for the last transaction,
\(\tau_{BC}\) and receive his coin.</li>
</ul>
<p>Thus both other parties, after the first spend, were able to claim their
coin by creating complete signatures through combining the adaptor
signatures with the revealed (possibly aggregated) adaptor secrets.
(<em>Technical note: in a protocol we can allow participants to share
adaptor secrets at the appropriate times instead of having it deduced
from transaction broadcasts, as in the case of CoinSwap, just as a kind
of politeness, but this is not important</em>).</p>
<h2>Timing Controls</h2>
<p>In the 2 party scriptless script swap, as in earlier CoinSwap designs,
we simply account for the asymmetry of timing of revealing privileged
information (e.g. signatures) using an asymmetry of timelocks. The one
who transfers something valuable first (a signature) must have an
earlier ability to refund coins that are "stuck" due to protocol
non-completion, else the possessor of the adaptor secret / CoinSwap
secret, who does not reveal it first, may wait for the window where he
can reclaim and the other cannot, to both reclaim and use the secret to
steal the other's coins.</p>
<p>Here we must follow a similar principle, just extended to multiple
parties.</p>
<p>Suppose, naively, we just used the same locktime on each of the three
refund transactions.</p>
<p>Now suppose Alice, at the start of Phase 3, reveals her full signature
first, on transaction \(\tau_{AB}\) which pays Bob. And suppose for
maximal pessimism that Bob and Charlie are colluding to defraud Alice.
They will simply wait until the moment of the timeout and attempt to
cheat Alice: try to broadcast both of their own refunds, while spending
the transaction for which Alice provided the full signature (having done
so, she has revealed her adaptor secret to the other two).</p>
<p>Thus by instead making Alice's backout locktime the earliest, she is
safe in transferring her full signature, and thus her adaptor secret
first. In this case if Bob and Charlie collude, they can do no better
than publish this spend before that (earliest) timeout, and in so doing,
reveal the aggregate of their adaptor secrets atomically so Alice can
claim her money well before their backouts become active, as intended by
system design.</p>
<p>Now let's consider the second sender of a full signature, say it's Bob.
Suppose we let Charlie's locktime be identical to Bob's. And for maximal
pessimism let's say Alice and Charlie collude. Here, Charlie could
refuse to pass his signature to Bob and attempt to reclaim his coin at
the exact moment of the timeout, while spending Bob's (depending on the
exact permutation of spends, but at least, it's possible). Even though
Alice didn't back out at her timeout in this scenario, which is weird,
clearly this scenario is not safe for Bob, he has passed across a
signature to Charlie with no time based defence against him.</p>
<p>These considerations make it clear, I think, that the obviously sound
way to do it is to stagger the locktime values according to the order in
which signatures, and therefore secrets, are revealed. If the order of
signature transfers is: first Alice, then Bob, then Charlie, then the
locktimes on the backouts which pay each must obey \(L_A < L_B
< L_C\).</p>
<p>So I believe this ordering must be settled on in Phase 1 (because we
define these locktimes before signing the backout transactions).</p>
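<p>A tiny Python sketch of the staggering rule, purely illustrative: the function names, the starting block height and the gap of 144 blocks are all hypothetical choices, not values from any real protocol.</p>

```python
def assign_locktimes(reveal_order, first_locktime, gap):
    """Give the i-th party to reveal a backout locktime of first_locktime + i * gap."""
    return {name: first_locktime + i * gap for i, name in enumerate(reveal_order)}

def is_staggered(reveal_order, locktimes):
    """True if locktimes strictly increase along the order of signature reveals."""
    values = [locktimes[name] for name in reveal_order]
    return all(later > earlier for earlier, later in zip(values, values[1:]))

order = ["Alice", "Bob", "Charlie"]          # order in which full signatures are sent
locktimes = assign_locktimes(order, first_locktime=700_000, gap=144)
```

<p>With a fixed gap, the last party's worst-case wait grows linearly in the number of participants, which is worth keeping in mind for larger groups.</p>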
<h2>Generalisation from 3-3 to N-N.</h2>
<p>I believe this is trivial, modulo practicality.</p>
<h2>Practical considerations, advantages</h2>
<p>The scenario described is a multiparty coinswap in which essentially a
group of \(N\) parties could randomly shuffle history of their coins.
This could be done with or without a coordinator (either Joinmarket
style or server style), could possibly be done with a Coinshuffle++ type
coordination mechanism and/or blinding.</p>
<p>Practicality: the biggest limitation is that of CoinSwap generally, but
extended further: using staggered locktime backouts means that in cases
of failure, participants may have to wait a long time for coin recovery.
This gets linearly worse with the size of the anonymity set, which is not good. Would
love to find a trick to avoid that.</p>
<p>Expanding further on that same limitation, a larger number of
participants makes worse the problem that I've previously called "XBI",
or "cross block interactivity". There's a lot of exposure to DOS attacks
and simple network failure when participants not only have to
coordinate, but have to do so more than once. This could partially be
addressed with incentives, e.g. fidelity bonds, but I'm not convinced.</p>
<p>On the positive side, there could be a tremendous boon over the 2-party
case in that it's possible here to have a group of anonymous
participants shuffle the history of their coins without any of the
parties knowing the others' linkages.</p>
<p>Also positively, such larger group swaps may offer much larger privacy
improvements in a very economical way (a few hundred bytes on chain per
participant vs tens or hundreds of kilobytes via coinjoin? complete
finger in the air here of course).</p>
<p>Leaving open: can amount decorrelation be achieved in a more powerful
way in this model? I believe so, for example by splitting amounts into
subsets across subsets of participants in interesting ways. Fees can
also be used for noise. I think the most powerful version of this model
would be very powerful indeed, but needs more detailed analysis, and
this blog post is already too long.</p>
<p>Other applications: also leaving this open. Perhaps using adaptor
signatures in groups like this (exploiting the linearity of adaptor
signatures) has applications to second layer tech like Lightning or
similar contracting.</p>Ring Signatures2019-02-28T00:00:00+01:002019-02-28T00:00:00+01:00Adam Gibsontag:joinmarket.me,2019-02-28:/blog/blog/ring-signatures/<p>construction of several different ring signatures relevant to Bitcoin.</p><h3>Ring signatures</h3>
<h2>Outline:</h2>
<ul>
<li>Basic goal of 1-of-\(N\) ring signatures</li>
<li>Recap: the \(\Sigma\)-protocol</li>
<li>OR of \(\Sigma\)-protocols, CDS 1994</li>
<li>Abe-Ohkubo-Suzuki (AOS) 2002 (broken version)</li>
<li>Security weaknesses</li>
<li>Key prefixing</li>
<li>Borromean, Maxwell-Poelstra 2015</li>
<li>Linkability and exculpability</li>
<li>AND of \(\Sigma\)-protocols, DLEQ</li>
<li>Liu-Wei-Wong 2004</li>
<li>Security arguments for the LWW LSAG</li>
<li>Back 2015; compression, single-use</li>
<li>Fujisaki-Suzuki 2007 and Cryptonote 2014</li>
<li>Monero MLSAG</li>
</ul>
<h2>Basic goal of 1-of-\(N\) ring signatures</h2>
<p>The idea of a <a href="https://en.wikipedia.org/wiki/Ring_signature">ring
signature</a>
(the term itself is a bit sloppy in context, but let's stick with it
for now) is simple enough:</p>
<p>An owner of a particular private key \(x\) signs a message \(m\) by
taking, usually without setup or interaction, a whole set of public
keys, one of which is his (\(P=xG\)), and forms a signature (exact
form unspecified) such that there is proof that <strong>at least one</strong> of the
private keys is known to the signer, but which one was responsible for
the signature is not known by the verifier, and not calculable.</p>
<p>Obviously that's pretty vague but captures the central idea. We often
use the term "ring" because the construction must have some symmetry
over the entire set of \(n\) public keys, and a ring/circle represents
symmetry of an arbitrarily high order (limit of an \(n\)-gon). Less
abstractly it could be a good name because of some "loop"-ing aspect
of the algorithm that constructs the signature, as we'll see.</p>
<p>What properties do we want then, in summation?</p>
<ul>
<li>Unforgeability</li>
<li>Signer ambiguity</li>
</ul>
<p>We may want additional properties for some ring signatures, as we'll
see.</p>
<p>In the following sections I want to cover some of the key conceptual
steps to the kinds of ring signatures currently used in cryptocurrency
protocols; most notably Monero, but also several others; and also in the
Confidential Transactions construction (see: Borromean ring signatures,
briefly discussed here). I will also discuss security of such
constructions, in much less detail than the <a href="https://web.archive.org/web/20200713230948/https://joinmarket.me/blog/blog/liars-cheats-scammers-and-the-schnorr-signature/">previous
blog</a>
(on the security of Schnorr signatures), but showing how there are
several tricky issues to be dealt with, here.</p>
<h2>Recap: the \(\Sigma\)-protocol</h2>
<p>We consider a prover \(\mathbb{P}\) and a verifier \(\mathbb{V}\).</p>
<p>A \(\Sigma\)-protocol is a three step game, in which the prover
convinces the verifier of something (it can be \(\mathbb{P}\)'s
knowledge of a secret, but it can also be something more complicated),
in zero knowledge. Readers interested in a much more detailed discussion
of the logic behind this and several applications of the idea can read
Sections 3 and 4 of my <a href="https://github.com/AdamISZ/from0k2bp">From Zero (Knowledge) to
Bulletproofs</a>
writeup, especially section 4.1.</p>
<p>In brief, the three step game is:</p>
<p>\(\mathbb{P} \rightarrow \mathbb{V}\): <strong>commitment</strong></p>
<p>\(\mathbb{V} \rightarrow \mathbb{P}\): <strong>challenge</strong></p>
<p>\(\mathbb{P} \rightarrow \mathbb{V}\): <strong>response</strong></p>
<p>A few minor notes on this: obviously the game is not literally over with
the response step; the verifier will examine the response to establish
whether it is valid or invalid.</p>
<p>The <strong>commitment</strong> will usually in this document be written \(R\) and
will here always be a point on an elliptic curve, which the prover may
(or may not! in these protocols) know the corresponding scalar multiple
(private key or nonce) \(k\) such that \(R=kG\).</p>
<p>The <strong>challenge</strong> will usually be written \(e\) and will usually be
formed as the hash of some transcript of data; the subtleties around
exactly <em>what</em> is hashed can be vitally important, as we'll see. (This
is in the "Fiat-Shamir transform" case; we discussed the pure
interactive challenge case a bit in the previous blog and many other
places!)</p>
<p>The <strong>response</strong> will usually be a single scalar which will usually be
denoted \(s\).</p>
<p>We will be playing with this structure a lot: forging transcripts \(R,
e, s\); running multiple instances of a \(\Sigma\)-protocol in
parallel and performing logical operations on them. All of this will
play out <em>mostly</em> in the form of a Schnorr signature; again, refer to
previous blog posts or elementary explanations (including those written
by me) for more on that.</p>
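<p>To make the three moves concrete, here is a minimal Python sketch of the Schnorr identity protocol, plus a "forged" transcript built by choosing \(e, s\) first and solving for \(R\). Assumptions: secp256k1 with naive pure-Python arithmetic, an interactive random challenge rather than a hash, and invented names (<code>verify</code>, <code>R_sim</code> and so on).</p>

```python
import secrets

# secp256k1 parameters; toy arithmetic, not constant time, illustration only.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Add two curve points; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication k*P (k is reduced mod n)."""
    k %= n
    acc = None
    while k:
        if k % 2:
            acc = add(acc, P)
        P = add(P, P)
        k //= 2
    return acc

def verify(P, R, e, s):
    """Verifier's final check on a transcript (R, e, s): s*G == R + e*P."""
    return mul(s, G) == add(R, mul(e, P))

x = secrets.randbelow(n - 1) + 1      # prover's secret
P = mul(x, G)                         # public key

# Honest run of the three-move game.
k = secrets.randbelow(n - 1) + 1
R = mul(k, G)                         # prover to verifier: commitment
e = secrets.randbelow(n)              # verifier to prover: challenge
s = (k + e * x) % n                   # prover to verifier: response

# Simulated transcript without x: pick e and s first, then R = s*G - e*P.
e_sim = secrets.randbelow(n)
s_sim = secrets.randbelow(n)
R_sim = add(mul(s_sim, G), mul(-e_sim, P))
```

<p>The simulated transcript verifies just as well as the honest one; what a forger cannot do is commit to \(R\) <em>before</em> learning \(e\). That ordering is the only thing the ring constructions in the following sections manipulate.</p>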
<h2>OR of \(\Sigma\)-protocols, CDS 1994</h2>
<p>Let's start with the OR of \(\Sigma\)-protocols. I <em>believe</em> this
solution is due to <a href="https://link.springer.com/content/pdf/10.1007%2F3-540-48658-5_19.pdf">Cramer, Damgård and Schoenmakers
'94</a></p>
<p>(Historical note: the "believe" is because I've seen it cited to that
paper (which is famous for good reason, I guess); but in the paper they
actually attribute <em>this specific idea</em> to "M. Ito, A. Saito, and T.
Nishizeki: Secret Sharing Scheme realizing any Access Structure, Proc.
Glob.Com. (1987)" ; unfortunately I can't find that on the 'net).</p>
<p>It is also described, with a brief discussion of its security proof, in
<a href="https://crypto.stanford.edu/~dabo/cryptobook/BonehShoup_0_4.pdf">Boneh-Shoup</a>
Sec 19.7.2.</p>
<p>This is not, as far as I know, used at all(?) nor that widely discussed,
but it is in some sense the most simple and logical way to get a 1 out
of \(N\) ring signature; use the XOR (\(\oplus\)) operation:</p>
<p>We have in advance a set of public keys \(P_i\). We only know one
private key for index \(j\), \(x_j\).</p>
<p>We'll now use a standard three move \(\Sigma\)-protocol to prove
knowledge of <strong>at least one key</strong> without revealing which index is
\(j\).</p>
<p>We're going to fake the non-\(j\)-index signatures in advance. Choose
\(s_i \stackrel{\$}{\leftarrow} \mathbb{Z}_N\ ,\ e_i
\stackrel{\$}{\leftarrow} \mathbb{Z}_N \quad \forall i \neq j\).</p>
<p>Calculate \(R_i = s_iG - e_iP_i \quad \forall i \neq j\).</p>
<p>For the real signing index, \(k_j \stackrel{\$}{\leftarrow}
\mathbb{Z}_N\ ,\quad R_j = k_jG\).</p>
<p>We now have the full set of commitments: \((R_i \ \forall i)\)</p>
<p>Now for the clever part. In an interactive \(\Sigma\)-protocol, we
would at this point receive a random challenge \(e \in
\mathbb{Z}_N\). For the Fiat Shamir transformed case,
noninteractively (as for a signature), we use the constructed
\(R\)-values as input to a hash function, i.e. \(e = H(m||R_i
\ldots)\). We have already set the non-signing index \(e\)-values,
for the signing index we set \(e_j = e \oplus (\bigoplus_{i \ne
j}{e_i})\).</p>
<p>This allows us to calculate \(s_j = k_j + e_j x_j\), and we now have
the full set of 'responses' for all the \(\Sigma\)-protocols:
\(s_i \ \forall i\). (but here we are using Fiat Shamir, so it's
not actually a response).</p>
<p>By working this way we have ensured that the signature verifier can
verify that the logical XOR of the \(e\)-values is equal to the
Fiat Shamir based hash-challenge, e.g. for the case of three
"signatures", we will have:</p>
<p>\(e = e_1 \oplus e_2 \oplus e_3 \stackrel{?}{=}
H(m||R_1||R_2||R_3)\)</p>
<p>where the verifier would calculate each \(R_i\) as \(s_iG -
e_iP_i\).</p>
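<p>Here is a minimal Python sketch of the signing and verification just described. Assumptions to flag: secp256k1 with naive pure-Python arithmetic, SHA256 as \(H\), and, as a simplification of my own, challenges handled as 256-bit strings for the XOR and only reduced modulo the group order when used as scalars. The names <code>cds_sign</code>, <code>cds_verify</code> and <code>challenge</code> are invented for the example.</p>

```python
import hashlib
import secrets

# secp256k1 parameters; toy arithmetic, not constant time, illustration only.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Add two curve points; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication k*P (k is reduced mod n)."""
    k %= n
    acc = None
    while k:
        if k % 2:
            acc = add(acc, P)
        P = add(P, P)
        k //= 2
    return acc

def challenge(msg, points):
    """Fiat-Shamir challenge e = H(m || R_1 || ...) as an unreduced 256-bit integer."""
    data = msg + b"".join(x.to_bytes(32, "big") + y.to_bytes(32, "big") for x, y in points)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def cds_sign(msg, keys, j, x_j):
    """1-of-N OR proof where only x_j, the private key of keys[j], is known."""
    count = len(keys)
    e = [secrets.randbits(256) for _ in range(count)]   # fake challenges (slot j replaced)
    s = [secrets.randbelow(n) for _ in range(count)]    # fake responses (slot j replaced)
    R = [None] * count
    for i in range(count):
        if i != j:
            R[i] = add(mul(s[i], G), mul(-e[i], keys[i]))   # R_i = s_i*G - e_i*P_i
    k = secrets.randbelow(n - 1) + 1
    R[j] = mul(k, G)
    e_j = challenge(msg, R)
    for i in range(count):
        if i != j:
            e_j ^= e[i]                                   # e_j = e XOR (all other e_i)
    e[j] = e_j
    s[j] = (k + e_j * x_j) % n
    return e, s

def cds_verify(msg, keys, e, s):
    """Recompute every R_i and check that the XOR of all e_i equals the hash."""
    R = [add(mul(s[i], G), mul(-e[i], keys[i])) for i in range(len(keys))]
    xor_all = 0
    for e_i in e:
        xor_all ^= e_i
    return xor_all == challenge(msg, R)

secret_keys = [secrets.randbelow(n - 1) + 1 for _ in range(3)]
ring = [mul(x, G) for x in secret_keys]
e_vals, s_vals = cds_sign(b"hello", ring, 1, secret_keys[1])
```

<p>The returned signature is the pair of lists <code>(e_vals, s_vals)</code>, i.e. two values per ring member, and nothing in it distinguishes index 1 from the others.</p>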
<p>The excellent feature of this of course is that it is perfectly hidden
which of the three indexes was genuine. But the bad news is that the
protocol as stated, used let's say as a signature scheme, requires
about twice as many field elements as members of the group of signers.
The verifier needs to be given \((s_1, \ldots s_n),(e_1 \ldots
e_n)\).</p>
<p>Another excellent feature: this is not restricted to the Schnorr ID
protocol. It can work with another identity protocol, and even better,
it could work with a <em>mix</em> of them; they only have to share the one
challenge \(e\).</p>
<h2>Abe-Ohkubo-Suzuki (AOS) 2002 (broken version)</h2>
<p>This is an excellent
<a href="https://www.iacr.org/cryptodb/archive/2002/ASIACRYPT/50/50.pdf">paper</a>
generally, but its stand-out contribution, in this context, is a <strong>more
compact</strong> version of the 1 of n ring signature above. To clarify here,
both this and the previous are \(O(n)\) where \(n\) is the group
size, so "more compact" is about the constant factor (scale not
scaling!); we reduce it from roughly 2 to roughly 1.</p>
<p>"Broken version" - here I'll present a slightly simpler form than the
one in the paper, and then explain the serious problem with it - which I
hope will be productive. <strong>Please don't mistake this as meaning that
the AOS design was broken, it was never presented like this in the
paper!</strong></p>
<p>Anyway, I think the best explanation for what's going on here
conceptually is due to A. Poelstra in the <a href="https://github.com/Blockstream/borromean_paper">Borromean ring signatures
paper</a>,
particularly Section 2 ; the reference to time travel may seem whimsical
but it gets to the heart of what's going on here; it's about having a
simulated form of causality with one way functions, and then violating
that.</p>
<p>In short: creating an ordinary Schnorr sig without the key (i.e.
forging) is impossible because, working at the curve point level of the
equation (\(sG = R + H(m||R)P\)), you need to know the hash value
before you can calculate \(R\), but you need to know the value of
\(R\) before you can calculate the hash. So we see that two one way
functions are designed to conflict with one another; only by removing
one of them (going from curve points to scalar eqn: (\(s = k +
H(m||kG)x\)), can we now create a valid \(s, R, m\) set.</p>
<p>To achieve that goal over a set of keys, we can make that "simulated
causality enforcement" be based on the same principle, but over a set
of equations instead of one. The idea is to make the hash challenge
\(H(m||R)\) use the \(R\) value from the "previous"
signer/key/equation, where "previous" is modulo \(N\), i.e. there is
a loop of dependencies (a ring, in fact).</p>
<p>Quick description:</p>
<p>Our goal is a list of \(N\) correctly verifying Schnorr signature
equations, with the tweak as mentioned that each hash-value refers to
the "previous" commitment. We will work with \(N=4\) and index from
zero for concreteness. Our goal is:</p>
<p>\(s_0 G = R_0 + H(m||R_3)P_0\)</p>
<p>\(s_1 G = R_1 + H(m||R_0)P_1\)</p>
<p>\(s_2 G = R_2 + H(m||R_1)P_2\)</p>
<p>\(s_3 G = R_3 + H(m||R_2)P_3\)</p>
<p>Again for concreteness, we imagine knowing specifically the private key
\(x_2\) for index 2, only. We can successfully construct the above,
but only in a certain sequence:</p>
<p>Choose \(k_2 \stackrel{\$}{\leftarrow} \mathbb{Z}_N,\ R_2 =
k_2G\), choose \(s_3 \stackrel{\$}{\leftarrow} \mathbb{Z}_N\).</p>
<p>\(\Rightarrow R_3 = s_3 G - H(m||R_2)P_3\). Now choose \(s_0
\stackrel{\$}{\leftarrow} \mathbb{Z}_N\).</p>
<p>\(\Rightarrow R_0 = s_0 G - H(m||R_3)P_0\). Now choose \(s_1
\stackrel{\$}{\leftarrow} \mathbb{Z}_N\).</p>
<p>\(\Rightarrow R_1 = s_1 G - H(m||R_0)P_1\).</p>
<p>Last, do not choose but <strong>calculate</strong> \(s_2\): it must be \(s_2 = k_2
+ H(m||R_1)x_2\).</p>
<p>After this set of steps, the set of data: \(e_0, s_0, s_1, s_2, s_3\),
where \(e_0 = H(m||R_3)\), can be verified without exposing which
private key was known. Here is the verification:</p>
<p>Given \(e_0, s_0\), reconstruct \(R_0 = s_0G -e_0P_0\).</p>
<p>\(\Rightarrow e_1 =H(m||R_0)\ ,\ R_1 = s_1 G - e_1P_1\)</p>
<p>\(\Rightarrow e_2 =H(m||R_1)\ ,\ R_2 = s_2 G - e_2P_2\)</p>
<p>\(\Rightarrow e_3 =H(m||R_2)\ ,\ R_3 = s_3 G - e_3P_3\)</p>
<p><strong>Check</strong>: \(e_0 \stackrel{?}{=} H(m||R_3)\).</p>
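<p>The signing sequence and the verification loop can be sketched in Python as follows. This is deliberately the simplified (insecure) variant described in this section, with no keys in the hash. Assumptions: secp256k1 with naive pure-Python arithmetic, SHA256 as \(H\), and invented function names <code>aos_sign</code> and <code>aos_verify</code>.</p>

```python
import hashlib
import secrets

# secp256k1 parameters; toy arithmetic, not constant time, illustration only.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Add two curve points; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication k*P (k is reduced mod n)."""
    k %= n
    acc = None
    while k:
        if k % 2:
            acc = add(acc, P)
        P = add(P, P)
        k //= 2
    return acc

def H(msg, R):
    """e = H(m || R), reduced to a scalar. No keys in the hash: the broken variant."""
    data = msg + R[0].to_bytes(32, "big") + R[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def aos_sign(msg, keys, j, x_j):
    """Ring signature over keys, knowing only x_j; returns (e_0, [s_0, ..., s_{N-1}])."""
    count = len(keys)
    e = [None] * count
    s = [None] * count
    k = secrets.randbelow(n - 1) + 1
    R_prev = mul(k, G)                         # R_j, the only honest commitment
    i = (j + 1) % count
    while i != j:                              # walk the ring starting after j
        e[i] = H(msg, R_prev)                  # e_i = H(m || R_{i-1})
        s[i] = secrets.randbelow(n)            # chosen freely
        R_prev = add(mul(s[i], G), mul(-e[i], keys[i]))   # R_i = s_i*G - e_i*P_i
        i = (i + 1) % count
    e[j] = H(msg, R_prev)                      # closes the loop
    s[j] = (k + e[j] * x_j) % n                # calculated, not chosen
    return e[0], s

def aos_verify(msg, keys, e0, s):
    """Recompute R_0, e_1, R_1, ... around the ring and check we return to e_0."""
    e = e0
    for i in range(len(keys)):
        R = add(mul(s[i], G), mul(-e, keys[i]))
        e = H(msg, R)
    return e == e0

secret_keys = [secrets.randbelow(n - 1) + 1 for _ in range(4)]
ring = [mul(x, G) for x in secret_keys]
e0, s_vals = aos_sign(b"hello", ring, 2, secret_keys[2])
```

<p>The published data is one \(e\) value plus one \(s\) value per key, compared with two values per key in the CDS construction.</p>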
<h3>Security weaknesses</h3>
<p>The description above can't be described as secure.</p>
<p>To give a hint as to what I mean: is there something <strong>not completely
fixed</strong> in the above construction? Maybe an issue that's not even
specific to the "ring" construction, but even for any one of the
signature equations?</p>
<p>....</p>
<p>The answer is the keys, \(P_i\). We can in the most general case
consider three scenarios, although there may be some gray areas between
them:</p>
<ul>
<li>Key(s) fixed in advance: \(P_1 \ldots P_N\) are all specified
before doing anything, and not allowed to change by the verifier.
Every signature must be on that set of keys.</li>
<li>The <em>set</em> <em>of possible keys</em> is fixed in advance exactly as
described above, but the <em>set of keys used in the ring</em> is chosen by
the signer, dynamically, in signing oracle queries or forgery
attempts.</li>
<li>Even the set of possible keys is dynamic. That is to say, any valid
curve point (for EC case) is a valid potential key in a (ring)
signature.</li>
</ul>
<p>This is not a full taxonomy of possible attack scenarios, either. Not
only must we consider the difference between EUF-CMA and SUF-CMA as was
discussed in the previous blog (a reminder: with SUF, a forger should
not be able to even create a second signature on the same message -
ECDSA doesn't have this in naive form), but much more: we must also
consider which of the above three key settings applies.</p>
<p>Even outside of ring signature settings, just considering a large scale
deployment of a signature scheme across millions or billions of keys,
could mean that the difference between these cases really matters. In
<a href="https://eprint.iacr.org/2015/996">this</a>
paper by Dan Bernstein the term MU-UF-CMA is used to refer to the
"multi-user" setting for this, where only single-key signatures are
used but one must consider whether having billions of other keys and
signing oracles for them might impact the security of <strong>any one</strong> key
(notice how huge the difference between "I want to forge on \(P\)" and
"I want to forge on any existing key" is, in this scenario).</p>
<p>So enough about settings, what exactly constitutes a security problem
with the above version of the AOS ring sig?</p>
<p>Consider any one element in the ring like:</p>
<p>\(s_0 G = R_0 + H(m||R_3)P_0\)</p>
<p>where, for concreteness, I choose \(n=4\) and look at the first of 4
signature equations. Because of Schnorr's linearity (see <a href="https://web.archive.org/web/20200713230948/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">this earlier
blog
post</a>
for some elucidations on the <em>advantage</em> of this linearity, although it
was also noted there that it had concomitant dangers (worse,
actually!)), there are two obvious ways we could tweak this equation:</p>
<p>(1) Tweaked \(s\) values on fixed message and tweaked keys:</p>
<p>Choose \(\alpha \in \mathbb{Z}_N\) and set \(s' = s_0
+\alpha\). We will not alter \(R_0\), but we alter \(P_0
\rightarrow P' = P_0 + e_0^{-1}\alpha G\). This makes the verification
still work <strong>without altering the fixing of the nonce in the hash value
\(e_0\):</strong></p>
<p>\(s_0 G + \alpha G = R_0 + e_0 P_0 + \alpha G = R_0 + e_0\left(P_0 +
e_0^{-1}\alpha G\right)\)</p>
<p>So it's really not clear how bad this failing is; it's <em>kinda</em> a
failure of strong unforgeability, but that notion doesn't precisely
capture it: we created a new, valid signature against a
<em>new</em> key, but with two severe
limitations: we weren't able to alter the message, and also, we
weren't able to <em>choose</em> the new key \(P'\). That last is slightly
unobvious, but crucial : if I have a pre-prepared \(P^{*}\), I
cannot choose \(\alpha\) to get \(P' = P^{*}\) as that would
require a discrete logarithm break.</p>
<p>A final statement, hopefully obvious: the above can apply to any and all
of the elements of the ring, so the forgery could consist of an entirely
different and random set of keys, not related to the starting set; but
the message would be the same, as would the \(R\) values.</p>
<p>(2) Completely different messages on tweaked keys, with the same
signature</p>
<p>This one is almost certainly more important. Algebraically, we here
allow alterations to the \(e\) values, using multiplication rather
than addition:</p>
<p>Given the same starting \(s_0\) as in (1), we take a chosen new
message \(m^{*}\) and calculate the new \(e_0^{*} =
H(m^{*}||R_3)\). If we likewise tweak the public key we get that
\(s_0, R_0\) is a valid signature on the new message, with the tweaked
key:</p>
<p>\(s_0 G = R_0 + e_0^{*}\left(\frac{e_0}{e_0^{*}} P_0\right)\)</p>
<p>We can see here that this produces a forgery with the same signature
values (but different hash values) on the new keys.</p>
<p>Most definitions of security against forgery require the attacker to
create a signature on a not-previously-queried message - so this <em>is</em> a
successful attack, by most measures.</p>
<p>However it does share the same limitation with (1) mentioned above -
that you cannot "control" the keys on which you get a signature,
unless you know a relative discrete log between one of the existing keys
and your new key, which implies you knew the secret key of the first (in
which case all this is pointless; whenever you have a private key, there
is no forgery on it).</p>
<p><strong>All of this should make very clear the reason why the real AOS (see
Section 5.1 of the paper) discrete-log ring signature fixes the entire
set of keys inside the hash, i.e. \(e_i = H(m || R_{(i-1)\%n}||
P_0 \ldots P_{n-1})\).</strong></p>
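<p>Both tweaks, and the effect of putting the key inside the hash, can be checked with a short Python sketch. For brevity it uses the single-key case (one Schnorr equation with \(e = H(m||R)\)) rather than a full ring; the algebra per equation is the same. Assumptions: secp256k1 with naive pure-Python arithmetic, SHA256 as \(H\), and invented names (<code>tweak_attacks</code>, <code>prefix_key</code> and so on).</p>

```python
import hashlib
import secrets

# secp256k1 parameters; toy arithmetic, not constant time, illustration only.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Add two curve points; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication k*P (k is reduced mod n)."""
    k %= n
    acc = None
    while k:
        if k % 2:
            acc = add(acc, P)
        P = add(P, P)
        k //= 2
    return acc

def challenge(msg, R, P, prefix_key):
    """e = H(m || R), or H(m || R || P) when key-prefixing is switched on."""
    points = [R, P] if prefix_key else [R]
    data = msg + b"".join(x.to_bytes(32, "big") + y.to_bytes(32, "big") for x, y in points)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def schnorr_sign(msg, x, prefix_key):
    k = secrets.randbelow(n - 1) + 1
    R = mul(k, G)
    e = challenge(msg, R, mul(x, G), prefix_key)
    return R, (k + e * x) % n

def schnorr_verify(msg, P, R, s, prefix_key):
    return mul(s, G) == add(R, mul(challenge(msg, R, P, prefix_key), P))

def tweak_attacks(msg, new_msg, P, R, s, prefix_key):
    """Try tweaks (1) and (2) without the private key; return whether each verifies."""
    e = challenge(msg, R, P, prefix_key)
    # (1) s' = s + alpha, P' = P + e^{-1} * alpha * G, same message.
    alpha = secrets.randbelow(n - 1) + 1
    P1 = add(P, mul(alpha * pow(e, -1, n), G))
    ok1 = schnorr_verify(msg, P1, R, (s + alpha) % n, prefix_key)
    # (2) same (R, s), new message, key rescaled by e / e*.
    # With prefixing the attacker can only hash the old key here, since the
    # new key depends on e* itself.
    e_star = challenge(new_msg, R, P, prefix_key)
    P2 = mul(e * pow(e_star, -1, n), P)
    ok2 = schnorr_verify(new_msg, P2, R, s, prefix_key)
    return ok1, ok2

x = secrets.randbelow(n - 1) + 1
P = mul(x, G)
R_plain, s_plain = schnorr_sign(b"original", x, False)
R_kp, s_kp = schnorr_sign(b"original", x, True)
plain_result = tweak_attacks(b"original", b"forged", P, R_plain, s_plain, False)
prefixed_result = tweak_attacks(b"original", b"forged", P, R_kp, s_kp, True)
```

<p>Without the key in the hash both tweaks yield signatures that verify on the tweaked keys; with it, changing the key changes the challenge and both attempts fail (except with negligible probability).</p>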
<h3>Key Prefixing</h3>
<p>The method in the previous bolded sentence is sometimes called
"key-prefixing". One way of looking at it: the Fiat-Shamir transform
that takes the Identity Protocol into a signature scheme, should hash
the conversation transcript between the prover and verifier, previous to
the challenge step; by including the public keys in this hash, we are
treating the keyset as part of the conversation transcript, rather than
something ex-protocol-run.</p>
<p>Also, the discussion above (both cases (1) and (2)) shows clearly that
the same weakness exists for a single (\(n=1\)) key case.</p>
<p><strong>And yet, for the single key case, it was not a done deal historically -
this caused real world arguments</strong>.
After all, there are many use cases where the key <em>is</em> a given
ex-protocol-run, plus there may be some practical disadvantage to doing
the key-prefixing.</p>
<p>In
<a href="https://rd.springer.com/chapter/10.1007%2F978-3-662-53008-5_2">this</a>
paper from CRYPTO-2016, the controversy arising out of this is
elucidated, showing that these theoretical concerns had very substantial
impact on arguably the largest real world crypto usage (TLS):</p>
<blockquote>
<p>"Key-prefixing comes with the disadvantage that the entire public-key
has to
be available at the time of signing. Specifically, in a CFRG message
from September
2015 Hamburg [32] argues "having to hold the public key along
with
the private key can be annoying" and "can matter for constrained
devices".
Independent of efficiency, we believe that a cryptographic protocol
should be
as light as possible and prefixing (just as any other component)
should only
be included if its presence is justified. Naturally, in light of the
GMLS proof,
Hamburg [32] and Struik [44] (among others) recommended against
key prefixing
for Schnorr. Shortly after, Bernstein [10] identifies the error in
the GMLS theorem
and posts a tight security proof for the key-prefixed variant of
Schnorr signatures.
In what happens next, the participant of the CFRG mailing list
switched
their minds and mutually agree that key-prefixing should be preferred,
despite of
its previously discussed disadvantages. Specifically, Brown writes
about Schnorr
signatures that "this justifies a MUST for inclusion of the public key
in the message
of the classic signature" [16]. As a consequence, key-prefixing
is contained in
the current draft for EdDSA [33]..."</p>
</blockquote>
<p><em>Technical note: the "GMLS proof" mentioned in the above is the proof
given in
<a href="https://www.researchgate.net/publication/256720499_Public_key_signatures_in_the_multi-user_setting">this</a>
paper, that was intended to reduce the security of the multi-user
setting to that of the single-user setting, and that Dan Bernstein's
<a href="https://eprint.iacr.org/2015/996">paper</a>
previously mentioned proved to be invalid.</em></p>
<p>What's the TLDR? Fix the keys in any group/ring/multisignature. And
even that may not be enough, see
<a href="https://eprint.iacr.org/2018/068">MuSig</a>
for details of why it really isn't, in the scenario of Bitcoin
aggregated multisig.</p>
<h2>Borromean, Maxwell-Poelstra 2015</h2>
<p>I covered this extensively (including description of AOS as above) in my
<a href="https://github.com/AdamISZ/ConfidentialTransactionsDoc/">CT
writeup</a>
section 3.2.</p>
<p>The idea of the construction as outlined in <a href="https://github.com/Blockstream/borromean_paper">the paper by Maxwell,
Poelstra</a>
is to increase the space-efficiency of the published proof even more. By
having several ring signatures joined at a single index we get a
reduction in the number of \(e\) values we publish. This is basically
the same idea as the "AND of \(\Sigma\)-protocols" discussed a
little later in this document (although here we will only be using it
for achieving a specific goal, "Linkability", see more on this next).</p>
<p>For the real world context - Borromean ring signatures are used in
certain implementations of Confidential Transactions (e.g. Liquid by
Blockstream) today, and were previously used also in Monero for the same
goal of CT. They are a radically different use-case of ring signatures
to the one mostly described in the below; instead of using a ring
signature to hide the identity of a signer, they are used to hide which
exponent contains values in the encoding of a value committed to in a
Pedersen commitment. This allows arithmetic to be done on the
Pedersen-committed amount without worrying about overflow into negative
values modulo \(N\).</p>
<h2>Linkability and Exculpability</h2>
<p>In this section we'll briefly describe certain key features that turn
out to be useful in some real-world applications of a ring signature,
before in the following sections laying out how these features are, or
are not, achieved.</p>
<h3>Linkability (and spontaneity)</h3>
<p>At first glance, the idea of "linkability" with a ring signature seems to
be a contradiction. Since we are trying to achieve signer
ambiguity/anonymity, we don't really want any "linking" being done.
But the idea is rather clever, and proves to be very interesting for
digital cash.</p>
<p>In a <strong>linkable</strong> ring signature, a participant with key \(P \in L\)
(i.e. \(L\) is a particular set of public keys), should be able to
produce one ring signature on a given message, but should not be able to
do so again without the two ring signatures being linked. Thus,
functionally, each participant can only make such a signature once
(note: they can still retain anonymity if double-signing).</p>
<p>This restriction-to-one-signature-while-keeping-anonymity is easily seen
to be valuable in cases like electronic voting or digital cash, as well
as the oft-cited example explained in the next paragraph.</p>
<p>The <strong>spontaneity</strong> property should be a lot more obvious. Consider the
example of a whistleblower. We would want individuals in some large
group (e.g. government bureaucrats) to attest to a statement, while only
revealing group membership and not individual identity. Clearly this is
not workable if it requires cooperation of other members of the group
(even in any setup phase), so it's necessary that the individual can
create the ring signature "spontaneously", knowing only the public keys
of the other participants.</p>
<p>The paper uses the abbreviation LSAG for this type of signature:
"Linkable Spontaneous Anonymous Group" signature.</p>
<p>Note that the previous two constructions (CDS, AOS) can also have this
spontaneity property; but not the linkability property.</p>
<h3>Culpability, Exculpability and Claimability</h3>
<p>A ring signature can be described as exculpable if, even given knowledge
of the signing private key, an adversary cannot deduce that that signing
key was the one used to create the ring signature.</p>
<p>Notice that such a property may be immensely important in a range of
scenarios where a ring sig is useful - e.g. for a whistleblower whose
cryptographic keys were stolen or extracted by force, he could still
plausibly deny being the origin of a leak.</p>
<p>The reader can easily verify that the AOS construction, for example, has
this exculpability. The fact that a particular key is released e.g.
\(x_2\) in our concrete example, does not allow inference of it having
been used to create that signature. Any other key could have created the
signature, using the same signing algorithm.</p>
<p>The LWW LSAG, which we'll describe shortly, is on the other hand
<strong>culpable</strong>, i.e. the opposite - because the key image can be verified
to be tied to one particular key.</p>
<p>It's easy to see that the two properties <strong>exculpability</strong> and
<strong>linkability</strong> are somewhat in conflict, although I'm not aware of a
theorem that <em>absolutely requires</em> linkability to somehow tag one key in
case it is leaked.</p>
<p>Lastly, I'll mention <strong>claimability</strong>, which is briefly described also
in the LWW paper (see below). It may be possible for the owner of a key
to independently/voluntarily prove that they were the source of a given
ring signature, which doesn't logically require culpability.
Claimability is generally easy to achieve with some proof of knowledge
technique.</p>
<h2>AND of \(\Sigma\)-protocols, DLEQ</h2>
<p>The thoughtful reader probably won't have much trouble in imagining
what it would mean to do the logical AND of 2 \(\Sigma\)-protocols.</p>
<p>"AND" here just means you need to prove to the Verifier that you know
both secrets / both conditions are true. So this only requires that you
can answer both challenges (second step) with correct responses. Using
the standard notation, that means generating two transcripts:</p>
<p>\((R_1, e, s_1) \quad (R_2, e, s_2)\)</p>
<p>i.e. the same \(e\)-value is given to both protocol runs after
receiving the initial commitments from each. Fiat-Shamir-ising this
protocol will work the same as the usual logic; if considering a
signature scheme, we'll be hashing something like
\(H(m||R_1||R_2||P_1||P_2)\), if we include, as we have learnt
to, key-prefixing.</p>
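<p>A minimal sketch of the Fiat-Shamir-ised AND proof just described. It assumes a toy group (integers mod a Mersenne prime, written additively so that <code>mul(k, B)</code> reads as \(kB\)) in place of the real curve, and the helper names and hash encoding are purely illustrative:</p>

```python
import hashlib
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def and_prove(m, x1, x2):
    """Prove knowledge of x1 AND x2 using one shared challenge e."""
    P1, P2 = mul(x1, G), mul(x2, G)
    k1, k2 = secrets.randbelow(N), secrets.randbelow(N)
    R1, R2 = mul(k1, G), mul(k2, G)
    e = H(m, R1, R2, P1, P2)            # key-prefixed challenge
    return R1, R2, (k1 + e * x1) % N, (k2 + e * x2) % N

def and_verify(m, P1, P2, proof):
    """Both transcripts must verify against the same e."""
    R1, R2, s1, s2 = proof
    e = H(m, R1, R2, P1, P2)
    return (mul(s1, G) == add(R1, mul(e, P1))
            and mul(s2, G) == add(R2, mul(e, P2)))
```

<p>Note that only one \(e\) is ever derived, which is the saving the Borromean design exploits.</p>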
<p>As we already mentioned, the Borromean ring signature design uses this
idea to compactify a set of ring signatures, since only one
\(e\)-value is being published, rather than \(M\) for \(M\) ring
signatures.</p>
<p>This much is not super-interesting; but we can tighten this up a bit and
only use <strong>one</strong> commitment and response in a special case:</p>
<h3>Proof of Discrete Log Equivalence (DLEQ, PoDLE)</h3>
<p>See one of the first posts on this
<a href="https://web.archive.org/web/20200713230948/https://joinmarket.me/blog/blog/poodle">blog</a>
for a description of this technique; here we're giving a slightly
deeper look at the meaning.</p>
<p>If you are proving not only knowledge of a secret \(x\), but also that
two curve points \(P_1 = xG\) and \(P_2 = xJ\) have the same discrete log
\(x\) w.r.t. different bases \(G\) and \(J\) (whose relative discrete log
must not be known; see earlier blog post etc.), you can condense the above
AND by reusing the same nonce \(k\) and challenge \(e\), and hence a single
response \(s\), for the two bases:</p>
<p>\(\mathbb{P} \rightarrow \mathbb{V}\): \(R_1= kG,R_2=kJ\)</p>
<p>\(\mathbb{V} \rightarrow \mathbb{P}\): \(e =
H(m||R_1||R_2||P_1||P_2)\)</p>
<p>\(\mathbb{P} \rightarrow \mathbb{V}\): \(s\), (in secret:
\(=k+ex\))</p>
<p>Now, if the prover acted honestly, his construction of \(s\) will
correctly pass verification <strong>twice</strong>:</p>
<p>\(sG \stackrel{?}{=}R_1 +e P_1 \quad sJ \stackrel{?}{=} R_2 +
eP_2\)</p>
<p>... and notice that it would be impossible to make that work for
different \(x\)-values on the two bases \(G\) and \(J\) because
you would need to find \(k_1, k_2 \in \mathbb{Z}_N, x_1, x_2 \in
\mathbb{Z}_N\) such that, <strong>without knowing \(e\) in advance,</strong>
\(s = k_1 + ex_1 =k_2 + ex_2\), which is clearly impossible.</p>
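<p>The DLEQ exchange above can be sketched as follows. This assumes a toy group (integers mod a Mersenne prime) instead of the curve, and a hash-derived stand-in for the second base \(J\); all function names are hypothetical:</p>

```python
import hashlib
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

J = H("second base") % (p - 2) + 2      # stand-in for a NUMS point

def dleq_prove(m, x):
    """Show P1 = xG and P2 = xJ share the same x."""
    P1, P2 = mul(x, G), mul(x, J)
    k = secrets.randbelow(N)
    R1, R2 = mul(k, G), mul(k, J)       # one nonce, two commitments
    e = H(m, R1, R2, P1, P2)
    return R1, R2, (k + e * x) % N      # a single response s

def dleq_verify(m, P1, P2, proof):
    """The same s must pass verification on both bases."""
    R1, R2, s = proof
    e = H(m, R1, R2, P1, P2)
    return (mul(s, G) == add(R1, mul(e, P1))
            and mul(s, J) == add(R2, mul(e, P2)))
```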
<p>Proof of soundness is easy to see using the standard rewinding technique
(see e.g. previous blog post amongst many other places); after the two
upfront commitments are fixed and the \(e\)-values are "forked", we
will get two \(s\) values as usual and extract \(x\).</p>
<h2>Liu-Wei-Wong 2004 LSAG</h2>
<p>Shortly after the AOS paper, Liu, Wei and Wong published a
<a href="https://www.researchgate.net/publication/220798466_Linkable_Spontaneous_Anonymous_Group_Signature_for_Ad_Hoc_Groups_Extended_Abstract">paper</a>
outlining how the same basic idea could be extended to a slightly more
complex context of requiring <strong>linkability</strong>, as earlier mentioned. It
uses a combination of the above: DLEQ via AND of
\(\Sigma\)-protocols, and OR of \(\Sigma\)-protocols for the ring
signature hiding effect. Detailed algorithm with commentary follows.</p>
<h3>Liu-Wei-Wong's LSAG algorithm</h3>
<p>We start with a keyset \(L = \{P_0 \ldots P_{n-1}\}\) chosen by
the signer, whose index will be \(\pi\) (note the ambiguities about
"what is the set of valid keys?" as was discussed under "Key
Prefixing"). We then form a special new kind of curve point that we'll
name from now on as the <strong>key image</strong> (for reasons that'll become
clear):</p>
<p>\(I =x_{\pi} \mathbb{H}(L)\)</p>
<p>Here \(\mathbb{H}\) is a hash function whose output space is points
on the curve, rather than scalar numbers. (<em>The mechanical operation for
doing this is sometimes described as "coerce to point"; for example,
take the 256 bit number output by SHA256 and interpret it as an
\(x-\)coordinate on secp256k1, find the "next" valid point
\(x,y\), incrementing \(x\) if necessary, or whatever; just has to
be deterministic</em>). \(\mathbb{H}(L)\) is therefore going to play the
same role as \(J\) in the previous section, and we assume
intractability of relative discrete log due to the hashing.</p>
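<p>As a rough illustration of "coerce to point", here is a try-and-increment sketch on secp256k1's curve equation \(y^2 = x^3 + 7\). The function name and the rule for picking one of the two \(y\) roots are my own assumptions; as noted, any deterministic rule would do:</p>

```python
import hashlib

P = 2**256 - 2**32 - 977      # secp256k1 field prime

def coerce_to_point(data: bytes):
    """Hash data to a point on y^2 = x^3 + 7 by incrementing x until valid."""
    x = int.from_bytes(hashlib.sha256(data).digest(), "big") % P
    while True:
        rhs = (pow(x, 3, P) + 7) % P
        # P = 3 mod 4, so this exponent gives a square root when one exists.
        y = pow(rhs, (P + 1) // 4, P)
        if y * y % P == rhs:
            return x, min(y, P - y)   # assumed tie-break between the two roots
        x = (x + 1) % P               # no point with this x; try the "next" one
```

<p>Roughly half of all \(x\) values have a matching \(y\), so the loop is expected to stop after a couple of iterations.</p>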
<h3>Signing LWW LSAG</h3>
<p>The following steps are very similar "in spirit" to AOS; we still
"extend the causality loop" (bastardising Poelstra's description)
over the whole set of signatures instead of just one, but this time we
also "lift" the loop onto a base of \(\mathbb{H}(L)\) and replicate
the signatures there, too:</p>
<ul>
<li>Set \(k_{\pi} \stackrel{\$}{\leftarrow} \mathbb{Z}_N\)</li>
<li>Form the hash-challenge at the next index: \(e_{\pi+1} =
H(m||L||k_{\pi}G||k_{\pi}\mathbb{H}(L)||I)\)</li>
<li>Note to the above: \(k_{\pi}G\) was previously called
\(R_{\pi}\) in AOS; we are trying to preserve the same notation
here where possible; and of course it's the \(R\) value, not
the \(k\)-value that will be known/calculated by the verifier. The
same applies to the "lifted" nonce-point which follows it in the
concatenation. With respect to the key image, note that it <em>will</em> be
published and known to the verifier; but he won't know which index
it corresponds to.</li>
<li>Pick \(s_{\pi+1} \stackrel{\$}{\leftarrow} \mathbb{Z}_N\);
then we do as in AOS, but duplicated; we set:</li>
<li>\(R_{\pi+1} = s_{\pi+1}G - e_{\pi+1}P_{\pi+1}\) and
\(R^{*}_{\pi+1} = s_{\pi+1}\mathbb{H}(L) - e_{\pi+1}I\)</li>
<li>I realise the last line is pretty dense, so let's clarify: the
first half is exactly as for AOS; calculate \(R\) given the random
\(s\) and the just-calculated hash value \(e\). The <em>second</em>
half is <strong>the same thing with the base point \(G\) replaced with
\(\mathbb{H}(L)\), and the pubkey replaced with \(I\) at every
index</strong>. We used a shorthand \(R^{*}\) to mean
\(k_{\pi+1}\mathbb{H}(L)\), because of course we don't
actually <em>know</em> the value \(k_{\pi+1}\).</li>
<li>Calculate the next hash-challenge as \(e_{\pi+2} =
H(m||L||R_{\pi+1}||R^{*}_{\pi+1}||I)\)</li>
<li>Etc...</li>
<li>As with AOS, we can now forge all the remaining indices, wrapping
around the loop, by repeating the above operation, generating a new
random \(s\) at each step, until we get back to the signing index
\(\pi\), when we must calculate \(s_{\pi}\) as: \(s_{\pi}
= k_{\pi} + e_{\pi}x_{\pi}\).</li>
<li>Signature is published as \(\sigma_{L}(m) = (s_0 \ldots
s_{n-1}, e_0, I)\). (As before, if the keyset \(L\) is not
specified in advance, it will have to be published for the
verifier).</li>
</ul>
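<p>The signing steps above can be sketched as follows, assuming a toy group (integers mod a Mersenne prime) in place of the curve and a simple hash-derived stand-in for \(\mathbb{H}\); the names <code>lsag_sign</code> and <code>hash_to_point</code> are hypothetical:</p>

```python
import hashlib
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def hash_to_point(*parts):
    """Stand-in for the point-valued hash (an element from 2 to p-1)."""
    return H("to-point", *parts) % (p - 2) + 2

def lsag_sign(m, L, pi, x):
    """LWW LSAG over keyset L, where L[pi] == mul(x, G)."""
    n = len(L)
    HL = hash_to_point(*L)
    I = mul(x, HL)                          # key image
    e, s = [0] * n, [0] * n
    k = secrets.randbelow(N)
    j = (pi + 1) % n
    e[j] = H(m, L, mul(k, G), mul(k, HL), I)
    while j != pi:
        s[j] = secrets.randbelow(N)         # random response at a decoy index
        R = add(mul(s[j], G), mul(-e[j], L[j]))
        R_star = add(mul(s[j], HL), mul(-e[j], I))
        j = (j + 1) % n
        e[j] = H(m, L, R, R_star, I)        # challenge for the next index
    s[pi] = (k + e[pi] * x) % N             # close the loop with the real key
    return s, e[0], I
```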
<p>So what we're doing here is OR(DLEQ(0), DLEQ(1), ..., DLEQ(n-1)). And
as observed, each DLEQ is actually an AND: "AND(I know x for P, x for P
is same as x for P2)", with the key image \(I\) playing the role of P2.
Hence this represents a clever combination of
AND- and OR- of \(\Sigma\)-protocols.</p>
<p><em>On a personal note, when I first saw something of this type (I think it
was Cryptonote, see below), I found it quite bewildering, and I'm sure
I'm not alone! But what partially saved me is having already studied
PoDLE/DLEQ as well as AOS ring sigs, so I could intuit that something
combining the two ideas was going on. I hope the previous painstaking
introductions make it all a lot clearer!</em></p>
<p>Note the key similarities and difference(s) in the published signature,
to the AOS case: you still only need to publish one hash \(e_0\) since
the others are determined by it, but you <strong>must</strong> publish also the key
image \(I\); if another LWW LSAG is published using the same private
key on the same keyset \(L\), it will perforce have the same key image,
and be recognized as having come from the same key <em>without revealing
which key</em>.</p>
<p>The protocol using the LSAG can thus reject a "double-sign", if
desired.</p>
<p>Let's sanity check that we understand the verification algorithm, since
it is slightly different than AOS:</p>
<h3>Verifying LWW LSAG</h3>
<p>Start with the given keyset \(L\), the message \(m\) and the
signature \((s_0 \ldots s_{n-1}, e_0, I)\)</p>
<ul>
<li>Construct \(e_{1} = H(m||L||R_{0}||R^{*}_{0}||I)\)
using \(R_0 = s_0G - e_0 P_0\) and \(R^{*}_{0} = s_0
\mathbb{H}(L) - e_0 I \)</li>
<li>Repeat at each index using the new \(e_j\) until \(e_0\) is
calculated at the last step and verify it matches: \(e_0
\stackrel{?}{=} H(m||L||R_{n-1}||R^{*}_{n-1}||I)\).
Accept if so, reject if not.</li>
</ul>
<p>(with the additional point mentioned: the protocol using the sig scheme
may also reject this as valid if \(I\) has already been used; this
additional protocol step is usually described as "LINK" in the
literature).</p>
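<p>A sketch of the verification loop and the "LINK" check, under the same kind of assumptions as before: a toy group (integers mod a Mersenne prime) instead of the curve, a hash-derived stand-in for \(\mathbb{H}\), and hypothetical function names. The signature is taken to be the tuple <code>(s, e0, I)</code> with <code>s</code> a list of the \(n\) responses:</p>

```python
import hashlib

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def hash_to_point(*parts):
    """Stand-in for the point-valued hash (an element from 2 to p-1)."""
    return H("to-point", *parts) % (p - 2) + 2

def lsag_verify(m, L, sig):
    """Recompute every challenge in turn and check the ring closes on e_0."""
    s, e0, I = sig
    HL = hash_to_point(*L)
    e = e0
    for j in range(len(L)):
        R = add(mul(s[j], G), mul(-e, L[j]))        # R_j = s_j G - e_j P_j
        R_star = add(mul(s[j], HL), mul(-e, I))     # R*_j = s_j H(L) - e_j I
        e = H(m, L, R, R_star, I)                   # e_{j+1}
    return e == e0

def link(sig_a, sig_b):
    """LINK: signatures on the same keyset with equal key images are linked."""
    return sig_a[2] == sig_b[2]
```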
<h3>A brief note on the key image</h3>
<p>Make sure you get the difference between this \(\mathbb{H}(L)\) and
the previous \(J\) as per the general DLEQ. In the latter case we can
(and should) choose an arbitrary globally-agreed NUMS point, for example
hashing the standard curve base point \(G\) (with the
"coerce-to-point" technique mentioned). In this case, we have chosen
something that both signer and verifier agree on, as part of the
<strong>setting</strong> of this particular run of the protocol - it's
deterministically tied to the keyset \(L\). The key image \(I\) is
analogous to \(P_2\) in my PoDLE blog post; it's the signer's
"hidden", one-time key.</p>
<p>This changes in the next construction, Back 2015. But first, a few words
on security.</p>
<h2>Security arguments for the LWW LSAG</h2>
<p>The general approach to proving <strong>unforgeability</strong> of this ring
signature is the same as that for the basic Schnorr signature as
described in the previous blog post.</p>
<p>A wrapper around an attacker \(\mathbb{A}\) who we posit to have the
ability to construct a forgery without knowing any private key
\(x_i\), will, as before, have to guess which random oracle query
corresponds to the forgery, and will want to provide two different
"patched" answers to the RO query at that point. As before, there will
be some reduced probability of success due to having to make this kind
of guess, and so the reduction will be even less tight than before.</p>
<p>Also as before, in the EUF-CMA model, we must allow for an arbitrary
number of signing oracle queries as well as RO queries, which complicates the
statistical analysis considerably, but the basic principles remain the
same. If at some point forgery is successfully achieved twice at the
same index, we will have something like:</p>
<p>\(x_{\pi} =
\frac{s^{*}_{\pi}-s_{\pi}}{e^{*}_{\pi}-e_{\pi}}\)</p>
<p>where the * superscripts indicate the second run, and the
\(e\)-values being the patched RO responses.</p>
<p>And as usual, with appropriate statistical arguments, one can generate a
reduction such that forgery ability with a certain probability \(p\)
thus implies the ability to solve ECDLP with a related probability
\(p'\).</p>
<p>For proving <strong>signer ambiguity</strong> - for simplicity, we break this into
two parts. If <em>all</em> of the private keys are known to the attacker (e.g.
by subpoena), then this property completely fails. This is what we
called <strong>culpability</strong>. It's easy to see why - we have the key image as
part of the signature, and that is deterministically reproducible given
the private key. If <em>none</em> of the private keys are known to the
attacker, the problem is reduced to the <strong>solution of the <a href="https://en.wikipedia.org/wiki/Decisional_Diffie%E2%80%93Hellman_assumption">Decisional
Diffie Hellman
Problem</a></strong>,
which is considered computationally hard. The reduction is quite
complicated, but as in a standard zero knowledgeness proof, the idea is
that a Simulator can generate a transcript that's statistically
indistinguishable from a genuine transcript.</p>
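<p>To make the culpability half of this concrete: an attacker holding the private keys can simply recompute candidate key images and compare. The sketch below assumes a toy group (integers mod a Mersenne prime) rather than the curve, and the names <code>find_signer</code> and <code>hash_to_point</code> are hypothetical:</p>

```python
import hashlib

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def hash_to_point(*parts):
    """Stand-in for the point-valued hash (an element from 2 to p-1)."""
    return H("to-point", *parts) % (p - 2) + 2

def find_signer(L, I, known_keys):
    """Return the ring index whose private key reproduces key image I.

    known_keys maps ring index -> private key obtained by the attacker;
    None is returned if none of the known keys matches.
    """
    HL = hash_to_point(*L)
    for idx, x in known_keys.items():
        if mul(x, HL) == I:          # I = x * H(L) is deterministic in x
            return idx
    return None
```

<p>If the true signer's key is not among those the attacker holds, the function learns nothing directly, which is the second case discussed above.</p>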
<p>For proving <strong>linkability</strong> - in the LWW paper an argument is made that
this reduces to ECDLP in more or less the same way as for the
unforgeability argument, using two pairs of transcripts for two
different signatures which are posited to be based on the same private
key but having different key images. Examination of the two pairs of
transcripts allows one to deduce that the private keys in the two cases
are the same, else ECDLP is broken.</p>
<p>Notice that these security arguments are <strong>much more complicated than for
the single Schnorr signature case</strong>
and perhaps for two distinct reasons: one, because the ring signature is
a more complex algebraic construction, with more degrees of freedom, but
also, because we are asking for a significantly richer set of properties
to hold. In particular notice that even for unforgeability, the EUF-CMA
description is not good enough (we've already discussed this a bit); we
need to consider what happens when creating multiple signatures on
different keysets and how they overlap. Signer anonymity/ambiguity is
especially difficult for LWW and its successors (see below), because
by design it has been weakened (culpability).</p>
<h2>Back 2015; compression, single-use</h2>
<p><em>This is a good place to note that the constructions starting with LWW
are described in some detail in the useful document
<a href="https://ww.getmonero.org/library/Zero-to-Monero-1-0-0.pdf">Zero-To-Monero</a></em>.</p>
<p>Adam Back
<a href="https://bitcointalk.org/index.php?topic=972541.msg10619684#msg10619684">posted</a>
in 2015 on bitcointalk about a potential space saving over the
cryptonote ring signature, based on using AOS and tweaking it to include
a key image.</p>
<p>As was noted above, it's a space saving of asymptotically about 50% to
use a scheme like AOS that only requires publication of one hash
challenge as opposed to one for each index (like the CDS for example).</p>
<p>He then followed up noting that a very similar algorithm had already
been published, namely the LWW we've just described in the above, and
moreover it was published three years before Fujisaki-Suzuki that was
the basis of cryptonote (see below). So it was <em>somewhat</em> of an
independent re-discovery, but there is a significant tweak. I'll
outline the algorithm below; it'll look very similar to LWW LSAG, but
there's a difference.</p>
<h3>Signing Back-LSAG</h3>
<ul>
<li>Define key image \(I =x_{\pi}\mathbb{H}(P_{\pi})\);</li>
<li>Set \(k_{\pi} \stackrel{\$}{\leftarrow} \mathbb{Z}_N\)</li>
<li>Form the hash-challenge at the next index: \(e_{\pi+1} =
H(m||k_{\pi}G||k_{\pi}\mathbb{H}(P_{\pi}))\)</li>
<li>Pick \(s_{\pi+1} \stackrel{\$}{\leftarrow} \mathbb{Z}_N\);
then:</li>
<li>\(R_{\pi+1} = s_{\pi+1}G - e_{\pi+1}P_{\pi+1}\) and
\(R^{*}_{\pi+1} = s_{\pi+1}\mathbb{H}(P_{\pi+1}) -
e_{\pi+1}I\)</li>
<li>Calculate the next hash-challenge as \(e_{\pi+2} =
H(m||R_{\pi+1}||R^{*}_{\pi+1})\)</li>
<li>Etc...</li>
<li>As with AOS and LWW, we can now forge all the remaining indices,
wrapping around the loop, by repeating the above operation,
generating a new random \(s\) at each step, until we get back to
the signing index \(\pi\), when we must calculate \(s_{\pi}\)
as: \(s_{\pi} = k_{\pi} + e_{\pi}x_{\pi}\).</li>
<li>Signature is published as \(\sigma_{L}(m) = (s_0 \ldots
s_{n-1}, e_0, I)\), as in LWW (\(L\) being the set of \(P\)s).</li>
</ul>
<p>Verification for this is near-identical as for LWW, so is left as an
exercise for the reader.</p>
<h3>What's the difference, and what's the purpose?</h3>
<p>The tweak - which is very similar to Cryptonote (makes sense as it was
an attempt to improve that) - is basically this: by making each of the
signatures in the shifted base point version symmetrical (example:
\(s_2 \mathbb{H}(P_2) = k_2 \mathbb{H}(P_2) + e_2 I\)), it means
that a key image will be valid <em>independent of the set of public keys,
\(L\).</em> This is crucial in a cryptocurrency application - we need the
key image to be a unique double spend signifier across many different
ring signatures with different keysets - the keys are ephemeral and
change between transactions.</p>
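<p>A small sketch of why this matters for double-spend detection, comparing the two key image definitions across two different rings containing the same key. It assumes a toy group (integers mod a Mersenne prime) in place of the curve; every function name here is hypothetical:</p>

```python
import hashlib
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def hash_to_point(*parts):
    """Stand-in for the point-valued hash (an element from 2 to p-1)."""
    return H("to-point", *parts) % (p - 2) + 2

def key_image_lww(x, L):
    """LWW: I = x * H(L), tied to the whole keyset."""
    return mul(x, hash_to_point(*L))

def key_image_back(x):
    """Back / Cryptonote style: I = x * H(P), tied only to the signer's key."""
    return mul(x, hash_to_point(mul(x, G)))

def accept_spend(seen, I):
    """Reject a key image that has already been seen (double-spend check)."""
    if I in seen:
        return False
    seen.add(I)
    return True

x = secrets.randbelow(N - 1) + 1
ring_a = [mul(secrets.randbelow(N), G) for _ in range(3)] + [mul(x, G)]
ring_b = [mul(secrets.randbelow(N), G) for _ in range(3)] + [mul(x, G)]
```

<p>With <code>key_image_back</code> the second spend of <code>x</code> is caught whichever ring is used, whereas <code>key_image_lww</code> gives a fresh image for <code>ring_b</code> and the check would miss it.</p>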
<p>So it's a blend of the LWW LSAG, which has the advantage of space
compaction for the same reason as AOS - only one hash must be published,
the others can be deduced from the ring structure - with the
F-S-2007/Cryptonote design, which fixes the key image to the key and not
just the specific ring.</p>
<p>However I have to here leave open whether the security arguments of LWW
carry across to this case. I note that the original description did
<em>not</em> include the keyset in the hash challenge (notice absence of
\(L\)); but see the note on MLSAG below.</p>
<h2>Fujisaki-Suzuki 2007 and Cryptonote</h2>
<p><a href="https://cryptonote.org/whitepaper.pdf">Cryptonote</a>
was adapted from a paper of <a href="https://eprint.iacr.org/2006/389.pdf">Fujisaki and
Suzuki</a>
describing an alternate version of a linkable (here "traceable") ring
signature, in 2007. We won't dwell on these constructions here (except
inasmuch as we referred to them above), as they provide the same
linkability function as the above LSAG, but are less compact. Instead,
in the final section, I'll describe how Monero has applied LWW LSAG and
the Back LSAG to their specific requirements.</p>
<h2>Monero MLSAG</h2>
<p>For anyone paying close attention all the way through, there will be
nothing surprising here!</p>
<p>For a cryptocurrency, we build transactions consisting of multiple
inputs. Each input in Monero's case uses a ring signature, rather than
a single signature, to authenticate the transfer, referring back to
multiple pubkeys possessing coins as outputs of earlier transactions.</p>
<p>So here we need <strong>one ring signature per input</strong>. Moreover, per normal
transaction logic, we obviously need <em>all</em> of those ring signatures to
successfully verify. So this is another case for the "AND of
\(\Sigma\)-protocols". We just run \(M\) cases of Back's LSAG and
combine them with a single \(e\) hash challenge at each key index (so
the hash challenge kind of "spans over the inputs"). Additionally,
note that the hash challenge here is assumed to include the keyset with
a generic \(L\) (limiting tiresome subscripting to a minimum...).</p>
<p>To sign \(M\) inputs, each of which has \(n\) keys:</p>
<ul>
<li>For each input, define key image \(I_i
=x_{i,\pi}\mathbb{H}(P_{i,\pi}) \ \forall i \in 0 \ldots
M-1\);</li>
<li>Set \(k_{i, \pi} \stackrel{\$}{\leftarrow} \mathbb{Z}_N \
\forall i \in 0 \ldots M-1\)</li>
<li>Form the hash-challenge at the next index: \(e_{\pi+1} =
H(m||L||k_{0, \pi}G||k_{0,
\pi}\mathbb{H}(P_{0,\pi})||k_{1, \pi}G||k_{1,
\pi}\mathbb{H}(P_{1,\pi}) ...)\)</li>
<li>Pick \(s_{i, \pi+1} \stackrel{\$}{\leftarrow} \mathbb{Z}_N\
\forall i \in 0 \ldots M-1\); then:</li>
<li>\(R_{i, \pi+1} = s_{i, \pi+1}G - e_{\pi+1}P_{i, \pi+1}\)
and \(R^{*}_{i, \pi+1} = s_{i, \pi+1}\mathbb{H}(P_{i,
\pi+1}) - e_{\pi+1}I_i \ \forall i \in 0 \ldots M-1\)</li>
<li>Calculate the next hash-challenge as \(e_{\pi+2} =
H(m||L||R_{0, \pi+1}||R^{*}_{0,\pi+1}||R_{1,
\pi+1}||R^{*}_{1,\pi+1} ...)\)</li>
<li>Etc...</li>
<li>Logic as for AOS, LWW but duplicated at every input with single
\(e\)-challenge, and at signing index for all inputs (\(\pi\)):
\(s_{i, \pi} = k_{i, \pi} + e_{\pi}x_{i, \pi}\ \forall
i \in 0 \ldots M-1\).</li>
<li>Signature is published as \(\sigma_{L}(m) = (s_{0,0} \ldots
s_{0,n-1}, \ldots, s_{M-1,0}, \ldots s_{M-1,n-1}, e_0, I_0
\ldots I_{M-1})\).</li>
</ul>
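<p>The steps above can be sketched as follows; with \(M = 1\) this collapses to Back's LSAG with the keyset added to the hash. As elsewhere this is only an illustration under stated assumptions: a toy group (integers mod a Mersenne prime) instead of the curve, a hash-derived stand-in for \(\mathbb{H}\), and hypothetical names. <code>keys[i][j]</code> is key \(j\) of input \(i\):</p>

```python
import hashlib
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

def H(*parts):
    """Hash printable values to a scalar."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def hash_to_point(*parts):
    """Stand-in for the point-valued hash (an element from 2 to p-1)."""
    return H("to-point", *parts) % (p - 2) + 2

def commitments(keys, I, s_col, e, j):
    """R and R* for every input at key index j, in input order."""
    out = []
    for i, row in enumerate(keys):
        out.append(add(mul(s_col[i], G), mul(-e, row[j])))
        out.append(add(mul(s_col[i], hash_to_point(row[j])), mul(-e, I[i])))
    return out

def mlsag_sign(m, keys, pi, xs):
    """xs[i] is the private key of keys[i][pi]; one shared e per key index."""
    M, n = len(keys), len(keys[0])
    I = [mul(xs[i], hash_to_point(keys[i][pi])) for i in range(M)]
    k = [secrets.randbelow(N) for _ in range(M)]
    e = [0] * n
    s = [[0] * n for _ in range(M)]
    first = []
    for i in range(M):
        first += [mul(k[i], G), mul(k[i], hash_to_point(keys[i][pi]))]
    j = (pi + 1) % n
    e[j] = H(m, keys, *first)
    while j != pi:
        for i in range(M):
            s[i][j] = secrets.randbelow(N)
        col = [s[i][j] for i in range(M)]
        nxt = (j + 1) % n
        e[nxt] = H(m, keys, *commitments(keys, I, col, e[j], j))
        j = nxt
    for i in range(M):
        s[i][pi] = (k[i] + e[pi] * xs[i]) % N
    return s, e[0], I

def mlsag_verify(m, keys, sig):
    """Walk the ring once; every input shares the challenge at each index."""
    s, e0, I = sig
    e = e0
    for j in range(len(keys[0])):
        col = [row_s[j] for row_s in s]
        e = H(m, keys, *commitments(keys, I, col, e, j))
    return e == e0
```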
<p>Note:</p>
<p>(1) This algorithm as described requires each input to have the genuine
signer at the same key-index in the set of pubkeys for each input, which
is a limitation.</p>
<p>(2) Monero has implemented Confidential Transactions, and this is
folded in with the above into a new design which seems to have two
variants "RingCTFull" and "RingCTSimple". This can be investigated
further in the documents on RingCT as referenced in the previously
mentioned
<a href="https://ww.getmonero.org/library/Zero-to-Monero-1-0-0.pdf">ZeroToMonero</a>.</p>Liars, cheats, scammers and the Schnorr signature2019-02-01T00:00:00+01:002019-02-01T00:00:00+01:00Adam Gibsontag:joinmarket.me,2019-02-01:/blog/blog/liars-cheats-scammers-and-the-schnorr-signature/<p>security arguments for Schnorr</p><h3>Liars, cheats, scammers and the Schnorr signature</h3>
<p>How sure are <em>you</em> that the cryptography underlying Bitcoin is secure?
With regard to one future development of Bitcoin's crypto, in
discussions in public fora, I have more than once confidently asserted
"well, but the Schnorr signature has a security reduction to ECDLP".
Three comments on that before we begin:</p>
<ul>
<li>If you don't know what "reduction" means here, fear not, we will
get deeply into this here.</li>
<li>Apart from simply <em>hearing</em> this and repeating it, I was mostly
basing this on a loose understanding that "it's kinda related to
the soundness proof of a sigma protocol" which I discussed in my
<a href="https://github.com/AdamISZ/from0k2bp">ZK2Bulletproofs</a>
paper, which is true - but there's a lot more involved.</li>
<li>The assertion is true, but there are caveats, as we will see. And
Schnorr is different from ECDSA in this regard, as we'll also see,
at the end.</li>
</ul>
<p>But why write this out in detail? It actually came sort of out of left
field. Ruben Somsen was asking on slack about some aspect of Monero, I
forget, but it prompted me to take another look at those and other ring
signatures, and I realised that attempting to understand the
<strong>security</strong> of those more complex constructions is a non-starter unless
you <strong>really understand why we can say "Schnorr is secure" in the
first place</strong>.</p>
<h3>Liars and cheats</h3>
<p>The world of "security proofs" in cryptography appears to be a set of
complex stories about liars - basically made up magic beans algorithms
that <em>pretend</em> to solve things that nobody <em>actually</em> knows how to
solve, or someone placing you in a room and resetting your clock
periodically and pretending today is yesterday - and cheats, like
"let's pretend the output of the hash function is \(x\), because it
suits my agenda for it to be \(x\)" (at least in this case the lying
is consistent - the liar doesn't change his mind about \(x\); that's
something!).</p>
<p>I hope that sounds crazy, it mostly really is :)</p>
<p>(<em>Concepts I am alluding to include: the random oracle, a ZKP simulator,
extractor/"forking", an "adversary" etc. etc.</em>)</p>
<h2>Preamble: the reluctant Satoshi scammer</h2>
<p>The material of this blog post is pretty abstract, so I decided to spice
it up by framing it as some kind of sci-fi :)</p>
<p><img alt="" src="https://web.archive.org/web/20200428212652im_/https://joinmarket.me/static/media/uploads/cube-250082_6402.png"></p>
<p>Imagine you have a mysterious small black cube, given to you by an
alien. It has two slots you can plug into to feed it input data and
another to get output data, but you absolutely can't open it (so like
an Apple device, but more interoperable), and it does one thing only,
but that thing is astonishing: if you feed it a message and a <strong>public</strong>
key in its first input slot, then it'll <em>sometimes</em> spit out a valid Schnorr
signature on that message.</p>
<p>Well in 2019 this is basically useless, but after considerable
campaigning (mainly by you, for some reason!), Schnorr is included into
Bitcoin in late 2020. Delighted, you start trying to steal money but it
proves to be annoying.</p>
<p>First, you have to know the public key, so the address must be reused or
something similar. Secondly (and this isn't a problem, but is weird and
will become relevant later): the second input slot is needed to pass the
values of the hash function sha2 (or whatever is the right one for our
signature scheme) into the black box for any data it needs to hash. Next
problem: it turns out that the device only works if you feed it a few
<em>other</em> signatures of other messages on the same public key, first.
Generally speaking, you don't have that. Lastly, it doesn't <em>always</em>
work for any message you feed into it (you want to feed in 'messages'
which are transactions paying you money), only sometimes.</p>
<p>With all these caveats and limitations, you fail to steal any money at
all, dammit!</p>
<p>Is there anything else we can try? How about we pretend to be someone
else? Like Satoshi? Hmm ...</p>
<p>For argument's sake, we'll assume that people use the Schnorr Identity
Protocol (henceforth SIDP), which can be thought of as "Schnorr
signature without the message, but with an interactive challenge".
We'll get into the technicals below, for now note that a signature
doesn't prove anything about identity (because it can be passed
around), you need an interactive challenge, a bit like saying "yes,
give me a signature, but <em>I</em> choose what you sign".</p>
<p>So to get people to believe I'm Satoshi (and thus scam them into
investing in me perhaps? Hmm sounds familiar ...) I'm going to somehow
use this black box thing to successfully complete a run of SIDP. But as
noted it's unreliable; I'll need a bunch of previous signatures
(let's pretend that I get that somehow), but I <em>also</em> know this thing
doesn't work reliably for every message, so the best I can do is
probably to try to <strong>scam 1000 people simultaneously</strong>. That way
they might reasonably believe that their successful run represents
proof; after all it's supposed to be <em>impossible</em> to create this kind
of proof without having the private key - that's the entire idea of it!
(the fact that it failed for other people could be just a rumour, after
all!)</p>
<p>So it's all a bit contrived, but weirder scams have paid off - and they
didn't even use literally alien technology!</p>
<p>So, we'll need to read the input to our hash function slot from the
magic box; it's always of the form:</p>
<p><code>message || R-value</code></p>
<p>... details to follow, but basically \(R\) is the starting value in
the SIDP, so we pass it to our skeptical challenger(s). They respond
with \(e\), intended to be completely random to make our job of
proving ID as hard as possible, then <strong>we trick our black box</strong> - we
don't return SHA2(\(m||R\)) but instead we return \(e\). More on
this later, see "random oracle model" in the below. Our magic box
outputs, if successful, \(R, s\) where \(s\) is a new random-looking
value. The challenger will be amazed to see that:</p>
<p>\(sG = R + eP_{satoshi}\)</p>
<p>is true!! And the millions roll in.</p>
<p>If you didn't get in detail how that scam operated, don't worry,
we're going to unpack it, since it's the heart of our technical story
below. The crazy fact is that <strong>our belief that signatures like the
Schnorr signature (and ECDSA is a cousin of it) are secure is mostly reliant on
basically the argument above.</strong></p>
<p>But 'mostly' is an important word there: what we actually do, to make
the argument that it's secure, is stack that argument on top of at
least 2 other arguments of a similar nature (using one algorithm as a
kind of 'magic black box' and feeding it as input to a different
algorithm) and to relate the digital signature's security to the
security of something else which ... we <em>think</em> is secure, but don't
have absolute proof.</p>
<p>Yeah, really.</p>
<p>We'll see that our silly sci-fi story has <em>some</em> practical reality to
it - it really <em>is</em> true that to impersonate is a bit more practically
feasible than to extract private keys, and we can even quantify this
statement, somewhat.</p>
<p>But not the magic cube part. That part was not real at all, sorry.</p>
<h2>Schnorr ID Protocol and signature overview</h2>
<p>I have explained SIDP with reference to core concepts of Sigma Protocols
and Zero Knowledge Proofs of Knowledge in Section 3.2
<a href="https://github.com/AdamISZ/from0k2bp">here</a>.
A more thorough explanation can be found in lots of places, e.g.
Section 19.1 of <a href="https://crypto.stanford.edu/~dabo/cryptobook/">Boneh and
Shoup</a>.
Reviewing the basic idea, cribbing from my own doc:</p>
<p>Prover \(\mathbf{P}\) starts with a public key \(P\) and a
corresponding private key \(x\) s.t. \(P = xG\).</p>
<p>\(\mathbf{P}\) wishes to prove in zero knowledge, to verifier
\(\mathbf{V}\), that he knows \(x\).</p>
<p>\(\mathbf{P}\) → \(\mathbf{V}\): \(R\) (a new random curve
point, but \(\mathbf{P}\) knows \(k\) s.t. \(R = kG\))</p>
<p>\(\mathbf{V}\) → \(\mathbf{P}\): \(e\) (a random scalar)</p>
<p>\(\mathbf{P}\) → \(\mathbf{V}\): \(s\) (which \(\mathbf{P}\)
calculated from the equation \(s = k + ex\))</p>
<p>Note: the transcript of the conversation would here be: \((R, e,
s)\).</p>
<p>Verification works fairly trivially: verifier checks
\(sG \stackrel{?}{=} R+eP\). See previously mentioned doc for details on
why this is supposedly <em>zero knowledge</em>, that is to say, the verifier
doesn't learn anything about the private key from the procedure.</p>
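<p>The three-message exchange can be sketched like this, assuming a toy group (integers mod a Mersenne prime, written additively) in place of the curve; the class and function names are only illustrative:</p>

```python
import secrets

# Assumption: a toy group replaces secp256k1 (no security intended).
# "Points" are integers mod a Mersenne prime; scalars are reduced mod N.
p = 2**127 - 1
N = p - 1          # order of the toy group
G = 3              # base "point"

def mul(k, B):
    """kB in the post's notation (here B^k mod p)."""
    return pow(B, k % N, p)

def add(A, B):
    """A + B in the post's notation (here A*B mod p)."""
    return A * B % p

class Prover:
    def __init__(self, x):
        self.x = x                                # private key

    def commit(self):
        self.k = secrets.randbelow(N - 1) + 1     # fresh nonce for this run
        return mul(self.k, G)                     # first message: R = kG

    def respond(self, e):
        return (self.k + e * self.x) % N          # third message: s = k + ex

def verifier_challenge():
    return secrets.randbelow(N)                   # second message: random e

def verifier_check(P, R, e, s):
    return mul(s, G) == add(R, mul(e, P))         # sG == R + eP
```

<p>A run is then <code>R = prover.commit()</code>, <code>e = verifier_challenge()</code>, <code>s = prover.respond(e)</code>, and the transcript \((R, e, s)\) is what <code>verifier_check</code> examines.</p>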
<p>As to why it's sound - why does it really prove that the Prover knows
\(x\)? See the same doc, but in brief: if we can convince the prover
to re-run the third step with a modified second step (but the same first
step!), then he'll be producing a second response \(s'\) to a
second random \(e'\), but with the same \(k\) and \(R\), thus:</p>
<p>\(x = \frac{s-s'}{e-e'}\)</p>
<p>So we say it's "sound" in the specific sense that only a
knower-of-the-secret-key can complete the protocol. But more on this
shortly!</p>
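<p>The extraction is pure scalar arithmetic, so it can be checked without any curve code at all. In this sketch the modulus is the secp256k1 group order, and the function names and the two challenge values are just illustrative:</p>

```python
import secrets

# Order of the secp256k1 group (a prime), used here only for scalar arithmetic.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def respond(k, e, x):
    """Prover's third message, s = k + ex mod N."""
    return (k + e * x) % N

def extract(e1, s1, e2, s2):
    """Recover x from two accepting transcripts that share the same R."""
    # N is prime, so the inverse of (e1 - e2) is a Fermat exponentiation.
    inv = pow((e1 - e2) % N, N - 2, N)
    return (s1 - s2) * inv % N

x = secrets.randbelow(N)        # the prover's secret
k = secrets.randbelow(N)        # the nonce, reused across both runs
e1, e2 = 12345, 67890           # two different challenges for the same R
recovered = extract(e1, respond(k, e1, x), e2, respond(k, e2, x))
```

<p>Here <code>recovered</code> equals <code>x</code>, because \(s - s' = (e - e')x\) once the shared \(k\) cancels.</p>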
<p>What about the famous "Schnorr signature"? It's just a
noninteractive version of the above. There is btw a brief summary in
<a href="https://web.archive.org/web/20200428212652/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">this</a>
earlier blog post, also. Basically replace \(e\) with a hash (we'll
call our hash function \(H\)) of the commitment value \(R\) and the
message we want to sign \(m\):</p>
<p>\(e = H(m||R)\)</p>
<p>As mentioned in the just-linked blog post, it's also possible to add
other stuff to the hash, but these two elements at least are necessary
to make a sound signature.</p>
<p>As was noted in the 80s by <a href="https://link.springer.com/content/pdf/10.1007%2F3-540-47721-7_12.pdf">Fiat and
Shamir</a>,
this transformation is generic to any zero-knowledge identification
protocol of the "three pass" or sigma protocol type - just use a hash
function to replace the challenge with H(message, commitment) to create
the new signature scheme.</p>
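<p>As a rough illustration of the transform, here is a sketch of Schnorr signing and verification in which the challenge is computed as a hash of the message and the commitment. To keep the block short it swaps the elliptic curve for a tiny multiplicative group (so \(xG\) becomes <code>pow(g, x, p)</code>); the parameters <code>p, q, g</code> and the byte encoding of \(R\) are assumptions made for the example and are far too small to be secure.</p>

```python
import hashlib
import secrets

# Toy Schnorr group (assumed): g generates the subgroup of prime order q in Z_p^*.
p, q, g = 2039, 1019, 4

def H(m: bytes, R: int) -> int:
    """Challenge e = H(m || R), reduced into the scalar range."""
    digest = hashlib.sha256(m + R.to_bytes(2, "big")).digest()
    return int.from_bytes(digest, "big") % q

def sign(x: int, m: bytes):
    k = secrets.randbelow(q - 1) + 1   # fresh random nonce
    R = pow(g, k, p)                   # commitment, the analogue of R = kG
    e = H(m, R)                        # the hash replaces the verifier's challenge
    s = (k + e * x) % q
    return R, s

def verify(y: int, m: bytes, sig) -> bool:
    R, s = sig
    # g^s == R * y^e is the analogue of sG == R + eP
    return pow(g, s, p) == R * pow(y, H(m, R), p) % p
```

<p>With <code>y = pow(g, x, p)</code> as the public key, <code>verify(y, m, sign(x, m))</code> holds; nothing else about the protocol changes, the verifier has simply been replaced by the hash function.</p>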
<p>Now, if we want to discuss security, we first have to decide what that
even means, for a signature scheme. Since we're coming at things from a
Bitcoin angle, we're naturally focused on preventing two things:
forgery and key disclosure. But really it's the same for any usage of
signatures. Theorists class security into at least three types (usually
more, these are the most elementary classifications):</p>
<ul>
<li>Total break</li>
<li>Universal forgery</li>
<li>Existential forgery</li>
</ul>
<p>(Interesting historical note: this taxonomy is due to Goldwasser, Micali
and Rivest; the first two, together with Rackoff, also introduced the revolutionary notion
of a "Zero Knowledge Proof" in the 1980s.)</p>
<p>Total break means key disclosure. To give a silly example: if \(k=0\)
in the above, then \(s = ex\) and, on receipt of \(s\), the verifier
could simply multiply it by the modular inverse of \(e\) to extract
the private key \(x\). A properly random \(k\) value, or 'nonce',
as explained ad nauseam elsewhere, is critical to the security. Since
this is the worst possible security failure, being secure against it is
considered the weakest notion of "security" (note this kind of
"reverse" reasoning, it is very common and important in this field).</p>
<p>The next weakest notion of security would be security against universal
forgery - the forger should not be able to generate a signature on any
message they are given. We won't mention this too much; we will focus
on the next, stronger notion of "security":</p>
<p>"Security against existential forgery under adaptive chosen message
attack", often shortened to EUF-CMA for sanity (the 'adaptive(ly)'
sometimes seems to be dropped, i.e. understood), is clearly the
strongest notion out of these three, and papers on this topic generally
focus on proving this. "Existential" here refers to the idea that
the attacker even gets to choose <em>what</em> message he will generate a
verifying forgery for, and "chosen message" to the idea that he may first
request genuine signatures on messages of his choosing; with the trivial
restriction that the forgery can't be on a
message that the genuine signer has already signed.</p>
<p>(A minor point: you can also strengthen this definition to
SUF-CMA (S = "strongly"), where the attacker wins even by producing a
new, different signature on a message that has already been signed; only
the exact signatures he was given don't count. The famous problem of <strong>signature
malleability</strong> experienced in ECDSA/Bitcoin relates to this, as noted by
Matt Green
<a href="https://blog.cryptographyengineering.com/euf-cma-and-suf-cma/">here</a>.)</p>
<p>I believe there are even stronger notions (e.g. involving active
attacks) but I haven't studied this.</p>
<p>In the next, main section of this post, I want to outline how
cryptographers try to argue that both the SIDP and the Schnorr signature
are secure (in the latter case, with that strongest notion of security).</p>
<h2>Why the Schnorr signature is secure</h2>
<h3>Why the SIDP is secure</h3>
<p>Here, almost by definition, we can see that only the notion of "total
break" makes sense: there is no message, just an assertion of key
ownership. In the context of SIDP this is sometimes called the
"impersonation attack" for obvious reasons - see our reluctant
scammer.</p>
<p>The justification of this is somehow elegantly and intriguingly short:</p>
<blockquote>
<p>The SIDP is secure against impersonation = The SIDP is <em>sound</em> as a
ZKPOK.</p>
</blockquote>
<p>You can see that these are just two ways of saying the same thing. But
what's the justification that either of them is true? Intuitively the
soundness proof tries to isolate the Prover as a machine/algorithm and
screw around with its sequencing, in an attempt to force it to spit out
the secret that we believe it possesses. If we hypothesise an adversary
\(\mathbb{A}\) who <em>doesn't</em> possess the private key to begin with,
or more specifically, one that can pass the test of knowing the key for
any public key we choose, we can argue that there's only one
circumstance in which that's possible: <strong>if \(\mathbb{A}\) can solve
the general Elliptic Curve Discrete Logarithm Problem (ECDLP) on our
curve.</strong> That's intuitively <em>very</em> plausible, but can we prove it?</p>
<h3>Reduction</h3>
<p>(One of a billion variants on the web, taken from
<a href="https://jcdverha.home.xs4all.nl/scijokes/6_2.html">here</a>
:))</p>
<blockquote>
<p>A mathematician and a physicist were asked the following question:</p>
<p>"Suppose you walked by a burning house and saw a hydrant and
a hose not connected to the hydrant. What would you do?"</p>
<p>P: I would attach the hose to the hydrant, turn on the water, and put out
the fire.</p>
<p>M: I would attach the hose to the hydrant, turn on the water, and put out
the fire.</p>
<p>Then they were asked this question:</p>
<p>"Suppose you walked by a house and saw a hose connected to
a hydrant. What would you do?"</p>
<p>P: I would keep walking, as there is no problem to solve.</p>
<p>M: I would disconnect the hose from the hydrant and set the house on fire,
reducing the problem to a previously solved form.
</p>
</blockquote>
<p>The general paradigm here is:</p>
<blockquote>
<p>A protocol X is "reducible to" a hardness assumption Y if a
hypothetical adversary \(\mathbb{A}\) who can break X can also
violate Y.</p>
</blockquote>
<p>In the concrete case of X = SIDP and Y = ECDLP we have nothing to do,
since we've already done it. SIDP is intrinsically a test that's
relying on ECDLP; if you can successfully impersonate (i.e. break SIDP)
on any given public key \(P\) then an "Extractor" which we will now
call a <strong>wrapper</strong>, acting to control the environment of
\(\mathbb{A}\) and running two executions of the second half of the
transcript, as already described above, will be able to extract the
private key/discrete log corresponding to \(P\). So we can think of
that Extractor itself as a machine/algorithm which spits out the \(x\)
after being fed in the \(P\), in the simple case where our
hypothetical adversary \(\mathbb{A}\) is 100% reliable. In this
specific sense:</p>
<blockquote>
<p><strong>SIDP is reducible to ECDLP</strong></p>
</blockquote>
<p>However, in the real world of cryptographic research, such an analysis
is woefully inadequate; because to begin with ECDLP being "hard" is a
computational statement: if the group of points on the curve is only of
order 101, it is totally useless since it's easy to compute all
discrete logs by brute force. So, if ECDLP is "hard" on a group of
size \(2^k\), let's say its hardness is measured as the probability
of successfully cracking by guessing, i.e. \(2^{-k}\) (here
<strong>deliberately avoiding</strong> the real measure based on smarter than pure
guesses, because it's detail that doesn't affect the rest). Suppose
\(\mathbb{A}\) has a probability of success \(\epsilon\); what
probability of success does that imply in solving ECDLP, in our
"wrapper" model? Is it \(\epsilon\)?</p>
<p>No; remember the wrapper had to actually extract <strong>two</strong> successful
impersonations in the form of valid responses \(s\) to challenge
values \(e\). We can say that the wrapper <strong>forks</strong> \(\mathbb{A}\):</p>
<p><img alt="Fork your sigma protocol if you want
fork" src="https://web.archive.org/web/20200428212652im_/https://joinmarket.me/static/media/uploads/.thumbnails/forking.png/forking-659x466.png"></p>
<p><em>Fork your sigma protocol if you want fork</em></p>
<p>Crudely, the success probability is \(\epsilon^2\); both of those
impersonations have to be successful, so we multiply the probabilities.
(More exact: by a subtle argument we can see that the size of the
challenge space being reduced by 1 for the second run of the protocol
implies that the probability of success in that second run is reduced,
and the correct formula is \(\epsilon^2 - \frac{\epsilon}{n}\),
where \(n\) is the size of the challenge space (for the signature scheme
below, the hash function output space); obviously
this doesn't matter too much).</p>
<p>How does this factor into a real world decision? We have to go back to
the aforementioned "reverse thinking". The reasoning is something
like:</p>
<ul>
<li>We believe ECDLP is hard for our group, let's say we think you
can't do better than a success probability \(p\) (I'll ignore running time and
just use probability of success as a measure, for simplicity).</li>
<li>The above reduction implies that <em>if</em> we can break SIDP with prob
\(\epsilon\), we can also break ECDLP with prob \(\simeq
\epsilon^2\).</li>
<li>This reduction is thus <strong>not tight</strong> - if it's really the case that
"the way to break SIDP is only to break ECDLP" then a certain
hardness \(p\) only implies a hardness \(\sqrt{p}\) for SIDP,
which we may not consider sufficiently improbable (remember that if
\(p=2^{-128}\), it means halving the number of bits: \(\sqrt{p}
=2^{-64}\)). See
<a href="https://crypto.stackexchange.com/questions/14439/proofs-by-reduction-and-times-of-adversaries">here</a>
for a nice summary on "non-tight reductions".</li>
<li>And <em>that</em> implies that if I want 128 bit security for my SIDP, I
need to use 256 bits for my ECDLP (so my EC group, say). This is all
handwavy but you get the pattern: these arguments are central to
deciding what security parameter is used for the underlying hardness
problem (here ECDLP) when it's applied in practice to a specific
protocol (here SIDP).</li>
</ul>
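<p>The arithmetic in the list above can be checked numerically. This is a sketch only; the function names are hypothetical, and it uses the simplified "probability of success" measure from the text rather than any real attack cost.</p>

```python
import math

def ecdlp_success(eps: float, n: int) -> float:
    """Success probability of the forking wrapper, using eps^2 - eps/n."""
    return eps * eps - eps / n

def max_impersonation_success(p: float) -> float:
    """Largest eps compatible with an ECDLP success bound p (ignoring eps/n)."""
    return math.sqrt(p)

p = 2.0 ** -128                          # assumed hardness of ECDLP
eps = max_impersonation_success(p)       # what that implies for the SIDP
ecdlp_bits = -math.log2(p)
sidp_bits = -math.log2(eps)
```

<p>The square root is what halves the number of bits: <code>sidp_bits</code> comes out at half of <code>ecdlp_bits</code>, which is the sense in which the reduction is not tight.</p>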
<p>I started this subsection on "reductions" with a lame math joke; but I
hope you can see how delicate this all is ... we start with something
we believe to be hard, but then "solve" it with a purely hypothetical
other thing (here \(\mathbb{A}\) ), and from this we imply a two-way
connection (I don't say <em>equivalence</em>; it's not quite that) that we
use to make concrete decisions about security. Koblitz (he of the 'k'
in secp256k1) had some interesting thoughts about 'reductionist'
security arguments in Section 2.2 and elsewhere in
<a href="https://cr.yp.to/bib/2004/koblitz.pdf">this</a>
paper. More from that later.</p>
<p>So we have sketched out how to think about "proving our particular SIDP
instance is/isn't secure based on the intractability of ECDLP in the
underlying group"; but that's only 2 stacks in our jenga tower; we
need MOAR!</p>
<h2>From SIDP to Schnorr signature</h2>
<p>So putting together a couple of ideas from previous sections, I hope it
makes sense to you now that we want to prove that:</p>
<blockquote>
<p>"the (EC) Schnorr signature has existential unforgeability against
chosen message attack (EUFCMA) <strong>if</strong> the Schnorr Identity Protocol is
secure against impersonation attacks."</p>
</blockquote>
<p>with the understanding that, if we succeed in doing so, we have proven
also:</p>
<blockquote>
<p>"the (EC) Schnorr signature has existential unforgeability against
chosen message attack (EUFCMA) <strong>if</strong> the Elliptic Curve discrete
logarithm problem is hard in our chosen EC group."</p>
</blockquote>
<p>with the substantial caveat, as per the previous section, that the
reduction involved in making this statement is not tight.</p>
<p>(there is another caveat though - see the next subsection, <em>The Random
Oracle Model</em>).</p>
<p>This second (third?) phase is much less obvious and indeed it can be
approached in more than one way.
<a href="https://crypto.stanford.edu/~dabo/cryptobook/">Boneh-Shoup</a>
deals with it in a lot more detail; I'll use this as an outline but
dumb it down a fair bit. There is a simpler description
<a href="http://web.stanford.edu/class/cs259c/lectures/schnorr.pdf">here</a>.</p>
<p>The "CMA" part of "EUFCMA" implies that our adversary
\(\mathbb{A}\), who we are now going to posit has the magical ability
to forge signatures (so it's the black cube of our preamble), should be
able to request signatures on an arbitrarily chosen set of messages
\(m_i\), with \(i\) running from 1 to some defined number \(S\).
But we must also allow him to make queries to the hash function, which
we idealise as a machine called a "random oracle". Brief notes on that
before continuing:</p>
<h3>Aside: The Random Oracle Model</h3>
<p>Briefly described
<a href="https://en.wikipedia.org/wiki/Random_oracle">here</a>
. It's a simple but powerful idea: we basically idealise how we want a
cryptographic hash function \(f\) to behave. We imagine an output
space for \(f\) of size \(C\). For any given input \(x\) from a
predefined input space of one or more inputs, we will get a
deterministic output \(y\), but it should be unpredictable, so we
imagine that the function is <em>randomly</em> deterministic. Not a
contradiction - the idea is only that there is no <strong>public</strong> law or
structure that allows the prediction of the output without actually
passing it through the function \(f\). The randomness should be
uniform.</p>
<p>In using this in a security proof, we encounter only one problem: we
will usually want to model \(f\) by drawing its output \(y\) from a
uniformly random distribution (you'll see lines like \(y
\stackrel{\$}{\leftarrow} \mathbb{Z}_N\) in papers, indicating
\(y\) is set randomly). But in doing this, we have set the value of
the output for that input \(x\) permanently, so if we call \(f\)
again on the same \(x\), whether by design or accident, we <em>must</em>
again return the same "random" \(y\).</p>
<p>We also find sometimes that in the nature of the security game we are
playing, one "wrapper" algorithm wants to "cheat" another, wrapped
algorithm, by using some hidden logic to decide the "random" \(y\)
at a particular \(x\). This <em>can</em> be fine, because to the "inner"
algorithm it can look entirely random. In this case we sometimes say we
are "<strong>patching the value of the RO at \(x\) to \(y\)"</strong> to
indicate that this artificial event has occurred; as already mentioned,
it's essential to remember this output and respond with it again, if a
query at \(x\) is repeated.</p>
<p>Finally, this "perfectly random" behaviour is very idealised. Not all
cryptographic protocols involving hash functions require this behaviour,
but those that do are said to be "secure in the random oracle model
(ROM)" or similar.</p>
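<p>A common way to picture a random oracle in code is as a lazily filled lookup table. The sketch below is one such picture, with a hypothetical class name and methods; it shows the two rules described above: repeated queries must get the same answer, and a "patched" value must not contradict an answer already given.</p>

```python
import secrets

class RandomOracle:
    """Lazily sampled random function into range(n), with patching."""

    def __init__(self, n: int):
        self.n = n
        self.table = {}          # remembers every answer ever given

    def query(self, x):
        # First time we see x: draw a uniform value and fix it permanently.
        if x not in self.table:
            self.table[x] = secrets.randbelow(self.n)
        return self.table[x]

    def patch(self, x, y):
        # The wrapper decides the output at x, unless that would contradict
        # an earlier answer, in which case the simulation has failed.
        if x in self.table and self.table[x] != y:
            raise ValueError("conflict: oracle already answered this input")
        self.table[x] = y
```

<p>The <code>ValueError</code> branch is the "conflict" whose probability has to be accounted for in the security argument.</p>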
<h3>Wrapping A with B</h3>
<p><img alt="B tries to win the impersonation game against C, by wrapping the
signature forger
A" src="https://web.archive.org/web/20200428212652im_/https://joinmarket.me/static/media/uploads/.thumbnails/EUFCMA1.png/EUFCMA1-584x413.png"></p>
<p>So we now wrap \(\mathbb{A}\) with \(\mathbb{B}\).
And \(\mathbb{B}\)'s job will be to succeed at winning the SIDP
"game" against a challenger \(\mathbb{C}\) .</p>
<p>Now \(\mathbb{A}\) is allowed \(S\) signing queries; given his
messages \(m_i\), we can use \(S\) eavesdropped conversations \(R,
e, s\) from the actual signer (or equivalently, just forge transcripts
- see "zero knowledgeness" of the Schnorr signature), and for each,
\(\mathbb{B}\) can patch up the RO to make these transcripts fit
\(\mathbb{A}\)'s requested messages; just do
\(H(m_i||R_i)=e_i\). Notice that this part of the process represents
\(S\) queries to the random oracle.</p>
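<p>To make the "forge transcripts and patch the RO" step concrete, here is a sketch of how \(\mathbb{B}\) might answer one signing query without knowing \(x\). As an assumption for brevity it uses a tiny multiplicative group instead of the curve (so \(R = sG - eP\) becomes <code>g^s * y^(-e)</code>), and the names <code>ro_table</code> and <code>simulate_signature</code> are made up for the example.</p>

```python
import secrets

# Toy Schnorr group (assumed): g generates the subgroup of prime order q in Z_p^*.
p, q, g = 2039, 1019, 4

ro_table = {}   # the random oracle as controlled by B: (m, R) -> e

def simulate_signature(y: int, m: bytes):
    """Answer a signing query on m for public key y without knowing x."""
    s = secrets.randbelow(q)
    e = secrets.randbelow(q)
    # Solve g^s = R * y^e for R (negative exponents need Python 3.8+).
    R = pow(g, s, p) * pow(y, -e, p) % p
    if (m, R) in ro_table and ro_table[(m, R)] != e:
        raise RuntimeError("RO conflict; B gives up on this run")
    ro_table[(m, R)] = e          # patch H(m || R) = e
    return R, s

def verify(y: int, m: bytes, sig) -> bool:
    """Verification as A sees it, with H answered from B's table."""
    R, s = sig
    e = ro_table[(m, R)]
    return pow(g, s, p) == R * pow(y, e, p) % p
```

<p>The point of the sketch is that the simulated signature verifies for \(\mathbb{A}\) even though the private key never appears anywhere in the code.</p>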
<p>Observe that \(\mathbb{B}\) is our real "attacker" here: he's the
one trying to fool/attack \(\mathbb{C}\) 's identification
algorithm; he's just using \(\mathbb{A}\) as a black box (or cube,
as we say). We can say \(\mathbb{A}\) is a "subprotocol" used by
\(\mathbb{B}\).</p>
<p>It's all getting a bit complicated, but by now you should probably have
a vague intuition that this will work, although of course not reliably,
and as a function of the probability of \(\mathbb{A}\) being able to
forge signatures of course (we'll again call this \(\epsilon\)).</p>
<h3>Toy version: \(\epsilon = 1\)</h3>
<p>To aid understanding, imagine the simplest possible case, when
\(\mathbb{A}\) works flawlessly. The key \(P\) is given to him and
he chooses a random \(k, R =kG\), and also chooses his message \(m\)
as is his right in this scenario. The "CMA" part of EUF-CMA is
irrelevant here, since \(\mathbb{A}\) can just forge immediately
without signature queries:</p>
<ul>
<li>\(\mathbb{A}\) asks for the value of \(H(m||R)\), by passing
across \(m,R\) to \(\mathbb{B}\).</li>
<li>\(\mathbb{B}\) receives this query and passes \(R\) as the
first message in SIDP to \(\mathbb{C}\) .</li>
<li>\(\mathbb{C}\) responds with a completely random challenge value
\(e\).</li>
<li>\(\mathbb{B}\) "patches" the RO with \(e\) as the output for
input \(m, R\), and returns \(e\) to \(\mathbb{A}\) .</li>
<li>\(\mathbb{A}\) takes \(e\) as \(H(m||R)\), and provides a
valid \(s\) as signature.</li>
<li>\(\mathbb{B}\) passes \(s\) through to \(\mathbb{C}\) , who
verifies \(sG = R + eP\); identification passed.</li>
</ul>
<p>You can see that nothing here is new except the random oracle patching,
which is trivially non-problematic as we make only one RO query, so
there can't be a conflict. The probability of successful impersonation
is 1.</p>
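<p>The message flow in the toy case can be sketched as follows. Two loud assumptions: the group is again a tiny multiplicative group rather than the curve, and the "perfect forger" is faked by giving it the private key internally, purely so that the black box behaves as posited; \(\mathbb{B}\) never looks inside it. All class and function names are hypothetical.</p>

```python
import secrets

# Toy Schnorr group (assumed): g generates the subgroup of prime order q in Z_p^*.
p, q, g = 2039, 1019, 4

class Challenger:
    """C: runs the verifier side of the identification protocol for key y."""
    def __init__(self, y):
        self.y = y
    def challenge(self, R):
        self.R = R
        self.e = secrets.randbelow(q)
        return self.e
    def accept(self, s):
        return pow(g, s, p) == self.R * pow(self.y, self.e, p) % p

class PerfectForger:
    """Stand-in for A. It holds x only to simulate a forger with epsilon = 1."""
    def __init__(self, x):
        self._x = x
    def forge(self, y, hash_oracle):
        m = b"message chosen by A"
        k = secrets.randbelow(q - 1) + 1
        R = pow(g, k, p)
        e = hash_oracle(m, R)            # A's only interaction with B
        return m, R, (k + e * self._x) % q

def impersonate(forger, y, challenger):
    """B: wraps A and tries to pass C's identification check."""
    patched = {}
    def hash_oracle(m, R):
        if (m, R) not in patched:
            # Forward R to C, then patch the RO with C's challenge.
            patched[(m, R)] = challenger.challenge(R)
        return patched[(m, R)]
    m, R, s = forger.forge(y, hash_oracle)
    return challenger.accept(s)
```

<p>With <code>y = pow(g, x, p)</code>, the call <code>impersonate(PerfectForger(x), y, Challenger(y))</code> succeeds every time, mirroring the claim that the impersonation probability is 1 in this idealised case.</p>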
<p>Note that this implies the probability of successfully breaking ECDLP is
also \(\simeq 1\). We just use a second-layer wrapper around
\(\mathbb{B}\), and fork its execution after the provision of
\(R\), providing two separate challenges and thus getting two
separate \(s\) values, one from each run, and solving for \(x\), the private
key/discrete log as has already been explained.</p>
<p>Why \(\simeq\)? As noted on the SIDP to ECDLP reduction above, there
is a tiny probability of a reused challenge value which must be factored
out, but it's of course negligible in practice.</p>
<p>If we assert that the ECDLP is not trivially broken in reasonable time,
we must also assert that such a powerful \(\mathbb{A}\) does not
exist, given similarly limited time (well; <em>in the random oracle model</em>,
of course...).</p>
<h3>Full CMA case, \(\epsilon \ll 1\)</h3>
<p>Now we give \(\mathbb{A}\) the opportunity to make \(S\) signing
queries (as already mentioned, this is what we mean by an "adaptive
chosen message attack"). The sequence of events will be a little longer
than the previous subsection, but we must think it through to get a
sense of the "tightness of the reduction" as already discussed.</p>
<p>The setup is largely as before: \(P\) is given. There will be \(h\)
RO queries allowed (additional to the implicit ones in the signing
queries).</p>
<ul>
<li>For any signing query from \(\mathbb{A}\), as we covered in
"Wrapping A with B", a valid response can be generated by patching
the RO (or using real transcripts). We'll have to account for the
possibility of a conflict between RO queries (addressed below), but
it's a minor detail.</li>
<li>Notice that as per the toy example previously, during
\(\mathbb{A}\)'s forgery process, his only interaction with his
wrapper \(\mathbb{B}\) is to request a hash value
\(H(m||R)\). So it's important to understand that, first
because of the probabilistic nature of the forgery (\(\epsilon
\ll 1\)), and second because \(\mathbb{A}\)'s algorithm is
unknown, <strong>\(\mathbb{B}\) does not know which hash function query
(and therefore which RO response) will correspond to a successful
forgery.</strong> This isn't just important to the logic of the game; as
we'll see, it's a critical limitation of the security result we
arrive at.</li>
<li>So to address the above, \(\mathbb{B}\) has to make a decision
upfront: which query should I use as the basis of my impersonation
attempt with \(\mathbb{C}\)? He chooses an index \(\omega
\in \{1, \ldots, h\}\).</li>
<li>There will be a total of \(S+h+1\) queries to the random oracle,
at most (the +1 is a technical detail I'll ignore here). We
discussed in the first bullet point that if there is a repeated
\(m, R\) pair in one of the \(S\) signing queries, it causes a
"conflict" on the RO output. In the very most pessimistic
scenario, the probability of this causing our algorithm to fail can
be no more than \(\frac{S+h+1}{n}\) for each individual signing
query, and \(\frac{S(S+h+1)}{n}\) for all of them (as before we
use \(n\) for the size of the output space of the hash function).</li>
<li>So \(\mathbb{B}\) will <strong>fork</strong> \(\mathbb{A}\)'s execution,
just as for the SIDP \(\rightarrow\) ECDLP reduction, <strong>at index
\(\omega\)</strong>, without knowing in advance whether \(\omega\) is
indeed the index at which the hash query corresponds to
\(\mathbb{A}\)'s final output forgery. There's a \(1/h\)
chance of this guess being correct. So the "partial success
probability", if you will, for this first phase, is
\(\epsilon/h\), rather than purely \(\epsilon\), as we had for
the SIDP case.</li>
<li>In order to extract \(x\), though, we need that the execution
<em>after</em> the fork, with the new challenge value, at that same index
\(\omega\), also outputs a valid forgery. What's the probability
of both succeeding together? Intuitively it's of the order of
\(\epsilon^2\) as for the SIDP case, but clearly the factor
\(1/h\), based on accounting for the guessing of the index
\(\omega\), complicates things, and it turns out that the
statistical argument is rather subtle; you apply what has been
called the <strong>Forking Lemma</strong>, described on
<a href="https://en.wikipedia.org/wiki/Forking_lemma">Wikipedia</a>
and with the clearest statement and proof in
<a href="https://cseweb.ucsd.edu/~mihir/papers/multisignatures-ccs.pdf">this</a>
paper of Bellare-Neven '06. The formula for the success probability
of \(\mathbb{B}\) turns out to be:</li>
</ul>
<blockquote>
<p>\(\epsilon_{\mathbb{B}} = \epsilon\left(\frac{\epsilon}{h} -
\frac{1}{n}\right)\)</p>
</blockquote>
<ul>
<li><a href="https://crypto.stanford.edu/~dabo/cryptobook/">Boneh-Shoup</a>
in Section 19.2 bundle this all together (with significantly more
rigorous arguments!) into a formula taking account of both the Forking
Lemma and the collisions in the signing queries, to
produce the more detailed statement, where \(\epsilon\) on the
left here refers to the probability of success of \(\mathbb{B}\),
and "DLAdv" on the right refers to the probability of success in
solving the discrete log. The square root term of course corresponds
to the "reduction" from Schnorr sig. to ECDLP being roughly a
square:</li>
</ul>
<blockquote>
<p>\(\epsilon \le \frac{S(S+h+1)}{n} + \frac{h+1}{n} +
\sqrt{(h+1)\ \times \ \textrm{DLAdv}}\)</p>
</blockquote>
<p>So in summary: we see that analysing the full CMA case in detail is
pretty complicated, but by far the biggest take away should be: <strong>The
security reduction for Schnorr sig to ECDLP has the same
\(\epsilon^2\) dependency, but is nevertheless far less tight,
because the success probability is also reduced by a factor \(\simeq
h\) due to having to guess which RO query corresponds to the successful
forgery.</strong></p>
<p>(<em>Minor clarification: basically ignoring the first two terms on the RHS
of the preceding as "minor corrections", you can see that DLAdv is
very roughly \(\epsilon^2/h\)</em>).</p>
<p>The above bolded caveat is, arguably, very practically important, not
just a matter of theory - because querying a hash function is something
that it's very easy for an attacker to do. If the reduction loses
\(h\) in tightness, and the attacker is allowed \(2^{60}\) hash
function queries (note - they can be offline), then we (crudely!) need
60 bits more of security in our underlying cryptographic hardness
problem (here ECDLP); at least, <em>if</em> we are basing our security model on
the above argument.</p>
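<p>To get a feel for the numbers, the two formulas above can be evaluated directly. This is a back-of-envelope sketch: the function names are hypothetical, \(S = 2^{30}\) is an arbitrary illustrative choice, and \(n = 2^{256}\), \(h = 2^{60}\) and DLAdv \(= 2^{-128}\) are the example figures used in the text.</p>

```python
import math

def forking_success(eps: float, h: int, n: int) -> float:
    """The forking formula eps * (eps/h - 1/n)."""
    return eps * (eps / h - 1 / n)

def forgery_bound(S: int, h: int, n: int, dl_adv: float) -> float:
    """Upper bound on forgery success: S(S+h+1)/n + (h+1)/n + sqrt((h+1)*DLAdv)."""
    return S * (S + h + 1) / n + (h + 1) / n + math.sqrt((h + 1) * dl_adv)

S, h, n = 2 ** 30, 2 ** 60, 2 ** 256

# With a 2^-128 discrete log bound, only about 34 bits are guaranteed.
bits_at_128 = -math.log2(forgery_bound(S, h, n, 2.0 ** -128))

# Restoring a 64 bit guarantee needs the discrete log bound pushed to 2^-188.
bits_at_188 = -math.log2(forgery_bound(S, h, n, 2.0 ** -188))
```

<p>In this crude accounting the first two terms are negligible and the square root term dominates, which is where the "60 bits more" in the underlying problem comes from.</p>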
<p>Although I haven't studied it, <a href="https://eprint.iacr.org/2012/029">the 2012 paper by Yannick
Seurin</a>
makes an argument (as far as I understand) that we cannot do better than
this, in the random oracle model, i.e. the factor of \(h\) cannot be
removed from this security reduction by some better kind of argument.</p>
<h2>Summary - is Schnorr secure?</h2>
<p>For all that this technical discussion has exposed the non-trivial guts
of this machine, it's still true that the argument provides some pretty
nice guarantees. We can say something like "Schnorr is secure if:"</p>
<ul>
<li>The hash function behaves to all intents and purposes like an ideal
random oracle as discussed</li>
<li>The ECDLP on our chosen curve (secp256k1 in Bitcoin) is hard to the
extent we reasonably expect, given the size of the curve and any
other features it has (in secp256k1, we hope, no features at all!)</li>
</ul>
<p>This naturally raises the question "well, but how hard <em>is</em> the
Elliptic Curve discrete logarithm problem, on secp256k1?" Nobody really
knows; there are known, standard ways of attacking it, which are better
than brute force unintelligent search, but their "advantage" is a
roughly known quantity (see e.g. <a href="https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm">Pollard's
rho</a>).
What there isn't, is some kind of proof "we know that \(\nexists\)
algorithm solving ECDLP on (insert curve) faster than \(X\)".</p>
<p>Not only don't we know this, but it's even rather difficult to make
statements about analogies. I recently raised the point on
#bitcoin-wizards (freenode) that I thought there must be a relationship
between problems like RSA/factoring and discrete log finding on prime
order curves, prompting a couple of interesting responses, agreeing that
indirect evidence points to the two hardness problems being to some
extent or other connected. Madars Virza kindly pointed out a
<a href="https://wstein.org/projects/john_gregg_thesis.pdf#page=43">document</a>
that details some ideas about the connection (obviously this is some
pretty highbrow mathematics, but some may be interested to investigate
further).</p>
<h2>What about ECDSA?</h2>
<p>ECDSA (and more specifically, DSA) were inspired by Schnorr, but have
design decisions embedded in them that make them <em>very</em> different when
it comes to security analysis. ECDSA looks like this:</p>
<blockquote>
<p>\(s = k^{-1}\left(H(m) + rx\right), \quad r=R.x, \ R = kG\)</p>
</blockquote>
<p>The first problem with trying to analyse this is that it doesn't
conform to the
three-move-sigma-protocol-identification-scheme-converts-to-signature-scheme-via-Fiat-Shamir-transform.
Why? Because the hash value is \(H(m)\) and doesn't include the
commitment to the nonce, \(R\). This means that the standard
"attack" on Schnorr, via rewinding and resetting the random oracle,
doesn't work. This doesn't, of course, mean that it's insecure -
there's another kind of "fixing" of the nonce, in the setting
of \(R.x\). This latter "conversion function" is kind of a random
function, but really not much like a hash function; it's trivially
"semi-invertible" in as much as given an output x-coordinate one can
easily extract the two possible input R-values.</p>
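<p>To see the structural difference in code, here is a sketch of plain DSA (not ECDSA) over a tiny multiplicative group, chosen as an assumption so the block stays short. The "conversion function" is then reduction mod \(q\) rather than taking an x-coordinate, and the parameters are far too small to be secure. Note that the hash takes only the message, not the nonce commitment.</p>

```python
import hashlib
import secrets

# Toy DSA group (assumed): g generates the subgroup of prime order q in Z_p^*.
p, q, g = 2039, 1019, 4

def H(m: bytes) -> int:
    """Message hash only; unlike Schnorr, R is not an input."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def conversion(R: int) -> int:
    """DSA's analogue of taking R.x: map the group element to a scalar."""
    return R % q

def sign(x: int, m: bytes):
    while True:
        k = secrets.randbelow(q - 1) + 1
        r = conversion(pow(g, k, p))
        s = pow(k, -1, q) * (H(m) + r * x) % q
        if r and s:                      # retry on the rare zero values
            return r, s

def verify(y: int, m: bytes, sig) -> bool:
    r, s = sig
    if r not in range(1, q) or s not in range(1, q):
        return False
    w = pow(s, -1, q)
    # Recompute R = g^(H(m)/s) * y^(r/s) and compare after conversion.
    R = pow(g, H(m) * w % q, p) * pow(y, r * w % q, p) % p
    return conversion(R) == r
```

<p>Because the challenge-like value \(r\) comes from the conversion function rather than from a hash that a wrapper can patch, the rewinding argument used for Schnorr has nothing to grab hold of here.</p>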
<p>Some serious analysis has been done on this, for the obvious reason that
(EC)DSA is <strong>very widely used in practice.</strong> There is work by
<a href="https://www.iacr.org/archive/pkc2003/25670309/25670309.pdf">Vaudenay</a>
and
<a href="https://www.cambridge.org/core/books/advances-in-elliptic-curve-cryptography/on-the-provable-security-of-ecdsa/69827A20CC94C54BBCBC8A51DBAF075A">Brown</a>
(actually a few papers but I think most behind academic paywalls) and
most recently <a href="https://dl.acm.org/citation.cfm?doid=2976749.2978413">Fersch et
al</a>.
Fersch gave a talk on this work
<a href="https://www.youtube.com/watch?v=5aUPBT4Rdr8">here</a>
.</p>
<p>The general consensus seems to be "it's very likely secure - but
attempting to get a remotely "clean" security reduction is very
difficult compared to Schnorr".</p>
<p>But wait; before we trail off with an inaudible mumble of "well, not
really sure..." - there's a crucial logical implication you may not
have noticed. Very obviously, ECDSA is not secure if ECDLP is not secure
(because you just get the private key; game over for any signature
scheme). Meanwhile, in the long argument above we <strong>reduced</strong> Schnorr to
ECDLP. This means:</p>
<blockquote>
<p><strong>If ECDSA is secure, Schnorr is secure, but we have no security
reduction to indicate the converse.</strong></p>
</blockquote>
<p>The aforementioned Koblitz paper tells an interesting historical
anecdote about all this, when the new DSA proposal was first put forth
in '92 (emphasis mine):</p>
<blockquote>
<p>"At the time, the proposed standard --- which soon after became the
first digital signature algorithm ever approved by the industrial
standards bodies --- encountered stiff opposition, especially from
advocates of RSA signatures and from people who mistrusted the NSA's
motives. Some of the leading cryptographers of the day tried hard to
find weaknesses in the NIST proposal. A summary of the most important
objections and the responses to them was published in the Crypto'92
proceedings[17]. The opposition was unable to find any significant
defects in the system. <strong>In retrospect, it is amazing that none of the
DSA opponents noticed that when the Schnorr signature was modified,
the equivalence with discrete logarithms was
lost.</strong>"</p>
</blockquote>
<h2>More exotic constructions</h2>
<p>In a future blog post, I hope to extend this discussion to other
constructions, which are based on Schnorr in some way or other, in
particular:</p>
<ul>
<li>The AOS ring signature</li>
<li>The Fujisaki-Suzuki, and the cryptonote ringsig</li>
<li>The Liu-Wei-Wong, and the Monero MLSAG (via Adam Back) ringsig</li>
<li>The MuSig multisignature</li>
</ul>
<p>While these are all quite complicated (to say the least!), so no
guarantee of covering all that, the security arguments follow similar
lines to the discussion in this post. Of course ring signatures have
their own unique features and foibles, so I will hopefully cover that a
bit, as well as the security question.</p>Finessing commitments2019-01-15T00:00:00+01:002019-01-15T00:00:00+01:00Adam Gibsontag:joinmarket.me,2019-01-15:/blog/blog/finessing-commitments/<p>discussion of properties of commitment schemes as applied to Bitcoin</p><h3>Finessing commitments</h3>
<h2>Introduction</h2>
<p>This post was mostly prompted by a long series of discussions had online
and in person with many people, including in particular Adam Back and
Tim Ruffing (but lots of others!) - and certainly not restricted to
discussions I took part in - about the tradeoffs in a version of Bitcoin
that does actually use Confidential Transactions.</p>
<p>The topic that keeps recurring is: exactly what level of safety against
<em>hidden inflation</em> does CT offer, in principle and in practice; this is
closely related to what level of privacy it offers, too, but the hidden
inflation is what gets people thinking, first and foremost.</p>
<p>My goal here is to explain to people who are not completely up to speed
with what causes this discussion; to explain, in detail but without
assuming <em>too</em> much pre-knowledge, what the heart of the tradeoff is,
and in the last couple of sections, how we might get around it. How we
might "have our cake and eat it" so to speak.</p>
<p><em>You'll find a lot of the ideas in the first three sections of this blog
post, although not all of them, in my write up on
<a href="https://github.com/AdamISZ/from0k2bp">Bulletproofs</a>
section 3, and I also went over the basics in the early part of my
London talk on the <a href="https://www.youtube.com/watch?v=mLZ7qVwKalE">Schnorr
signature</a>
(since they're not visible, you'd want to see the
<a href="../../../images/schnorrplus.pdf">slides</a>
for that talk if you watch it).</em></p>
<p><em>You should have </em><em>some</em><em> general idea about what Confidential
Transactions is, in order to understand the second half - in particular
you should understand that amounts are hidden under Pedersen
commitments, although we'll go through some of it here in the early
sections.</em></p>
<h2>Commitments - the basic ideas</h2>
<p>A commitment fixes a value in advance, without revealing it. Think of
flipping a coin and covering it with your hand as part of a gamble or
game. Covering it with your hand means it's invisible. The visibility of
your hand not moving, and pressed against a surface over the coin, on
the other hand, ensures you can't cheat, because you can't flip the coin
under your hand. This is the physical representation of the two ideas of
a commitment; everything that cryptographers call a commitment has those
two properties, but defended by very strong mathematical "guarantees"
(more on those quote marks shortly!), instead of by a crappy physical
analogue like a pressed-down hand:</p>
<ul>
<li><code>Hiding</code> - nobody but the committer can see or infer the actual
value being committed</li>
<li><code>Binding</code> - the committer can't change the value after the
commitment is published.</li>
</ul>
<p>The most "vanilla" way to implement this primitive is to use a
cryptographically secure hash function, say \(\mathbb{H}\). You'd
make up a random number \(r\) and combine it with your secret value
\(x\), and the commitment would just be something like
\(\mathbb{H}(x||r)\), where || indicates just concatenating the
data together.</p>
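<p>As a rough illustration of that hash-based construction, here is a minimal Python sketch. The choice of SHA256, the fixed 32-byte length of \(r\) and the function names <code>commit</code> and <code>verify</code> are assumptions made for this example only; nothing above prescribes them.</p>

```python
import hashlib
import secrets

def commit(x: bytes):
    """Return (commitment, r) where commitment = SHA256(x || r)."""
    r = secrets.token_bytes(32)  # fresh random blinding value
    return hashlib.sha256(x + r).digest(), r

def verify(commitment: bytes, x: bytes, r: bytes) -> bool:
    """Check a claimed opening (x, r) against a published commitment."""
    return hashlib.sha256(x + r).digest() == commitment

c, r = commit(b"heads")
print(verify(c, b"heads", r))  # the honest opening is accepted
print(verify(c, b"tails", r))  # a different value with the same r is rejected
```

<p>Keeping \(r\) at a fixed length and placing it last means the concatenation can only be parsed one way, which is one simple way to avoid ambiguity between \(x\) and \(r\).</p>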
<p>For background, commitments are usually defined in the literature as a
tuple of three algorithms: Setup, Commit and Verify/Open, where the last
one needs the committing party to give what's called the "opening" of
the commitment as input. It should be clear what is meant when I say
that the "opening" of the above commitment is just literally the two
values \(x\) and \(r\). (It's like taking your hand off the coin in
our analogue).</p>
<p>Because (cryptographically secure) hash-functions are
collision-resistant (avoiding technical definitions for brevity here),
you can't open a commitment you already made to \(x\), with a
different value \(x'\), i.e. as committer you can't cheat, because
even though you're free to make your cheating opening operation with any
\(r'\) you like (it wasn't published in advance), you just can't find
any combination \(x',r'\) such that
\(\mathbb{H}(x||r)=\mathbb{H}(x'||r')\). That's why you need a
proper, strong hash function - to make that computationally infeasible.</p>
<h2>Homomorphic commitments</h2>
<p>The strict definition of homomorphism requires going into group theory a
bit, which isn't needed here, but basically a homomorphism should be
thought of as a function that takes you from one group into another
while keeping the group operation intact (so it's closely related to
symmetry). A toy example would be the homomorphism from the group of
integers under addition \((\mathbb{Z},+)\) to the group of integers
modulo two \(\mathbb{Z}_2\) (a very simple group!: two members: 0,
1; each is self-inverse; 0+1=1+0 = 1, 0+0=1+1 =0). What is that
homomorphism? It's just a function that returns 0 if the input integer
is even, and 1 if the input integer is odd. We lost everything about the
integers except their evenness/oddness but we kept the group operation.
Call that function \(f()\) and we clearly have \(f(a+b)=f(a)+f(b)\)
because even + odd = odd, odd + even = odd, even + even = odd + odd =
even.</p>
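<p>The toy example is small enough to check mechanically. The following sketch (the name <code>f</code> follows the text; the tested range is arbitrary) just confirms \(f(a+b)=f(a)+f(b)\), with the right-hand addition taken in \(\mathbb{Z}_2\):</p>

```python
def f(n: int) -> int:
    """Map an integer to its parity: 0 if even, 1 if odd."""
    return n % 2

# f(a + b) must equal f(a) + f(b), where the right-hand side is added mod 2
for a in range(-5, 6):
    for b in range(-5, 6):
        assert f(a + b) == (f(a) + f(b)) % 2
```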
<p>But homomorphisms need not collapse a larger group into a smaller one,
the "throw away everything except X" can still conceivably mean throwing
away nothing, i.e. mapping from one group to another with the same order
- and the homomorphism that's relevant to this discussion fits that
description: it translates members of the group of integers modulo
\(N\) under addition, to the group of elliptic curve points of order
\(N\), under elliptic curve point addition: \(a \in \mathbb{Z}_N
\ ; a \rightarrow aG\), where \(G\) is the so-called generator
point of the elliptic curve (exactly which point on the curve is taken
for this role is irrelevant, but the definition has to include one, and
we call it \(G\)), and implicit in the notation \(aG\) is the fact
that it means a <em>scalar multiplication</em> of \(G\) by \(a\), i.e.
\(G\) added to itself \(a\) times.</p>
<p>In Bitcoin, the curve in question is secp256k1, but that's not important
here (except perhaps the fact that \(N\) is prime, meaning both groups
are isomorphic to the cyclic group of order \(N\)).</p>
<p>So this is all getting technical, but all it really boils down to is
\((a+b)G = aG + bG\) is an equation that holds, here.</p>
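<p>For readers who like to see it run, here is a bare-bones sketch of secp256k1 arithmetic in pure Python that checks this equation. It is written for clarity only (it is slow and not constant time), the helper names <code>add</code> and <code>mul</code> are my own, and it assumes Python 3.8+ for modular inverses via <code>pow</code>.</p>

```python
# secp256k1: y^2 = x^3 + 7 over the field of P elements (standard public constants)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Elliptic curve point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its inverse
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P      # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P     # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def mul(k, pt):
    """Scalar multiplication by double-and-add: pt added to itself k times."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

a, b = 123456789, 987654321
assert mul((a + b) % N, G) == add(mul(a, G), mul(b, G))
```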
<p>What about commitments? We can treat the above homomorphism as <em>similar</em>
to a cryptographic hash function, in that we can <em>assume</em> that it's not
possible to derive the integer \(a\) given only the curve point
\(aG\) - this assumes that the "elliptic curve discrete logarithm
problem" is hard (ECDLP for short; also, that's kind of the definition
of that problem).</p>
<p>In making that assumption, we can go ahead and apply the same paradigm:
take a random \(r\) for hiding, then take a second generator point
\(H\) and write our commitment as \(xG + rH\) (addition not
concatenation; publishing the two curve points separately would defeat
the purpose; \(r\) is supposed to be helping to hide \(x\)!).</p>
<p>And we note how the homomorphism between an integer and the scalar
multiple of \(G\) by that integer carries over to this composite case
of the sum of two points: \(x_{1}G + r_{1}H + x_{2}G + r_{2}H =
(x_{1}+x_{2})G + (r_{1}+r_{2})H\).</p>
<p>So it means the commitments, which we can denote \(C(x, r)\) for
brevity, <strong>are homomorphic</strong> too. The sum of two commitments is equal to
the commitment to the sum (in this sentence, notice how "sum" refers to
the sum of two integers; but the homomorphism allows that to "carry
over" to "sum" as in elliptic curve point addition). Algebraically we
can condense it to: \(C(x_1,r_1)+C(x_2,r_2) =
C(x_{1}+x_{2},r_{1}+r_{2})\).</p>
<p>This specific form of commitment is known as the <strong>Pedersen
commitment</strong>. It is most commonly referred to in the finite field, non
elliptic curve form (where \(C=g^{x}h^{r}\) rather than \(C=xG +
rH\)), but it's the same thing.</p>
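<p>Since the finite field form is the same thing, it makes for a very short sketch of the homomorphic property. The parameters below are toy values that I picked for illustration (a real deployment uses a ~256 bit group, and nobody may know the discrete log of \(h\) with respect to \(g\), which is obviously not true of a group this small):</p>

```python
# Toy parameters, far too small to be secure: P = 2Q + 1 with P and Q prime.
P, Q = 2039, 1019
# G and H are squares mod P, so each generates the subgroup of prime order Q.
G, H = 4, 9

def commit(x: int, r: int) -> int:
    """Pedersen commitment C = g^x * h^r mod P (the analogue of xG + rH)."""
    return pow(G, x, P) * pow(H, r, P) % P

x1, r1 = 25, 301
x2, r2 = 17, 950
lhs = commit(x1, r1) * commit(x2, r2) % P     # "sum" of the two commitments
rhs = commit((x1 + x2) % Q, (r1 + r2) % Q)    # commitment to the two sums
assert lhs == rhs
```

<p>Note that the group operation is written as multiplication here, so "adding" two commitments means multiplying them modulo <code>P</code>.</p>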
<p>And to re-emphasise what at this point should be obvious - <strong>none</strong> of
the above applies to commitments built with cryptographic hash
functions.</p>
<h2>Imperfect commitments</h2>
<p>The first imperfection in the idea above of the Pedersen commitment: the
second generator point \(H\). In practice it has been calculated in a
<a href="https://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number">NUMS</a>
way, using some kind of hash of the defined generator point \(G\). The
idea is that if this hash function \(\mathbb{H}\) is secure as
described above, \(H\) cannot be reverse-engineered such that its
discrete log (that's to say, \(\gamma\) s.t. \(H=\gamma G\)) is known. And
while this seems like a side-note, we can use this to lead in to the
subtleties which are the main topic of this blog post.</p>
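<p>To make "some kind of hash of \(G\)" a little more concrete, here is one possible NUMS-style derivation, sketched in Python. The seed encoding, the counter and the helper name <code>nums_point</code> are all assumptions of mine for illustration; this is not claimed to be the derivation used in any particular deployed system.</p>

```python
import hashlib

# secp256k1 field prime and generator coordinates (standard public constants)
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def nums_point(seed: bytes):
    """Hash seed plus a counter to an x-coordinate until one lies on y^2 = x^3 + 7."""
    for counter in range(256):
        digest = hashlib.sha256(seed + bytes([counter])).digest()
        x = int.from_bytes(digest, "big")
        if x >= P:
            continue
        rhs = (pow(x, 3, P) + 7) % P
        y = pow(rhs, (P + 1) // 4, P)  # square root candidate, valid as P % 4 == 3
        if y * y % P == rhs:
            return (x, y)
    raise ValueError("no curve point found for this seed")

# Hypothetical seed: an uncompressed-style encoding of G
seed = b"\x04" + GX.to_bytes(32, "big") + GY.to_bytes(32, "big")
H = nums_point(seed)
```

<p>The point of such a procedure is that anyone can re-run it and see that \(H\) was produced from the hash output rather than chosen as a known multiple of \(G\).</p>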
<p>Consider (and this is an excellent exercise for those <em>somewhat</em>
familiar with basic elliptic curve operations as used in Bitcoin and
similar, but not yet seasoned in it): if you secretly knew that
\(\gamma\) and no one else did, and Pedersen commitments were being
used, how could you use this knowledge to gain advantage?</p>
<p>(Spoiler space)</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>.</p>
<p>The answer is that a commitment would lose its <strong>binding</strong> property.
Remember: the binding property is the defence the receiver/verifier of
the commitment has against the creator/committer. So you would be able
to make a commitment randomly - just take any random "private key", call
it \(y\), and publish its "public key" point \(yG\) as \(C\). Then
when asked to verify later, you could pretend that you had bound this
commitment to any \(x\) as follows: Take your newly chosen \(x\),
and calculate what \(r\) has to be for that to work:</p>
<p>\(xG + rH = C\)
\(xG + r\gamma G = yG\)
\(\therefore\)
\(r = \gamma^{-1}(y-x)\)</p>
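<p>Here is that cheat as a small Python sketch, using a toy multiplicative group in place of the curve (so \(xG + rH\) becomes \(g^x h^r\)). The parameters, the value of <code>GAMMA</code> and the helper <code>fake_opening</code> are illustrative assumptions; Python 3.8+ is assumed for the modular inverse.</p>

```python
P, Q = 2039, 1019      # toy group: P = 2Q + 1, both prime
G = 4                  # generator of the subgroup of order Q
GAMMA = 777            # the relative discrete log, assumed known to the cheater
H = pow(G, GAMMA, P)   # H = gamma * G in the additive notation

def commit(x: int, r: int) -> int:
    return pow(G, x, P) * pow(H, r, P) % P

def fake_opening(y: int, x: int) -> int:
    """Given C = g^y, return the r for which (x, r) opens C: r = gamma^-1 (y - x)."""
    return pow(GAMMA, -1, Q) * (y - x) % Q

y = 123
C = pow(G, y, P)       # "commitment" published without fixing any x
for x in (0, 5, 1000):
    r = fake_opening(y, x)
    assert commit(x, r) == C   # every chosen x opens the same C
```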
<p>There are two thoughts that may spring out of this:</p>
<ul>
<li>A Pedersen commitment then, is only sound in as much as the relative
discrete log between \(H\) and \(G\) is unknown. And as is often
discussed, the aforementioned ECDLP is almost certainly broken by a
sufficiently powerful quantum computer (although there might
conceivably be other ways to compute a discrete log quickly, as yet
unknown to mathematicians). We say that Pedersen commitments are
only <span style="text-decoration: underline;">computationally
binding</span> - this means, only binding inasmuch as breaking their
binding requires an unrealistic amount of computational power - to
an adversary with infinite computing power, they are <strong>not</strong>
binding, at all.</li>
<li>You may think - what about hiding? Surely knowing this "secret
unlocking key" \(\gamma\) can break that as well. The answer is
an emphatic <strong>no</strong> - but what might be really surprising is: <em>the
answer is no in the case of hiding for exactly the same reason that
the answer is yes, in the case of binding!</em></li>
</ul>
<p>I'm first going to explain that formally, but then we'll take a step
back, and use a picture and perhaps some examples to flesh out what's
really going on here, because it's the heart of the story we're telling.</p>
<p>The reason hiding is not lost is because an adversary on the
receiving/verifying side who wants to "sneak peek" inside \(C\) to
find out the real \(x\) faces an insurmountable problem: <span
style="text-decoration: underline;">there isn't only one answer!</span>.
If the answer is \(x\) then the \(r\) value is, as already
explained, \(\gamma^{-1}(y-x)\). Remember, a computationally unbounded
attacker can find those discrete logs - can get \(y\) from \(C\),
and can get \(\gamma\) from \(H\). So <em>any</em> \(x\) will have a
corresponding \(r\). And of course you can flip it backwards - if he
fixes a particular \(r\) he can get the corresponding \(x\). Where
is this slipperiness coming from? What's its fundamental cause?</p>
<p>Think about functions. They take inputs to outputs of course, and we
generalise by talking about input spaces and output spaces (by "space"
here I mean something like "the set of all..."). The technical terms
used are domain/range and sometimes preimage space and image space. The
input space for the private->public key "function" in Bitcoin is of
course the set of all integers modulo \(N\). And the output space is
the set of all public keys, which is the set of all points on the curve
secp256k1. Here we have what's called a "one-one" mapping (technically
it's both "one-one" and also "onto", since all points in the output
space are mapped to). But only "nice" functions are like that, not all
are. Some, in particular, have <strong>more than one input corresponding to
the same output</strong>.</p>
<p>And that's exactly what's happening with the Pedersen commitment;
moreover, there's a very concrete reason <em>why</em> the Pedersen commitment
has that property of having multiple inputs for the same output - it
logically has to! Consider this diagram:</p>
<p><img src="https://web.archive.org/web/20200428225915im_/https://joinmarket.me/static/media/uploads/.thumbnails/InputOutput1.png/InputOutput1-689x487.png" width="689" height="487" alt="Input space larger than output space" /></p>
<p>By the <a href="https://en.wikipedia.org/wiki/Pigeonhole_principle">pigeonhole
principle</a>,
because there are more inputs than outputs, it's impossible for it <em>not</em>
to be the case that at least some outputs have more than one
corresponding input - they wouldn't all "fit" otherwise. And it's for
this reason that <span style="text-decoration: underline;">a commitment
scheme where the input space is larger than the output space can have
perfect hiding.</span> (notice "can", not "will" - for the hiding to be
<em>perfect</em> we need to be leaking zero information about which element
of the input space is being hidden; that needs it to be the case that we
can demonstrate that <em>any</em> element of some very large subset of the
input space is part of a valid opening of a given commitment \(C\);
that's true here in Pedersen, but certainly not for some other
commitment schemes).</p>
<p>And for the exact same reason, the binding is only computational - there
are bound to be at least some outputs with more than one input, and so
if nothing else, by exhaustive search, the computationally unbounded
attacker can simply search and find the other input corresponding to the
same output (as per the diagram, if the attacker had the value \(k\)
as commitment, he could find \(x_2, r_2\) even if the commitment was
originally to \(x_1, r_1\)). At least that'll be true for <em>some</em>
outputs (in Pedersen, for every output, in fact).</p>
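<p>The counting argument can be checked by brute force in a group small enough to enumerate. The sketch below uses toy parameters of my own choosing (a subgroup of order 11 modulo 23) and simply groups every input pair \((x, r)\) by the commitment it produces:</p>

```python
from collections import defaultdict

P, Q = 23, 11    # tiny toy group so that exhaustive search is instant
G, H = 4, 9      # both have order Q modulo P

def commit(x: int, r: int) -> int:
    return pow(G, x, P) * pow(H, r, P) % P

# Map each output to the list of all inputs (x, r) that produce it
openings = defaultdict(list)
for x in range(Q):
    for r in range(Q):
        openings[commit(x, r)].append((x, r))

print(len(openings))             # number of distinct commitments
print(openings[commit(3, 7)])    # every (x, r) that opens this one commitment
```

<p>There are \(Q^2\) inputs but only \(Q\) possible outputs, so each commitment has \(Q\) openings, one for every possible \(x\). That is both why the commitment leaks nothing about \(x\) and why an attacker who can search (or who knows \(\gamma\)) can open it to anything.</p>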
<p>So the Pedersen commitment falls neatly into this category; it has an
input space of 512 bits if the message \(x\) and randomness \(r\)
are both the same size as the group elements and an output space of 256
bits (almost exactly); the outputs \(C\) are the points on the
secp256k1 curve.</p>
<p>(You'll notice how technically "exhaustive search" may not be the actual
process - there can be shortcuts depending on how the commitment is
structured; in the case of Pedersen, because it hinges on the unknown
discrete log \(\gamma\), the attacker can leverage that - he can use
that knowledge to find these "other inputs" directly instead of by
exhaustive search).</p>
<p>What if the input space is <em>smaller</em> than the output space? It takes a
moment of reflection to realise that this idea doesn't make sense. A
function has a single output by definition (note that doesn't mean the
output can't be multiple "things", e.g. it could be 4 integers or 6
curve points or 5 triangles, whatever - but each of them is a single
output). So a function with 10 inputs can't have more than 10 outputs.
Which means we have one case remaining:</p>
<p>What if the input space is <em>the same size as </em>the output space?</p>
<p><img src="https://web.archive.org/web/20200428225915im_/https://joinmarket.me/static/media/uploads/.thumbnails/InputOutput2.png/InputOutput2-688x487.png" width="688" height="487" alt="Input and output space equal size" /></p>
<p>In this case we must have a one-one mapping - again by the pigeonhole
principle (Remember, we are defining the output space as the space of
actual possible outputs, not some larger set; this will usually require
justification - you can justify it here by first considering the second
commitment point \(C_2=rG\) - note that the lines are horizontal for
a reason!). And by the same reasoning as above, but in reverse, we see
that this gives <strong>perfect</strong> binding, and <strong>at best
computational (imperfect) hiding.</strong></p>
<p>What's neat about this reasoning is that none of it is specific to
anything elliptic curve, or discrete log related, or anything - it
applies to <em>any</em> commitment scheme, including the hash-based one we
introduced right at the start. The only difference between the Pedersen
and hash-based cases is that, because of the mathematical messiness of hash
functions, we can't really talk about perfection; it's only the
limitations, the negative parts of the above logic, that are the same:</p>
<p>If your output space is the space of SHA256 outputs, then it's 256 bits.
Now according to specification, that hash function can take extremely
large input (up to \(2^{64}-1\) bits, so vastly larger than 256 bits), which
means it is in the first category - its input space is vastly larger
than its output space, so it <strong>cannot</strong> be perfectly binding. But that
<em>doesn't</em> mean that it's perfectly hiding, unfortunately - that would
require that a given output leaks precisely zero information about the
corresponding input. But it's certainly not the case that we have some
theorem about SHA256 that ensures that every input is equiprobable,
given an arbitrary output. So at <em>best</em> we have computational hiding,
and that's based on the idea that the hash function is well designed. See
<a href="https://en.wikipedia.org/wiki/Cryptographic_hash_function">Wikipedia</a>
for a reminder on the key properties of cryptographic hash functions.
These properties are also what provides the argument for at least a
computational binding. But again, it's certainly not perfectly hiding
<em>or</em> perfectly binding.</p>
<p>So let's summarize the key insight:</p>
<p><strong>It's LOGICALLY impossible for a commitment scheme to be both perfectly
hiding and perfectly binding, no matter what algorithm or mathematical
architecture is used to construct it.</strong></p>
<p>Why "logically"? Because we've demonstrated the two ideas are
fundamental contradictions of each other; it is only confusion to think
you can get both at the same time. Another way to say it (slightly more
dynamic description):</p>
<p><strong>A commitment scheme which has been constructed to have perfect binding
will at BEST achieve computational hiding, while a scheme constructed to
achieve perfect hiding will at BEST achieve computational binding.</strong></p>
<p>Here we're emphasizing that these are the limits, only achieved by well
designed algorithms; a badly designed or not-fit-for-purpose commitment
scheme may not be perfect in <em>either</em> sense, and for example may not
even manage to be computationally hiding, e.g. an adversary may very
feasibly be able to break the hiding property without excessive
computational resources. This is just a description of the <em>best</em> we can
do.</p>
<h2>From hidden to bound</h2>
<p>We'll get into the Bitcoin-related application shortly, but for now note
that it is not unreasonable to prefer binding over hiding in the trade-off.
Since clearly Pedersen doesn't fit there, what does?</p>
<p>Let's start with an almost-obvious idea: suppose I want to commit to a
value \(x\) and have it be perfectly binding. Can I just use
\(C=xG\) as the commitment?</p>
<p>If you've been following along, you'll probably be a little uncertain,
because .. the "hiding" part doesn't seem to have been implemented.
You're right to be uncertain, because the answer is really "formally no,
but it kinda depends".</p>
<p>There are two scenarios: if the set of values you might commit to is
restricted in some way, e.g. a number between 1 and \(2^{25}\) then
the lack of hiding makes the commitment a complete failure, because a
computer could just find it by brute force guessing. And if your \(x\)
was a random value in the entire range \(2^{256}\) of elements of the
group - this kind of construction <em>is</em> sometimes used as a commitment,
but it doesn't count as a proper, generic commitment <em>scheme</em>, because
it doesn't have even computational hiding in the general case; if I
<em>think</em> I know what your \(x\) is (or know the range of it), I can
just check if I'm right; there is no blinding value \(r\) to prevent
that.</p>
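<p>The first scenario is easy to demonstrate. In the sketch below a large prime modulus stands in for the curve group (an assumption purely for brevity; nothing here depends on how hard discrete logs actually are in this group), and the range is cut down to \(2^{16}\) so that it runs instantly. The helper names are mine.</p>

```python
P = 2**255 - 19   # a large prime, used here only as a stand-in group modulus
G = 2

def quasi_commit(x: int) -> int:
    """The blinding-free 'commitment' g^x mod P (the analogue of xG)."""
    return pow(G, x, P)

def guess(c: int, max_x: int):
    """Try every candidate in 0..max_x and return the first one matching c."""
    acc = 1  # equals G^candidate at the top of each iteration
    for candidate in range(max_x + 1):
        if acc == c:
            return candidate
        acc = acc * G % P
    return None

secret = 54321                 # known to lie in a small range
c = quasi_commit(secret)
recovered = guess(c, 2**16)
assert recovered == secret     # found by simple guess-and-check
```

<p>The size of the group is irrelevant to this attack; only the size of the set of plausible values matters, which is exactly what a blinding value \(r\) is there to fix.</p>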
<p>This naturally leads us to the <a href="https://en.wikipedia.org/wiki/ElGamal_encryption">ElGamal encryption
scheme</a>,
re-purposed as a commitment scheme (this can be done with any
cryptosystem, by the way):</p>
<p>Take our usual suspects \((x, r)\) and construct <strong>two</strong> elliptic
curve points: \((xG+rH, rG)\). This is the ElGamal commitment (with
all notation as for Pedersen). Wait, I hear you cry - you're just
regurgitating the Pedersen commitment, but adding \(rG\)? What does
that mean? Well, we're taking the slightly broken idea above and
applying it <em>in conjunction with</em> the idea of the Pedersen commitment.
We "commit" to the value \(r\) using \(rG\), and that's OK
specifically because \(r\) is a random number in the range (a bit like
a "nonce") used just for this commitment, so there is no guessing it
outside the standard brute force or breaking ECDLP; by doing so we've
increased the <strong>output space</strong> from Pedersen's set of single curve
points to the Cartesian product of 2 sets of curve points. And by
doing so we arrive at the second of the two scenarios described in the two
diagrams above; now, for each input tuple \((x, r)\), there is an
output tuple \((C_1,C_2) = (xG+rH,rG)\) - guaranteed distinct
because the mapping from \(r\) to \(rG\) is one-one - so the mapping
is one-one and is perfectly binding. More simply: the task of the
adversary who wants to break the commitment by opening it to a
different value than originally chosen is now impossible: for \(rG\)
there is precisely one and only one \(r\), and once \(r\) is set,
there is only one \(x\): it's the discrete log of \(C_1 -rH\) which
is now uniquely defined, once \(r\) is.</p>
<p>And, following our insights above, it is now decidedly unsurprising to
learn that the described ElGamal commitment is only computationally
hiding: because \(rG\) is published as a separate curve point
\(C_2\) and not folded into the single curve point as with Pedersen,
an attacker with the ability to solve the discrete log problem can
extract, from that, \(r\) and then follow up by extracting from
\(C_1 - rH=xG\), the supposedly hidden committed value \(x\).</p>
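<p>Both halves of that story can be seen in a toy group. In the sketch below (tiny illustrative parameters of my own, multiplicative notation, with a brute-force <code>dlog</code> playing the role of the computationally unbounded attacker) every input pair gives a distinct output pair, and the attacker recovers \(r\) and then \(x\) exactly as described:</p>

```python
P, Q = 23, 11    # tiny toy group; discrete logs are trivial here on purpose
G, H = 4, 9      # both have order Q modulo P

def elgamal_commit(x: int, r: int):
    """ElGamal commitment (g^x h^r, g^r), the analogue of (xG + rH, rG)."""
    return (pow(G, x, P) * pow(H, r, P) % P, pow(G, r, P))

def dlog(target: int, base: int) -> int:
    """Brute-force discrete log; feasible only because the group is tiny."""
    for k in range(Q):
        if pow(base, k, P) == target:
            return k
    raise ValueError("no discrete log found")

def break_hiding(c1: int, c2: int):
    """Recover (x, r): first r from c2 = g^r, then x from c1 * h^(-r) = g^x."""
    r = dlog(c2, G)
    x = dlog(c1 * pow(H, Q - r, P) % P, G)  # h^(Q - r) is the inverse of h^r
    return x, r

# Perfect binding: all Q*Q input pairs map to distinct output pairs
outputs = {elgamal_commit(x, r) for x in range(Q) for r in range(Q)}
assert len(outputs) == Q * Q

# Only computational hiding: a discrete log solver opens the commitment
assert break_hiding(*elgamal_commit(7, 3)) == (7, 3)
```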
<p>But let's be clear: it <em>is</em> computationally hiding, unlike our toy
"quasi-commitment" \(xG\) which fails at that task (imagine committing
to the value "2"). And that can be expressed formally with what's called
a "reduction proof"; a rather weird but also very clever concept often
used in cryptography:</p>
<div class="highlight"><pre><span></span><code><span class="err">The ElGamal commitment is hiding if the DDH problem is hard,</span>
<span class="err">because an adversary who can violate the hiding property of the ElGamal</span>
<span class="err">commitment can use that algorithm to solve the DDH problem.</span>
</code></pre></div>
<p>DDH refers to the <a href="https://en.wikipedia.org/wiki/Decisional_Diffie%E2%80%93Hellman_assumption">Decisional Diffie Hellman
problem</a>
- in words, it's that you can't distinguish \(abG\) from a random
curve point, even if you already know the values of \(A,B\) where
\(A=aG, B=bG\).</p>
<p>The intended consequence of this reasoning (and notice how slippery this
logic is!) is to say: DDH is <em>actually</em> hard, therefore ElGamal is
computationally hiding. Or: since it <em>is</em> believed that DDH is hard, it
follows that "we" (some undefined group of cryptographers) believe that
ElGamal, as a commitment scheme, is computationally hiding.</p>
<h3>Brief observation: ElGamal is homomorphic</h3>
<p>Notice how the description of the homomorphic (with respect to addition)
property of the Pedersen commitment cross applies here; even though we
have two curve points here, not one, the same linearity exists:</p>
<p>\(C_{EG}(x_1, r_1) + C_{EG}(x_2, r_2) = \)</p>
<p>\((x_1G + r_1H, r_1G) + (x_2G + r_2H, r_2G)\)</p>
<p>\( = ((x_1 + x_2)G + (r_1+r_2)H, (r_1+r_2)G)\)</p>
<p>\( = C_{EG}(x_1+x_2, r_1+r_2)\)</p>
<h2>An unpalatable tradeoff?</h2>
<p>So all of the above is the "behind the scenes" of the discussion you'll
often see in public about <a href="https://elementsproject.org/features/confidential-transactions/investigation">Confidential
Transactions</a>
in Bitcoin, specifically (not that the tradeoff doesn't apply in other
systems like Monero of course).</p>
<p>We naturally choose the Pedersen commitment for Confidential
Transactions, because it's more compact (remember - size of output
space!). It's only one curve point as output. Confidential Transactions
take up a non-trivial amount of extra space in a transaction, so it's
natural to prefer Pedersen to ElGamal for that reason, even though,
importantly, <span style="text-decoration: underline;">both have the
necessary homomorphic property</span> as already outlined.</p>
<p>Moreover (and more importantly, actually), a CT output needs a <em>range
proof</em> (as explained in great detail e.g.
<a href="https://github.com/AdamISZ/ConfidentialTransactionsDoc/">here</a>,
see also bulletproofs e.g.
<a href="https://eprint.iacr.org/2017/1066.pdf">here</a>
and
<a href="https://github.com/AdamISZ/from0k2bp">here</a>),
which itself requires a <em>lot</em> of space - the range proofs described in
the link, especially bulletproofs, go to a lot of trouble to condense
this data to the greatest extent possible, since it must be published on
the blockchain for all to verify, but that space usage is a serious
issue.</p>
<p>The previous links point to all the work done on space optimisation for
Pedersen; if we switched to ElGamal we'd lose that (I'm not exactly sure
<em>where</em> we'd be in terms of how much space a CT style output would take
up, but it would definitely be considerably more. While writing this
I've noticed Andreev has written up an ElGamal range proof
<a href="https://blog.chain.com/preparing-for-a-quantum-future-45535b316314">here</a>).</p>
<p>Hence the title of the subsection; our choice in CT for something like
Bitcoin seems to be:</p>
<ul>
<li>Continue on the existing path - highly space optimised Pedersen
commitments with perfect hiding and computational binding under the
ECDLP assumption.</li>
<li>Switch to ElGamal commitments, with much more bloaty range proofs
and commitments, which however have perfect binding and
computational hiding (under DDH assumption).</li>
</ul>
<p>Some people might argue that there is just too much fuss and worry about
this. Computational is good enough, if our crypto hardness assumptions
are good enough, and they are kind of industry standard already. However
there's a big problem with this reasoning, and it was explained in the
"tradeoffs" section of
<a href="https://joinmarket.me/blog/blog/the-steganographic-principle">this</a>
earlier blog post. To avoid getting sidetracked on that now, let me
summarize simply:</p>
<blockquote>
<p><em>A break in the binding assumption of Confidential Transactions can
result in the attacker being able to print money in arbitrary amounts
at any time, with absolutely no knowledge by the outside world.</em></p>
</blockquote>
<p>As I was at pains to point out in the linked blog post, this problem is
not CT-specific; it's generic to any blinding mechanism relying on
cryptographic hardness assumptions (i.e. without <strong>perfect binding</strong> or
something analogous where even an infinitely powerful adversary cannot
violate the binding of the blinded amount).</p>
<p>But here (for the rest of this blog post) we'll focus specifically on
the CT version of the problem.</p>
<h2>The unexcluded middle</h2>
<p>If perfect binding and hiding are logically incompatible in a
commitment, our only choice to violate the principle of the excluded
middle is to step outside the boundaries of the problem described, and
the most natural way to do that is to use two different commitments.</p>
<p>Using both Pedersen and ElGamal concurrently makes so little sense as to
be incoherent, not least because an ElGamal commitment <em>contains</em> a
Pedersen commitment. But the key word you could have skipped over in
that sentence was <strong>concurrently</strong>. Ruffing and Malavolta in <a href="https://eprint.iacr.org/2017/237.pdf">this
paper</a>
suggest spreading the problem over time:</p>
<h2>Switch commitments</h2>
<p>The idea here is deceptively simple: what if you use an ElGamal
commitment, but don't verify the non-Pedersen component (the second
point \(rG\) to use consistent notation) initially? If there is some
time \(T\) at which all participants in the system agree that the
ECDLP has "fallen" to quantum computing (the most well discussed failure
vector of elliptic curve crypto), it could be required that after that
flag day, spending of coins (presumably into some safer new cryptosystem
defined by consensus; spending into current-style Bitcoin outputs would
probably not make sense, here) is only valid if the verification/opening
(and the range proof) were applied to the full ElGamal commitment
\((xG+rH, rG)\) and not just \(xG+rH\) as was allowed before \(T\).</p>
<p>There are two critiques that may immediately spring to mind, one obvious
and one not:</p>
<ul>
<li>Not necessarily a realistic scenario - the break may be very public
or not, it may be very gradual or not. Declaring a flag day is
mostly assuming it being public. So it's not a panacea.</li>
<li>If you've been reading closely all this time, you'll be alert to a
serious drawback: publishing an ElGamal commitment will not actually
be hiding, if ECDLP is "cracked" (you remember that it requires DDH
hardness, but it's easy to see that if you "crack" ECDLP you also
crack DDH).</li>
</ul>
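<p>The remark in the second critique, that cracking ECDLP also cracks DDH, fits in a few lines. This sketch uses a toy multiplicative group with parameters of my own choosing, and a brute-force <code>dlog</code> as a stand-in for whatever breaks the discrete log problem:</p>

```python
P, Q = 2039, 1019   # toy group, small enough that discrete logs are trivial
G = 4               # generator of the subgroup of order Q

def dlog(target: int) -> int:
    """Brute-force discrete log to base G; stands in for an ECDLP break."""
    acc = 1  # equals G^k at the top of each iteration
    for k in range(Q):
        if acc == target:
            return k
        acc = acc * G % P
    raise ValueError("target is not in the subgroup")

def is_dh_triple(A: int, B: int, Z: int) -> bool:
    """Decide whether Z = g^(ab), given only A = g^a and B = g^b."""
    a = dlog(A)
    return pow(B, a, P) == Z

a, b = 321, 654
A, B = pow(G, a, P), pow(G, b, P)
real = pow(G, a * b % Q, P)          # the genuine Diffie-Hellman value
fake = pow(G, (a * b + 1) % Q, P)    # some other group element
assert is_dh_triple(A, B, real)
assert not is_dh_triple(A, B, fake)
```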
<p>Taking a more positive perspective, though: it's not as if \(T\) has
to be a "panic stations day". Just as hash functions and TLS versions
are sometimes retired because they <em>start</em> to show just a <em>tiny</em> bit of
weakness, it would similarly make perfect sense for Bitcoin to be
similarly prompt in making a switch to a post-quantum cryptosystem once
EC came into question, and not wait to be attacked. Not to say it would
be easy!</p>
<p>This approach is sometimes called "cryptographic agility" - awkward as
it seems, we do kinda want the ability to upgrade cryptographic
protocols "in-flight", while they are being used.</p>
<p>So at this point we have an ingenious and smart <em>amelioration</em> to the
problem, but it can't be called a complete solution, I think - and
principally because of the (admittedly tiny) possibility of a private
break by some lone genius or similar.</p>
<h3>We put a commitment inside your commitment, so ...</h3>
<p>The authors and collaborators of the switch commitment paper and idea
(Ruffing, Malavolta, Wuille, Poelstra, others .. I'm not actually sure)
found a way to slightly improve the properties of such switch
commitments: a structure they call the <strong>opt-in switch commitment</strong>
which looks something like this:</p>
<p>\(xG + (r+\mathbb{H}(xG+rH || rG))H = xG + r'H\)</p>
<p>The idea is to tweak the blinding component of a standard Pedersen
commitment with the hash of an ElGamal commitment to the same value
(insert old meme as appropriate). Those of you aware of such things may
instantly recognize a close parallel with ideas like pay-to-contract and
<a href="https://bitcoinmagazine.com/articles/taproot-coming-what-it-and-how-it-will-benefit-bitcoin/">taproot</a>
(the latter was inspired by the former, so no surprise there). We're
effectively committing to a "contract" which here is a promise to open
to an ElGamal commitment <em>later,</em> if the consensus calls for it, while
for now not revealing that contract, as it's hidden/blinded with the
value \(r\).</p>
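<p>As a sketch of how the pieces fit together, here is the tweak computed in a toy multiplicative group. The byte encoding fed to the hash, the helper names and the check performed on opening are my own assumptions for illustration; the paper and any real implementation will differ in those details. The check relies only on the identity \(xG + r'H = (xG + rH) + \mathbb{H}(xG+rH || rG)H\).</p>

```python
import hashlib

P, Q = 2039, 1019   # toy group: P = 2Q + 1, both prime
G, H = 4, 9         # two elements of order Q

def tweak(c1: int, c2: int) -> int:
    """Hash of the ElGamal commitment (c1 || c2), reduced to an exponent mod Q."""
    data = c1.to_bytes(2, "big") + c2.to_bytes(2, "big")  # hypothetical encoding
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def switch_commit(x: int, r: int):
    """Return the published Pedersen-style point and the hidden ElGamal pair."""
    c1 = pow(G, x, P) * pow(H, r, P) % P    # xG + rH
    c2 = pow(G, r, P)                       # rG
    r_prime = (r + tweak(c1, c2)) % Q       # r' = r + H(xG+rH || rG)
    published = pow(G, x, P) * pow(H, r_prime, P) % P
    return published, (c1, c2)

def check_switch(published: int, c1: int, c2: int) -> bool:
    """After the flag day: check the revealed ElGamal pair against the old point."""
    return published == c1 * pow(H, tweak(c1, c2), P) % P

published, (c1, c2) = switch_commit(42, 777)
assert check_switch(published, c1, c2)
```

<p>Until the ElGamal pair is revealed, only <code>published</code> is visible, and it is just a Pedersen commitment to \(x\) with blinding value \(r'\).</p>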
<p>As noted <a href="https://lists.launchpad.net/mimblewimble/msg00479.html">on the mimblewimble mailing
list</a>
by Ruffing, this has a couple of very important advantages over the
non-opt-in version:</p>
<ul>
<li>It preserves the perfect hiding of the Pedersen commitment for as
long as the flag day \(T\) isn't reached (it's exactly a Pedersen
commitment until then).</li>
<li>It doesn't use up another curve point on the blockchain - you only
publish the single curve point as per Pedersen, and not two as per
ElGamal.</li>
</ul>
<p>(Another useful feature - you can derive the value of \(r\) from your
private key deterministically to make it more practical).</p>
<p>Of course one must prove it's secure (under the random oracle model) but
for now I'll take that as a given (it's too much detail for here). But
clearly this is a neat way to encapsulate that "switch" idea; modulo
security proofs, it's an unqualified and very substantial improvement
over the "naked ElGamal" version.</p>
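<p>To make the algebra concrete, here is a toy sketch of the opt-in construction in pure Python over secp256k1. It is an illustration under assumptions, not the scheme as specified by its authors: the second generator <code>H</code> is derived by a simple try-and-increment hash (my choice), the hash input is just the concatenated affine coordinates (also my choice), and nothing here is constant-time or fit for real use.</p>

```python
import hashlib

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant-time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def hash_to_point(seed):
    """Toy try-and-increment: find a curve point nobody knows the discrete log of."""
    counter = 0
    while True:
        digest = hashlib.sha256(seed + bytes([counter])).digest()
        x = int.from_bytes(digest, "big") % P
        rhs = (pow(x, 3, P) + 7) % P
        y = pow(rhs, (P + 1) // 4, P)      # square root works because P % 4 == 3
        if y * y % P == rhs:
            return (x, y)
        counter += 1

H = hash_to_point(b"toy second generator")   # hypothetical seed

def hash_scalar(*points):
    data = b"".join(p[0].to_bytes(32, "big") + p[1].to_bytes(32, "big") for p in points)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def pedersen(x, r):
    return add(mul(x, G), mul(r, H))

def switch_commit(x, r):
    """Return (xG + r'H, ElGamal pair) with r' = r + Hash(xG + rH || rG)."""
    elgamal = (pedersen(x, r), mul(r, G))
    r_prime = (r + hash_scalar(*elgamal)) % N
    return pedersen(x, r_prime), elgamal

def check_opened(commitment, elgamal):
    """After the flag day: does the revealed ElGamal pair match the published point?"""
    return commitment == add(elgamal[0], mul(hash_scalar(*elgamal), H))

commitment, elgamal = switch_commit(x=5000, r=424242)   # r would really be random or key-derived
print(check_opened(commitment, elgamal))
```

<p>Until the flag day only <code>commitment</code> is published, and it is an ordinary Pedersen commitment with blinding factor \(r'\); afterwards the owner reveals the ElGamal pair and anyone can run the check.</p>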
<h3>A hard decision for the sleepy or lazy</h3>
<p>There is still an area of imperfection even in this souped-up "opt-in"
switch commitment case. After the flag day \(T\) if you still have not
moved coins from existing outputs, you can publish the ElGamal
"contract" (commitment) inside the hash, thus keeping the binding
property, so that the envisioned attacker-possessing-a-quantum-computer
will still not be able to print money, but in so doing, you give up the
hiding (the value is revealed <em>at least to such attackers</em> because they
can break the DDH problem). Thus a person failing to take action
before said deadline \(T\) has at least to risk, and probably lose,
one of those two: their privacy of amount or their money.</p>
<h2>Have your cake and eat it?</h2>
<p>Is it possible to do better than such a transition approach, as
envisaged in the switch commitments paradigm?</p>
<p>As was earlier discussed, it suffers from not covering every threat
scenario, in particular, it does not cover the scenario of a private and
unexpected break.</p>
<p>Unfortunately this is where this very long blog post trails off ...
because I don't know, and currently I don't think anyone else does.</p>
<p>My personal feeling was that the switch commitment paradigm suggests
there might be a way to finesse this tradeoff about using commitments.
It is also something which Adam Back seems to have gone
some way to thinking through - the fact that a single commitment scheme
can't provide perfect hiding and binding for a single value doesn't
imply that it is impossible to get this property, <strong>as long as you're not
working with the same value</strong>. For example, what if you could provide an
ElGamal commitment for the money created in a Bitcoin <em>block</em>, while
providing Pedersen commitments as in the current design of CT for the
individual transactions? This means that a quantum or ECDLP breaking
attacker can "snoop" into the overall value created in a block, but this
should either be already known or uninteresting, while although he could
in theory violate the binding property of individual transactions, this
would in turn violate the binding of the block-level commitment which is
supposed to be impossible?</p>
<p>I suspect my original line of thinking is somehow incoherent (how,
mathematically, are the block-level and transaction-level commitments
related?), but Dr Back seems to have in mind something involving
coinjoin-like interactivity. I am leaving it here without attempting to
describe further, because the question seems to continue to be
interesting and if there <em>is</em> a solution (even perhaps if it involves
interactivity), it would be a hugely important fact, making CT a much
more plausible technology for a global money.</p>
<h3>Build the wall?</h3>
<p>We could also take the Trumpian approach - it's far from infeasible to
imagine a mechanism that lets CT coins arrive back into the plaintext
area without allowing any hidden inflation that <em>might</em> have
occurred to "infect" it. This is essentially the sidechain model, except it
could be implemented in a variety of different ways. In fact, this
<strong>model already does exist</strong> in the sidechain
<a href="https://blockstream.com/liquid/">Liquid</a>,
which uses CT. But there have also been proposals to implement CT as a
kind of extension block (which has slightly different tradeoffs to a
sidechain), for example see ZmnSCPxj's note
<a href="https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-January/016605.html">here</a>.</p>
<p>Payjoin | 2018-12-15T00:00:00+01:00 | Adam Gibson | tag:joinmarket.me,2018-12-15:/blog/blog/payjoin/</p>
<p>coinjoins in payments</p>
<h3>PayJoin</h3>
<h2>PayJoin.</h2>
<p>You haven't read any other blog posts here? No worries, here's what
you need to know (<em>unless you're an expert, read them anyway...</em>):</p>
<ul>
<li>A utxo is an "unspent transaction output" - a Bitcoin transaction
creates one or more of these, and each contains a specific amount of
Bitcoin. Those outputs get "used up" in the transaction that
spends them (somewhat like physical coins, someone gives you them in
a payment to you, then you give them to someone else when you spend
them; bitcoins aren't coins, utxos are coins; only difference is
physical coins don't get destroyed in transactions).</li>
<li>The fees you have to pay for a Bitcoin transaction depend on how
many bytes it takes up; this is <em>somewhat</em> dominated by how many
inputs you provide, although there are other factors.</li>
<li>CoinJoin basically means - two or more people provide inputs (utxos)
to a transaction and co-sign without needing trust, because when
they sign they check that the output amounts and addresses are what they
expect. <strong>Note that CoinJoin requires interaction, almost always.</strong></li>
<li>Traditional "equal-sized" CoinJoin means a bunch of people paying
<em>themselves</em> the same fixed amount in a single transaction
(according to the process just mentioned), with the intention that
nobody can tell which of the equal sized outputs belong to who
(basically!).</li>
</ul>
<h2>The drawbacks of CoinJoin as implemented</h2>
<p>Current implementations of CoinJoin are of the "equal-sized" variety
(see above). This requires coordination, but it's possible to get a
decent number of people to come together and agree to do a CoinJoin of a
certain fixed amount. The negative is that this kind of transaction is
trivially distinguishable from an "ordinary" transaction, in
particular a payment from one counterparty to another. Here's a typical
Joinmarket CoinJoin (and other implementations are just as, or more,
distinguishable):</p>
<p><img alt="Equal-outs-coinjoin-example" src="/web/20200803124759im_/https://joinmarket.me/static/media/uploads/.thumbnails/screenshot_from_2019-01-18_15-00-33.png/screenshot_from_2019-01-18_15-00-33-807x433.png" width="807" height="433"/></p>
<p>The biggest flag of "this is CoinJoin" is exactly the multiple
equal-value (0.18875417 here) outputs that are the core premise of the
idea, that give the anonymity. Here, you get anonymity in an "anonymity
set" of all participants of <em>this</em> transaction, first, but through
repeated rounds, you <em>kind of</em> get a much bigger anonymity set,
ultimately of all participants of that CoinJoin implementation in the
absolute best scenario. But it's still only a small chunk of Bitcoin
usage generally.</p>
<p>And while this obviously gets better if more people use it, there is a
limit to that thinking: because <strong>all participants are forced to use the
same denomination for any single round</strong>, it isn't possible to fold in
the payments you're doing using Bitcoin as a currency (don't laugh!)
into these CoinJoin rounds (notice: this problem mostly disappears with
blinded amounts).</p>
<p>So a world where "basically everyone uses CoinJoin" is cool for
privacy, but could end up pretty bad for scalability, because these
transactions are <em>in addition to</em> the normal payments.</p>
<p>Also, the fact that these transactions are trivially watermarked means
that, if the blockchain analyst is not able to "crack" and unmix such
transactions, he can at least isolate them in analysis. That's
something; "these coins went from Exchange A to wallet B and then into
this mixset" may be a somewhat negative result, but it's still a
result. There are even noises made occasionally that coins might be
blocked from being sent to certain exchange-type entities if they're
seen to have come from a "mixer" (doesn't matter that CoinJoin is
<em>trustless</em> mixing here; just that it's an activity specific for
obfuscation).</p>
<p>I don't mean to scaremonger - I have used such CoinJoins for years
(their number measured in the thousands) and will continue to do so, and never had
payments blocked because of it. But this is another angle that must be
borne in mind.</p>
<p>So let's say our primary goal is to minimize the negative privacy
effects of blockchain analysis; can we do better? It's debatable, but
we <em>do</em> have another angle of attack.</p>
<h2>Hiding in a much bigger crowd ... ?</h2>
<p><span style="text-decoration: underline;">One angle is to make your behaviour look more like other, non-coinjoin
transactions</span>. (For the
philosophically/abstract inclined people, <a href="https://web.archive.org/web/20200803124759/https://joinmarket.me/blog/blog/the-steganographic-principle/">this post might be of
interest</a>,
but it sidetracks us here, so - later!). Let's think of the naive way
to do that. Suppose just Alice and Bob make a 2 party CoinJoin:</p>
<p><code>0.05 BTC --->| 0.05 BTC 3AliceSAddReSs</code></p>
<p><code>0.05 BTC --->| 0.05 BTC 3BobSAddReSs</code></p>
<p>This first attempt is a clear failure - it "looks like an ordinary
payment" <em>only</em> in the sense that it has two outputs (one change, one
payment). But the failure is not <em>just</em> the obvious, that the output
amounts are equal and so "obviously CoinJoin". There's another aspect
of that failure, illustrated here:</p>
<p><code>0.01 BTC --->| 0.05 BTC 3AliceSAddReSs</code></p>
<p><code>0.04 BTC --->| 0.06 BTC 3BobSAddReSs</code></p>
<p><code>0.03 BTC --->|</code></p>
<p><code>0.03 BTC --->|</code></p>
<p>This at least is <em>more</em> plausible as a payment, but it shows the
<strong>subset sum</strong> problem that I was describing in my <a href="https://web.archive.org/web/20200803124759/https://joinmarket.me/blog/blog/coinjoinxt/">CoinJoinXT
post</a>
- and trying to solve with CoinJoinUnlimited (i.e. using a Lightning
channel to break the subset sum problem and feed-back the LN privacy
onto the main chain). While the blockchain analyst <em>could</em> interpret
this as a payment, semi-reasonably, of 0.05 btc by one participant, he
could also notice that there are two subsets of the inputs that add up
to 0.05, 0.06. And also splitting the outputs doesn't fundamentally
solve that problem, notice (they'd also have to split into subsets),
and it would anyway break the idea of "looking like a normal payment"
(one payment, one change):</p>
<p><code>0.01 BTC --->| 0.011 BTC 3AliceSAddReSs</code></p>
<p><code>0.04 BTC --->| 0.022 BTC 3BobSAddReSs</code></p>
<p><code>0.03 BTC --->| 0.039 BTC 3Alice2</code></p>
<p><code>0.03 BTC --->| 0.038 BTC 3Bob2</code></p>
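<p>The subset sum observation is easy to check mechanically. Here is a minimal brute-force sketch (my own illustration, not any analyst's actual tooling) that looks for a way to split the inputs and the outputs into two groups with matching sums. Amounts are integer satoshis and fees are ignored, as in the diagrams; the function name is hypothetical.</p>

```python
from itertools import combinations

def matching_splits(inputs, outputs):
    """Return (input_group, output_group) pairs whose sums are equal.

    Input 0 is pinned to the reported group so each split is listed once;
    only proper, non-empty subsets are considered on both sides.
    """
    found = []
    rest = range(1, len(inputs))
    for extra in range(len(inputs) - 1):
        for picked in combinations(rest, extra):
            group = [inputs[0]] + [inputs[i] for i in picked]
            for size in range(1, len(outputs)):
                for outs in combinations(outputs, size):
                    if sum(outs) == sum(group):
                        found.append((group, list(outs)))
    return found

inputs = [1_000_000, 4_000_000, 3_000_000, 3_000_000]

# Second diagram: two outputs of 0.05 and 0.06
print(matching_splits(inputs, [5_000_000, 6_000_000]))

# Third diagram: splitting the outputs does not hide the partition
print(matching_splits(inputs, [1_100_000, 2_200_000, 3_900_000, 3_800_000]))
```

<p>Both calls find exactly one split, with the 0.01 and 0.04 inputs on one side. This brute force is exponential in the number of inputs and outputs, which is fine for transactions of this size.</p>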
<p>After you think about this problem for a while you come to the
conclusion - only if there's actually a transfer of coins from one
party to the other is it solved. Hence
<a href="https://web.archive.org/web/20200803124759/https://joinmarket.me/blog/blog/coinjoinxt/">CoinJoinXT</a>.</p>
<p>But also, hence <strong>PayJoin</strong> - why not actually do a CoinJoin <span style="text-decoration: underline;">while you
are making a payment?</span></p>
<h2>PayJoin advantages</h2>
<p>I'm not sure who first thought of doing CoinJoins (see bullet point at
start) of this particular flavour, but a <a href="https://blockstream.com/2018/08/08/improving-privacy-using-pay-to-endpoint/">blogpost from Matthew
Haywood</a>
last summer detailed an implementation approach which came out of a
technical workshop in London shortly before, and a little later a
<a href="https://github.com/bitcoin/bips/blob/master/bip-0079.mediawiki">BIP</a>
was put out by Ryan Havar.</p>
<p>The central idea is:</p>
<ul>
<li>Let Bob do a CoinJoin with his customer Alice - he'll provide at
least one utxo as input, and that/those utxos will be consumed,
meaning that in net, he will have no more utxos after the
transaction than before, and an obfuscation of ownership of the
inputs will have happened <span style="text-decoration: underline;">without it looking different from an
ordinary payment.</span></li>
</ul>
<p>Before we look in detail at the advantages, it's worth answering my
earlier question ("Why not actually do a CoinJoin while you are making
a payment?") in the negative: it's not easy to coordinate that. It
means that either (a) all wallets support it and have a way for
<em>anyone</em> to connect to <em>anyone</em> to negotiate this (2-party) CoinJoin
or (b) it's only limited to peer to peer payments between owners of a
specific wallet that has a method for them to communicate. So let's be
clear: this is not going to suddenly take over the world, but
incremental increases in usage could be tremendously valuable (I'll
explain that statement shortly; but you probably already get
it).</p>
<ul>
<li><strong>Advantage 1: Hiding the payment amount</strong></li>
</ul>
<p>This is what will immediately stand out from looking at the idea. Bob
"chips in" a utxo (or sometimes more than one). So the payment
<em>output</em> will be more than the actual payment, and it will be profoundly
unobvious what the true payment amount was. Here's an example:</p>
<p><code>0.05 BTC --->| 0.04 BTC 3AliceSAddReSs</code></p>
<p><code>0.09 BTC --->| 0.18 BTC 3BobSAddReSs</code></p>
<p><code>0.08 BTC --->|</code></p>
<p>Now, actually, Alice paid Bob 0.1 BTC using 0.09 and 0.05, getting back
0.04 change. But what does a blockchain analyst think? His first
interpretation will certainly be that there is a payment <em>either</em> of
0.04 BTC or 0.18 BTC, by the owner of the wallet containing all the
inputs. Now, it probably seems very unlikely that the <em>payment</em> was 0.04
and the <em>change</em> 0.18. Why? Because, if the payment output were 0.04,
why would you use all three of those utxos, and not just the first, say?
(0.05). This line of reasoning we have called "UIH1" (h/t Chris Belcher
for the nomenclature - "unnecessary input heuristic"); see the comments
to <a href="https://gist.github.com/AdamISZ/4551b947789d3216bacfcb7af25e029e">this
gist</a>
for the details. To be fair, this kind of deduction by a
blockchain analyst is unreliable, as it depends on wallet selection
algorithms; many are not nearly so simplistic that this deduction would
be correct. But possibly combined with wallet fingerprinting and
detailed knowledge of wallet selection algorithms, it's one very
reasonable line of attack to finding the change output and hence the
payment output.</p>
<p>For those interested in the "weeds" I've reproduced the key points
about this UIH1 and UIH2 (probably more important) including stats
collected by LaurentMT of oxt.me, in an "Appendix" section at the end
of this post.</p>
<p>Anyway, what else <em>could</em> the payment amount be, in the transaction
above? As well as 0.04 and 0.18, there is 0.09 and 0.01. Do you see the
reasoning? <em>If</em> we assume that PayJoin is a possibility, then one party
could be consuming 0.09 and 0.08 and receiving the 0.18 output, a net gain of 0.01;
or contributing only 0.09 and receiving 0.18, a net gain of 0.09. And similarly
for other contributions of inputs. In the simplest case, I would claim
there are 4 potential payment amounts if there are only two inputs and
we assume that one of the two is owned by the receiver. For the
blockchain analyst, this is a huge mess.</p>
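<p>The analyst's problem can be sketched as a small enumeration. The function below (hypothetical name; integer satoshis; fees ignored as in the diagram) lists every payment amount that cannot be ruled out if any subset of the inputs, including none of them, might belong to the receiver. It is a sketch of the reasoning above, not a model of any real analysis tool.</p>

```python
from itertools import combinations

def candidate_payments(inputs, outputs):
    """Payment amounts consistent with a transaction if PayJoin is possible.

    For each guess of which inputs the receiver contributed and which output
    the receiver got back, the payment is the receiver's net gain.
    """
    amounts = set()
    for k in range(len(inputs)):              # the payer keeps at least one input
        for owned in combinations(inputs, k):
            contributed = sum(owned)
            for received in outputs:
                paid = received - contributed
                if paid > 0:
                    amounts.add(paid)
    return amounts

# The example above: inputs 0.05, 0.09, 0.08 and outputs 0.04, 0.18
candidates = candidate_payments([5_000_000, 9_000_000, 8_000_000],
                                [4_000_000, 18_000_000])
print(sorted(candidates))
```

<p>With <code>k = 0</code> this reduces to the ordinary reading (0.04 or 0.18); allowing receiver inputs adds 0.01, 0.05, 0.09, 0.13 and the true amount, 0.10.</p>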
<ul>
<li><strong>Advantage 2 - breaking Heuristic 1</strong></li>
</ul>
<p>I discussed Heuristic 1 in the <a href="https://joinmarket.me/blog/blog/coinjoinxt/">CoinJoinXT
post</a>. Simple
description: people (analysts) assume that all the inputs to any
particular transaction are owned by one wallet/owner; i.e. they assume
coinjoin is not used, usually. Following the overall logic of our
narrative here, it's obvious what the main point is with PayJoin - we
break the heuristic <em>without flagging to the external observer that the
breakage has occurred.</em> This is enormously important, even if the
breakage of the assumption of common input ownership on its own seems
rather trivial (especially if PayJoin is used by only a few people), with
only 2 counterparties in each transaction.</p>
<ul>
<li><strong>Advantage 3 - Utxo sanitization</strong></li>
</ul>
<p>This one might not occur to you immediately, at all, but is actually
really nice. Consider the plight of the merchant who sells 1,000 widgets
per day for Bitcoin. At the end of the day he has 1,000 utxos that he
has to spend. Perhaps the next day he pays his supplier with 80% of the
money; he'll have to construct a transaction (crudest scenario) with
800 inputs. It's not just that that costs a lot in fees (it does!); we
can't really directly solve that problem (well - use layer 2! - but
that's another blog post); but we can solve something else about it -
the privacy. The merchant immediately links <em>almost all</em> of his
payments in the 800-input payout transaction - horrible!</p>
<p>But PayJoin really helps this; each payment that comes in can consume
the utxo of the last payment. Here are two fictitious widget payments in
sequence to illustrate; Bob's utxos are bolded for clarity:</p>
<p><span style="text-decoration: underline;">PayJoin 1 - Alice pays Bob 0.1 for a
widget:</span></p>
<p><code>0.05 BTC --->| 0.04 BTC 3AliceSAddReSs</code></p>
<p><code>0.09 BTC --->| <strong>0.18 BTC 3BobSAddReSs</strong></code></p>
<p><code><strong>0.08 BTC</strong> --->|</code></p>
<p>(notice: Bob used up one utxo and created one utxo - no net change)</p>
<p><span style="text-decoration: underline;">PayJoin 2 - Carol pays Bob 0.05 for a discount
widget:</span></p>
<p><code>0.01 BTC --->| 0.02 BTC 3CarolSAddReSs</code></p>
<p><code>0.06 BTC --->| <strong>0.23 BTC 3BobSAddReSs</strong></code></p>
<p><code><strong>0.18 BTC</strong> --->|</code></p>
<p>This would be a kind of snowball utxo in the naive interpretation, that
gets bigger and bigger with each payment. In the fantasy case of every
payment being PayJoin, the merchant has a particularly easy wallet to
deal with - a wallet that only ever has 1 coin/utxo! (I know it's quite
dubious to think that nobody could trace this sequence, there are other
potential giveaways <em>in this case</em> than just Heuristic 1; but with
Heuristic 1 gone, you have a lot more room to breathe, privacy-wise).</p>
<p>It's worth mentioning though that the full snowball effect can damage
the anonymity set: after several such transactions, Bob's utxo is
starting to get big, and may dwarf other utxos used in the transaction.
In this case, the transaction will violate "UIH2" (you may remember
UIH1 - again, see the Appendix for more details on this) because a
wallet <em>probably</em> wouldn't choose other utxos if it can fulfil the
payment with only one. So this may create a dynamic where it's better
to mix PayJoin with non-PayJoin payments.</p>
<ul>
<li><strong>Advantage 4 - hiding in (and being helpful to) the large crowd</strong></li>
</ul>
<p>"...but incremental increases in usage could be tremendously
valuable..." - let's be explicit about that now. If you're even
reasonably careful, these PayJoin transactions will be basically
indistinguishable from ordinary payments (see earlier comments about
UIH1 and UIH2 here, which don't contradict this statement). It's a
good idea to decide on a specific locktime and sequence value that
fits in with commonly used wallets (transaction version 2 makes the most
sense). Now, here's the cool thing: suppose a small-ish uptake of this
was publicly observed. Let's say 5% of payments used this method.
<strong>The point is that nobody will know which 5% of payments are PayJoin</strong>.
That is a great achievement (one that we're not yet ready to achieve
for some other privacy techniques which use custom scripts, for example;
that may happen after Schnorr/taproot but not yet), because <em>it means
that all payments, including ones that don't use PayJoin, gain a
privacy advantage!</em></p>
<h2>Merchants? Automation?</h2>
<p>The aforementioned
<a href="https://github.com/bitcoin/bips/blob/master/bip-0079.mediawiki">BIP79</a>
tries to address how this might work in a standardized protocol;
there's probably still significant work to do before this becomes
actualized. As it stands, it may be enough to have the following
features:</p>
<ul>
<li>Some kind of "endpoint" (hence "pay to endpoint"/p2ep) that a
customer/payer can connect to encoded as some kind of URL. A Tor
hidden service would be ideal, in some cases. It could be encoded in
the payment request similar to BIP21 for example.</li>
<li>Some safety measures on the server side (the merchant/receiver) to
make sure that an attacker doesn't use the service to connect,
request, and block: thus enumerating the server's (merchant's)
utxos. BIP79 has given one defensive measure against this that may
be sufficient; Haywood's blog post discussed some more advanced
ideas on that score.</li>
<li>To state the obvious friction point - wallets would have to
implement such a thing, and it is not trivial compared to features
like RBF which are pure Bitcoin.</li>
</ul>
<h2>Who pays the fees?</h2>
<p>The "snowball effect" described above, where the merchant always has
one utxo, may lead you to think that we are saving a lot of fees (no 800
input transactions). But that's not true (except through some second/third
order effect): every payment to the merchant creates a utxo, and every
one of those must be paid for in fees when consumed in some transaction.
The effect here is to pay those fees slowly over time. And it's left
open to the implementation how to distribute the bitcoin transaction
fees of the CoinJoin. Most logically, each participant pays according to
the amount of utxos they consume; I leave the question open here.</p>
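<p>Purely as an illustration of that "most logical" option (and not a description of what any implementation does), a split proportional to the number of inputs each party contributes could look like the sketch below. The function name, the party labels and the rule that leftover satoshis go to the first-listed party are all my assumptions.</p>

```python
def split_fee(total_fee, input_counts):
    """Split a fee (in satoshis) in proportion to inputs contributed.

    input_counts maps a party label to the number of utxos that party spends.
    Integer division may leave a few satoshis over; they are charged to the
    first party listed (assumed here to be the payer).
    """
    total_inputs = sum(input_counts.values())
    shares = {who: total_fee * n // total_inputs for who, n in input_counts.items()}
    leftover = total_fee - sum(shares.values())
    first = next(iter(input_counts))
    shares[first] += leftover
    return shares

# PayJoin 1 above: Alice spends two utxos, Bob one
print(split_fee(3000, {"alice": 2, "bob": 1}))
```

<p>Counting inputs is only a proxy for bytes; a real split would have to account for output and script sizes too.</p>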
<h2>Implementation in practice</h2>
<p>As far as I know as of this writing (mid-January 2019), there are two
implementations of this idea in the wild. One is from Samourai Wallet,
called
<a href="https://samouraiwallet.com/stowaway">Stowaway</a>
and the other is in
<a href="https://github.com/Joinmarket-Org/joinmarket-clientserver/blob/master/docs/PAYJOIN.md">Joinmarket</a>
as of version 0.5.2 (just released).</p>
<p>I gave a demo of the latter in my last <a href="https://web.archive.org/web/20200803124759/https://joinmarket.me/blog/blog/payjoin-basic-demo/">post on this
blog</a>.</p>
<p>In both cases this is intended for peers to pay each other, i.e. it's
not something for large scale merchant automation (as per discussion in
previous section).</p>
<p>It requires communication between parties, as does any CoinJoin, except
arguably
<a href="https://web.archive.org/web/20200803124759/https://joinmarket.me/blog/blog/snicker/">SNICKER</a>.</p>
<p>The sender of the payment always sends a non-CoinJoin payment
transaction to start with; it's a convenient/sane thing to do, because
if connection problems occur, or software problems, the receiver can
simply broadcast this "fallback" payment instead.</p>
<p>In Joinmarket specifically, the implementation looks crudely like this:</p>
<p><code>Sender Receiver</code></p>
<p><code>pubkey+versionrange --></code></p>
<p><code><-- pubkey and version</code></p>
<p><code>(ECDH e2e encryption set up)</code></p>
<p><code>fallback tx ---></code></p>
<p><code><--- PayJoin tx partial-signed</code></p>
<p><code>co-signs and broadcasts</code></p>
<p>Before starting that interchange of course, the receiver must "send"
(somehow) the sender the payment amount and destination address, as well
as (in Joinmarket) an ephemeral "nick" to communicate over the message
channel. Details here of course will vary, but bear in mind that, as with any
normal payment, there <em>must</em> be some mechanism for the receiver to
communicate payment information to the sender.</p>
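<p>The two steps in the middle of that exchange (the receiver turning the fallback transaction into a PayJoin, and the sender checking it before co-signing) can be sketched as follows. This is a simplified model under stated assumptions, not Joinmarket's code: transactions are plain tuples of amounts, outpoint labels like <code>alice:0</code> are made up, fees are ignored as in the diagrams above, and no signatures are involved.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    inputs: tuple    # ((outpoint, amount), ...), amounts in satoshis
    outputs: tuple   # ((address, amount), ...)

def make_payjoin(fallback, receiver_utxo, receiver_address):
    """Receiver side: add one input and raise the receiver's output by its value."""
    outputs = tuple(
        (addr, amount + receiver_utxo[1]) if addr == receiver_address else (addr, amount)
        for addr, amount in fallback.outputs
    )
    return Tx(fallback.inputs + (receiver_utxo,), outputs)

def sender_accepts(fallback, proposal, receiver_address):
    """Sender side: co-sign only if nothing belonging to the sender was altered."""
    if not set(fallback.inputs).issubset(proposal.inputs):
        return False                      # one of the sender's inputs was dropped
    added = sum(value for outpoint, value in proposal.inputs
                if (outpoint, value) not in fallback.inputs)
    before, after = dict(fallback.outputs), dict(proposal.outputs)
    if before.keys() != after.keys():
        return False                      # destinations were added or removed
    for addr, amount in before.items():
        expected = amount + added if addr == receiver_address else amount
        if after[addr] != expected:
            return False                  # change reduced, or payment output is off
    return True

# PayJoin 1 from earlier: Alice pays Bob 0.1, Bob chips in his 0.08 utxo
fallback = Tx(
    inputs=(("alice:0", 5_000_000), ("alice:1", 9_000_000)),
    outputs=(("3AliceSAddReSs", 4_000_000), ("3BobSAddReSs", 10_000_000)),
)
proposal = make_payjoin(fallback, ("bob:0", 8_000_000), "3BobSAddReSs")
print(proposal.outputs)
print(sender_accepts(fallback, proposal, "3BobSAddReSs"))
```

<p>If the check fails, or the connection drops, the fallback transaction is still a valid ordinary payment, which is exactly why it is sent first.</p>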
<h2>Conclusion</h2>
<p>This is another nail in the coffin of blockchain analysis. If 5% of us
do this, it will <em>not</em> be safe to assume that a totally ordinary looking
payment is not a CoinJoin. That's basically it.</p>
<p>----------------------------------------------------------------------</p>
<h3>Appendix: Unnecessary Input Heuristics</h3>
<p>The health warning to this reasoning has already been given: wallets
will definitely not <em>always</em> respect the logic given below - I know of
at least one such case (h/t David Harding). However I think it's worth
paying attention to (this is slightly edited from the comment section of
the referenced gist):</p>
<p><span style="text-decoration: underline;">Definitions:</span></p>
<p>"UIH1" : one output is smaller than any input. This heuristically
implies that <em>that</em> output is not a payment, and must therefore be a
change output.</p>
<p>"UIH2": one input is larger than any output. This heuristically
implies that <em>no output</em> is a payment, or, to say it better, it implies
that this is not a normal wallet-created payment, it's something
strange/exotic.</p>
<p>Note: UIH2 does not necessarily imply UIH1.</p>
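<p>Stated as code, the two definitions are one-liners. This sketch takes the wording above literally, reading "any" as "every" in both definitions (that reading is my assumption; other implementations may test something slightly different), with amounts in integer satoshis.</p>

```python
def uih1(inputs, outputs):
    """UIH1: some output is smaller than every input."""
    return min(inputs) > min(outputs)

def uih2(inputs, outputs):
    """UIH2: some input is larger than every output."""
    return max(inputs) > max(outputs)

# PayJoin 1 from the main text: inputs 0.05, 0.09, 0.08; outputs 0.04, 0.18
payjoin1 = ([5_000_000, 9_000_000, 8_000_000], [4_000_000, 18_000_000])
print(uih1(*payjoin1), uih2(*payjoin1))

# A made-up transaction showing UIH2 without UIH1
odd = ([10_000_000, 1_000_000], [6_000_000, 5_000_000])
print(uih1(*odd), uih2(*odd))
```

<p>Under this reading the first example trips UIH1 only, and the second trips UIH2 only.</p>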
<p>So we just have to focus on UIH2. Avoiding the UIH1 condition is nice,
because it means that both outputs could be the payment; but in any case
the normal blockchain analysis will be wrong about the payment amount.
If we don't avoid the UIH2 condition, though, we lose the
steganographic aspect which is at least 50% of the appeal of this
technique.</p>
<p>Joinmarket's current implementation does its best to avoid UIH2, but
proceeds with PayJoin anyway even if it can't. The reasoning is
partially as already discussed: not all wallets follow this logic; the
other part of the reasoning is the actual data, as we see next:</p>
<p><span style="text-decoration: underline;">Data collection from LaurentMT:</span></p>
<p>From block 552084 to block 552207 (One day: 01/12/2018)</p>
<ul>
<li>Txs with 2 outputs and more than 1 input = 35,349<ul>
<li>UIH1 Txs (identifiable change output) = 19,020 (0.54)</li>
<li>!UIH1 Txs = 16,203 (0.46)</li>
<li>Ambiguous Txs = 126 (0.00)</li>
</ul>
</li>
</ul>
<p>From block 552322 to block 553207 (One week: 03/12/2018 - 09/12/2018)</p>
<ul>
<li>Txs with 2 outputs and more than 1 input = 268,092<ul>
<li>UIH1 Txs (identifiable change output) = 145,264 (0.54)</li>
<li>!UIH1 Txs = 121,820 (0.45)</li>
<li>Ambiguous Txs = 1,008 (0.00)</li>
</ul>
</li>
</ul>
<p>And here are a few stats for UIH2:</p>
<p>Stats from block 552084 to block 552207 (One day: 01/12/2018)</p>
<ul>
<li>Txs with 2 outputs and more than 1 input = 35,349<ul>
<li>UIH2 Txs = 10,986 (0.31)</li>
<li>!UIH2 Txs = 23,596 (0.67)</li>
<li>Ambiguous Txs = 767 (0.02)</li>
</ul>
</li>
</ul>
<p>From block 552322 to block 553207 (One week: 03/12/2018 - 09/12/2018)</p>
<ul>
<li>Txs with 2 outputs and more than 1 input = 268,092<ul>
<li>UIH2 Txs = 83,513 (0.31)</li>
<li>!UIH2 Txs = 178,638 (0.67)</li>
<li>Ambiguous Txs = 5,941 (0.02)</li>
</ul>
</li>
</ul>
<p>CoinjoinXT | 2018-09-15T00:00:00+02:00 | Adam Gibson | tag:joinmarket.me,2018-09-15:/blog/blog/coinjoinxt/</p>
<p>a proposal for multi-transaction coinjoins.</p>
<h3>CoinJoinXT</h3>
<h1>CoinJoinXT - a more flexible, extended approach to CoinJoin</h1>
<p><em>Ideas were first discussed
<a href="https://gist.github.com/AdamISZ/a5b3fcdd8de4575dbb8e5fba8a9bd88c">here</a>.
Thanks again to arubi on IRC for helping me flesh them out.</em></p>
<h2>Introduction</h2>
<p>We assume that the reader is familiar with CoinJoin as a basic idea -
collaboratively providing inputs to a transaction so that it may be
made difficult or impossible to distinguish ownership/control of the
outputs.</p>
<p>The way that CoinJoin is used in practice (today mainly using
JoinMarket, but others over Bitcoin's history) is to create large-ish
transactions with multiple outputs of exactly the same amount. This can
be called an "intrinsic fungibility" model - since, although the
transactions created are unambiguously recognizable as CoinJoins, the
indistinguishability of said equal outputs is kind of "absolute".</p>
<p>However, as partially discussed in the earlier blog post <a href="https://web.archive.org/web/20200603010653/https://joinmarket.me/blog/blog/the-steganographic-principle/">"the
steganographic
principle"</a>,
there's at least an argument for creating fungibility in a less
explicit way - that is to say, creating transactions that have a
fungibility effect but aren't <em>necessarily</em> visible as such - they
<em>may</em> look like ordinary payments. I'll call this the <em>deniability</em>
model vs the <em>intrinsic fungibility</em> model. It's harder to make this
work, but it has the possibility of being much more effective than the
<em>intrinsic fungibility model</em>, since it gives the adversary (who we'll
talk about in a minute) an additional, huge problem: he doesn't even
know where to start.</p>
<h2>The adversary's assumptions</h2>
<p>In trying to create privacy, we treat the "blockchain analyst" as our
adversary (henceforth just "A").</p>
<p>Blockchain analysis consists, perhaps, of two broad areas (not sure
there is any canonical definition); we can call the first one
"metadata", vaguely, and think of it as every kind of data that is not
directly recorded on the blockchain, such as personally identifying
information, exchange records etc, network info etc. In practice, it's
probably the most important. The second is stuff recorded directly on
the blockchain - pseudonyms (scriptPubKeys/addresses) and amount
information (on non-amount-blinded blockchains as Bitcoin's is
currently; for a discussion about that see this earlier <a href="https://web.archive.org/web/20200603010653/https://joinmarket.me/blog/blog/the-steganographic-principle/">blog
post</a>);
note that amount information includes the implicit amount - network fee.</p>
<p>Timing information perhaps straddles the two categories, because while
transactions are (loosely) timestamped, there is also the business of
trying to pick up timing and perhaps geographic information from
snooping the P2P network.</p>
<p>With regard to that second category, the main goal of A is to correlate
ownership of different utxos. An old
<a href="https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf">paper</a>
of Meiklejohn et al 2013 identified two Heuristics (let's call them
probabilistic assumptions), of which the first was by far the most
important:</p>
<ul>
<li>Heuristic 1 - All inputs to a transaction are owned by the same
party</li>
<li>Heuristic 2 - One-time change addresses are owned by the same party
as the inputs</li>
</ul>
<p>The second is less important mainly because it had to be caveat-ed quite
a bit and wasn't reliable in naive form; but, identification of change
addresses generally is a plausible angle for A. The first has been, as
far as I know, the bedrock of blockchain analysis and has been referred
to in many other papers, was mentioned in Satoshi's whitepaper, and you
can see one functional example at the long-existent website
<a href="https://www.walletexplorer.com/">walletexplorer</a>.</p>
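<p>To see why Heuristic 1 is so powerful, here is a minimal sketch of the clustering it enables, using a union-find structure over input addresses. The function name and the toy address labels are made up for illustration; real analysis tools are of course far more elaborate.</p>

```python
def cluster_by_common_inputs(transactions):
    """Group addresses under Heuristic 1: all inputs of a transaction share an owner.

    transactions: iterable of lists, each holding the addresses spent from
    in one transaction. Returns a list of sets of addresses.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]   # path halving
            addr = parent[addr]
        return addr

    for input_addresses in transactions:
        roots = [find(a) for a in input_addresses]
        for root in roots[1:]:
            parent[find(root)] = find(roots[0])   # merge into the first input's cluster

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())

# Three transactions: A and B co-spent, then B and C co-spent, D spent alone
print(cluster_by_common_inputs([["A", "B"], ["B", "C"], ["D"]]))
```

<p>Note that a single collaborative transaction, say with inputs <code>alice1</code>, <code>alice2</code> and <code>bob1</code>, silently merges two owners into one cluster, and the error propagates to everything either of them co-spends later.</p>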
<p><span style="text-decoration: underline;">But I think it's important to observe that this list is
incomplete.</span></p>
<p>I'll now add two more items to the list; the first is omitted because
it's elementary, the other, because it's subtle (and indeed you might
find it a bit dumb at first sight):</p>
<ul>
<li><code>Heuristic/Assumption 0</code>: All inputs controlled by only one pubkey
are unilaterally controlled</li>
<li>Heuristic/Assumption 1: All inputs to a transaction are owned by the
same party</li>
<li>Heuristic/Assumption 2(?): One-time change addresses are owned by
the same party as the inputs</li>
<li><code>Heuristic/Assumption 3</code>: Transfer of ownership between parties in
one transaction implies payment</li>
</ul>
<p>So, "Heuristic/Assumption" because assumption is probably a better
word for all of these generally, but I want to keep the existing
nomenclature; the "?" for 2 is simply because, as mentioned, this one
is problematic (although still worthy of consideration).</p>
<p><strong>Assumption 0</strong>: basically, "if it's not multisig, one party controls it". This was never fully
safe; there was always <a href="https://en.wikipedia.org/wiki/Shamir's_Secret_Sharing">Shamir's secret
sharing</a>
to share shards of a key, albeit that's very rarely used, and you can
argue pedantically that full reconstruction means unilateral control.
But Assumption 0 is a lot less safe now due to the recent
<a href="https://eprint.iacr.org/2018/472">work</a>
by Moreno-Sanchez et al. which means, at the very least, that 2 parties
can easily use a 2-party computation based on the Paillier encryption
system to effectively use a single ECDSA pubkey as a 2-2 multisig. So
this assumption is generally unspoken, but in my opinion is now
generally important (i.e. not necessarily correct!).</p>
<p><strong>Assumption 3</strong>: this is rather strange and looks tautological; I could
have even written "transfer of ownership between parties in one
transaction implies transfer of ownership" to be cheeky. The point, if
it is not clear to you, will become clear when I explain what
"CoinJoinXT" means.</p>
<p>Our purpose, now, is to make A's job harder <strong>by trying to invalidate
all of the above assumptions at once</strong>.</p>
<h2>Quick refresher: BIP141</h2>
<p>This has been discussed in other blog posts about various types of
"CoinSwap", so I won't dwell on it.</p>
<p>Segwit fixes transaction malleability
(<a href="https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki">BIP141</a>,
along with BIP143,144 were the BIPs that specified segwit). One of the
most important implications of this is explained directly in BIP 141
itself, to
<a href="https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki#Trustfree_unconfirmed_transaction_dependency_chain">quote</a>
from it:</p>
<blockquote>
<p><em>Two parties, Alice and Bob, may agree to send certain amount of
Bitcoin to a 2-of-2 multisig output (the "funding transaction").
Without signing the funding transaction, they may create another
transaction, time-locked in the future, spending the 2-of-2 multisig
output to third account(s) (the "spending transaction"). Alice and
Bob will sign the spending transaction and exchange the signatures.
After examining the signatures, they will sign and commit the funding
transaction to the blockchain. Without further action, the spending
transaction will be confirmed after the lock-time and release the
funding according to the original contract.</em></p>
</blockquote>
<p>In short, if we agree a transaction, then we can fix its txid and sign
transactions which use its output(s). The BIP specifically references
the Lightning Network as an example of the application of this pattern,
but of course it's not restricted to it. We can have Alice and Bob
agree to any arbitrary set of transactions and pre-sign them, in
advance, with all of them having the funding transaction as the root.</p>
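<p>A toy Python sketch of why this works: under segwit the txid is computed over the transaction without its witness data, so adding or changing signatures does not move the txid that child transactions point at. The byte strings here are made-up placeholders and not real Bitcoin serialization; only the double-SHA256 and the "witness is excluded from the txid" idea are taken from BIP141.</p>

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin-style double SHA256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def toy_txid(tx: dict) -> str:
    # The witness is deliberately left out of the hash.
    return dsha256(tx["core"])[::-1].hex()


def toy_wtxid(tx: dict) -> str:
    # For contrast: a hash that commits to the witness too.
    return dsha256(tx["core"] + tx["witness"])[::-1].hex()


# Hypothetical funding transaction, before and after it is signed.
unsigned_funding = {"core": b"inputs:A,B|outputs:2of2(A,B)", "witness": b""}
signed_funding = {"core": unsigned_funding["core"], "witness": b"sigA|sigB"}

txid_before = toy_txid(unsigned_funding)
txid_after = toy_txid(signed_funding)
```

<p>Because <code>txid_before</code> and <code>txid_after</code> are equal, a spending transaction can reference the funding output and be signed before the funding transaction itself is signed.</p>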
<h2>CoinJoinXT - the basic case</h2>
<p>CoinJoin involves 2 or more parties contributing their utxos into 1
transaction. Using the above model they can do the same with a funding
transaction, and then pre-sign a set of more than one spending
transaction. Here's a simple schematic:</p>
<div class="highlight"><pre><span></span><code><span class="err">A 1btc ---></span>
<span class="err"> F (2,2,A,B) --+</span>
<span class="err">B 1btc ---> |</span>
<span class="err"> |</span>
<span class="err"> +-->[Proposed transaction graph (PTG) e.g. ->TX1->TX2->TX3 ..]</span>
</code></pre></div>
<p>In human terms, you can envisage that: Alice and Bob would like to start
to negotiate a set of conditional contracts about what happens to their
money. Then they go through these steps:</p>
<ol>
<li>One side proposes F (the funding transaction) and a full graph of
unsigned transactions to fill out the PTG above; e.g. Alice
proposes, Bob and Alice share data (pubkeys, destination addresses).
Note that the set doesn't have to be a chain (TX1->TX2->TX3...),
it can be a tree, but each transaction must require sign-off of both
parties (either, at least one 2-2 multisig utxo, or at least one
utxo whose key is owned by each party).</li>
<li>They exchange signatures on all transactions in the PTG, in either
order. Of course, they abort if signatures don't validate.</li>
<li>With this in place (i.e. <strong>only</strong> after valid completion of (2)),
they both sign (in either order) F.</li>
<li>Now both sides have a valid transaction set, starting with F. Either
or both can broadcast them. <span style="text-decoration: underline;">The transactions are <em>all</em> guaranteed
to occur as long as at least one of them wants
it</span>. Contrariwise, <strong>none</strong> of
them is valid without F being broadcast.</li>
</ol>
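<p>The ordering constraint in steps 2 and 3 is the whole safety argument, so here is a small Python sketch of it from one party's point of view. The <code>Negotiation</code> class and the string "signatures" are hypothetical stand-ins (no real cryptography is done); the only thing the sketch tries to capture is that F is never signed until every PTG signature from the counterparty has been received and checked.</p>

```python
class Negotiation:
    """One party's view of the CoinJoinXT signing order (illustrative only)."""

    def __init__(self, ptg_txs, verify):
        self.ptg_txs = list(ptg_txs)   # names of the pre-agreed PTG transactions
        self.verify = verify           # callable(tx, sig) returning True or False
        self.counterparty_sigs = {}

    def receive_signature(self, tx, sig):
        # Step 2: abort on anything that does not validate.
        if tx not in self.ptg_txs or not self.verify(tx, sig):
            raise ValueError("abort: invalid signature for " + str(tx))
        self.counterparty_sigs[tx] = sig

    def ready_to_sign_funding(self):
        return all(tx in self.counterparty_sigs for tx in self.ptg_txs)

    def sign_funding(self, sign):
        # Step 3: only after valid completion of step 2.
        if not self.ready_to_sign_funding():
            raise RuntimeError("refusing to sign F before the PTG is fully signed")
        return sign("F")


def toy_sign(tx):
    # Placeholder for a real signature.
    return "toy-sig:" + tx


def toy_verify(tx, sig):
    return sig == "toy-sig:" + tx


negotiation = Negotiation(["TX1", "TX2", "TX3"], toy_verify)
for tx in ["TX1", "TX2", "TX3"]:
    negotiation.receive_signature(tx, toy_sign(tx))
funding_signature = negotiation.sign_funding(toy_sign)
```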
<p>This does achieve one significant thing: <strong>one transaction such as TX2
can transfer coins to, say, Bob's wallet, giving Alice nothing; and yet
we can still get the overall effect of a CoinJoin. In other words,
we've opened up the possibility to violate Heuristic 3 as well as
Heuristic 1, in the same (short) interaction.</strong></p>
<p>This construction works fine if <em>all</em> inputs used in transactions in the
PTG are descendants of F; but this makes the construction very limited.
So we'll immediately add more details to allow a more general use-case,
in the next section.</p>
<h2>Introducing Promises</h2>
<p>If we allowed any of the transactions (TX1, TX2, ...) in the PTG in our
previous example to have an input which did <em>not</em> come from the funding
transaction F, then we would have introduced a risk; if Alice added utxo
UA to, say, TX2, then, before Bob attempted to broadcast TX2, she could
double spend it. This would break the atomicity of the graph, which was
what allowed the crucial additional interesting feature (in bold,
above): that an individual transaction could transfer funds to one
party, without risks to the other. To address this problem, we call
these additional inputs <strong>promise utxos</strong> and make use of <strong>refund
transactions</strong>.</p>
<div class="highlight"><pre><span></span><code><span class="err">A 1btc ---></span>
<span class="err"> F (2,2,A,B) ---</span>
<span class="err">B 1btc ---> | +--> external payout 0.5 btc to Bob</span>
<span class="err"> | |</span>
<span class="err"> +->[TX1 --> TX2 --> TX3 --> TX4]</span>
<span class="err"> | ^</span>
<span class="err"> | |</span>
<span class="err"> | |</span>
<span class="err"> | +--- utxo A1</span>
<span class="err"> |</span>
<span class="err"> +--> refund locktime M, pay out *remaining* funds to A: 1btc, B: 0.5btc</span>
</code></pre></div>
<p>In words: if, between the negotiation time and the time of broadcast of
TX3, Alice spends A1 in some other transaction, Bob will still be safe;
after block M he can simply broadcast the presigned refund transaction
to claim the exact number of coins he is owed at that point in the
graph.</p>
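<p>Here is a rough Python sketch of the bookkeeping behind such refunds: walk the chain, track what each party is still owed from the shared funds, and record the split that a backout spending the previous transaction's output would pay. The step format and the size of utxo A1 (0.3 btc here) are assumptions of mine for illustration, and fees are ignored; the 1 btc inputs and the 0.5 btc payout to Bob are taken from the diagram.</p>

```python
def backout_schedule(initial_owed, steps):
    """For each step, the refund split to use if that step never confirms.

    `steps` is a list of dicts with optional "promises" (extra funds a party
    adds at that step) and "payouts" (funds paid out externally at that step).
    Amounts are in satoshis.
    """
    owed = dict(initial_owed)
    schedule = []
    for step in steps:
        # The backout for this step redeems what is owed just before it.
        schedule.append((step["name"], dict(owed)))
        for party, amount in step.get("promises", {}).items():
            owed[party] += amount
        for party, amount in step.get("payouts", {}).items():
            owed[party] -= amount
    return schedule


chain = [
    {"name": "TX1"},
    {"name": "TX2", "payouts": {"B": 50_000_000}},    # external payout 0.5 btc to Bob
    {"name": "TX3", "promises": {"A": 30_000_000}},   # promise utxo A1 (size assumed)
    {"name": "TX4"},
]
schedule = dict(backout_schedule({"A": 100_000_000, "B": 100_000_000}, chain))
refund_if_tx3_blocked = schedule["TX3"]
```

<p>With these numbers, the refund that protects against A1 being double spent pays A 1 btc and B 0.5 btc, matching the diagram.</p>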
<p>The above addresses the case of a single external input being included
in a chain of transactions in the PTG (here, TX1,2,3,4). Extending this,
and generalising to allowing external inputs in many transactions, is
straightforward; we can add such in-PTG backouts at every step,
redeeming all remaining funds to parties according to what they're
owed.</p>
<p>To summarize this section and how it differs from the original, simpler
construction:</p>
<p>Alice and Bob have a choice:</p>
<ol>
<li>They can set up a fully trustless PTG, without promises. They are
then guaranteed to achieve "all or nothing": either all
cooperative signing works, then all transactions can be broadcast
(as long as <em>at least one</em> of them wants to), or nothing
(including F) is broadcast at all.</li>
<li>They can set up a PTG including promises from one or both parties.
Now they don't get "all or nothing" but only ensure that the
transactions that complete are a subset, in order, from the start F.
To achieve this they add presigned backouts at (probably every)
step, so that if the chain "breaks" somewhere along, they will
recover all the funds remaining that are owed to them.</li>
</ol>
<p>The tradeoff is: (2) is not perfectly atomic, but it allows the
transaction graph to include utxos that are not descendants of F,
particularly useful for privacy applications. In a sequence of 10
coinjoins, you may be happy to risk that TXs 6-10 don't end up
happening, if it doesn't cost you money. Case (2) is more likely to be
of interest.</p>
<h2>Interlude - overview of features of CoinJoinXT</h2>
<p>There's a large design space here.</p>
<ul>
<li>We can have N parties, not just 2.</li>
<li>We can have as many transactions as we like.</li>
<li>We can have a tree with F as root, rather than a chain.</li>
<li>We can have as many promise utxos from any of the N parties as we
like.</li>
</ul>
<p>A mixture of these features may give different tradeoffs in terms of
<em>intrinsic fungibility</em> vs <em>deniability</em> vs <em>cost</em>; the tradeoff
discussed in the introduction.</p>
<p><strong>Interactivity</strong> - unlike either a CoinSwap of types discussed earlier
in this blog, or doing multiple CoinJoins (to get a better fungibility
effect than just a single one), this only requires one "phase" of
interactivity (in terms of rounds, it may be 3). The two parties
connect, exchange data and signatures, and then immediately disconnect.
(This is what I called no-XBI in the previous <a href="https://web.archive.org/web/20200603010653/https://joinmarket.me/blog/blog/the-half-scriptless-swap/">blog
post</a>).</p>
<p><strong>Boundary</strong> - the adversary A, as was hinted at in the introduction, in
this model, will not necessarily be able to easily see on the blockchain
where the start and end points of this flow of transactions was. To the
extent that this is true, it's an enormous win, but more on this later.</p>
<h2>Example</h2>
<p><img alt="ExampleCJXT" src="../../../../../../20200603010653im_/https:/joinmarket.me/static/media/uploads/.thumbnails/onchaincontract3.png/onchaincontract3-614x422.png" width="614" height="422"></p>
<p>Here we are still restricting to 2 parties for simplicity of the
diagram. There is still a chain of 4 TXs, but here we flesh out the
inputs and outputs. About colors:</p>
<p>Blue txos are co-owned by the two parties, envisioned as 2 of 2 multisig
(although as originally mentioned, the technical requirement is only
that each transaction is signed by both parties).</p>
<p>Red inputs are <strong>promise utxos</strong> as described in the earlier section.</p>
<p>Each promise has a corresponding pre-signed backout transaction, which
spends the bitcoins of the
<span style="text-decoration: underline;">previous</span> transaction to the one
consuming that promise.</p>
<p>Notice that this example contains two possible setups for each
individual transaction in the chain; it can pay out only to one party
(like TX3 which pays Bob 0.6btc), or it can pay "CoinJoin-style"
equal-sized outputs to 2 (or N) parties. Choosing this latter option
means you are consciously deciding to blur the line between the
<em>intrinsic-fungibility</em> model and the <em>deniability</em> model, which, by
the way, is not necessarily a bad idea.</p>
<h2>The return of A - amounts leak.</h2>
<p>As mentioned, our adversary A has a very important problem - he may not
know that the above negotiation has happened, unlike a simple CoinJoin
where the transactions are watermarked as such (and this is particularly
true if Alice and Bob do <em>not</em> use equal-sized outputs). The boundary
may be unclear to A.</p>
<p>So, what strategy <em>can</em> A use to find the transaction graph/set? He can
do <a href="https://en.wikipedia.org/wiki/Subset_sum_problem">subset
sum</a>
analysis.</p>
<p>If Alice and Bob are just 'mixing' coins, so that they are paid out
the same amount that they paid in, I'll assert that subset sum is
likely to work. It's true that A's job is quite hard, since in
general, he would have to do such subset-sum analysis on a huge array of
different possible sets of (inputs, outputs) on chain; but nevertheless
it's the kind of thing that can be done by a professional adversary,
over time. The fact that subset sum analysis is theoretically
exponential time and therefore not feasible for very large sets may not
be relevant in practice.</p>
<p>In our example above it may not be hard to identify the two inputs from
Alice (1btc, 0.3btc) as corresponding to 3 outputs (0.8btc, 0.2btc,
0.3btc), albeit that the latter two - 0.2, 0.3 were part of CoinJoins.
Remember that this was a tradeoff - if we <em>didn't</em> make equal sized
outputs, to improve deniability/hiding, we'd no longer have any
ambiguity there.</p>
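<p>A brute-force Python sketch of what A's subset-sum search looks like for one candidate set. Alice's inputs (1btc, 0.3btc) and her outputs (0.8, 0.2, 0.3) come from the text; the rest of the output list is a hypothetical guess at what else might be in the candidate set, and fees are ignored.</p>

```python
from itertools import combinations


def subset_sum_matches(target, outputs):
    """Distinct value combinations drawn from `outputs` that add up to `target`."""
    found = set()
    for size in range(1, len(outputs) + 1):
        for combo in combinations(outputs, size):
            if sum(combo) == target:
                found.add(tuple(sorted(combo, reverse=True)))
    return sorted(found)


# Amounts in satoshis.
alice_inputs = [100_000_000, 30_000_000]
candidate_outputs = [
    80_000_000,               # Alice
    20_000_000, 20_000_000,   # equal-sized CoinJoin outputs
    30_000_000, 30_000_000,   # equal-sized CoinJoin outputs
    60_000_000,               # the 0.6btc payout to Bob
]
matches = subset_sum_matches(sum(alice_inputs), candidate_outputs)
```

<p>The search is exponential in the number of outputs, which is fine for a set this small. Note that the equal-sized outputs can produce more than one matching combination, which is the ambiguity referred to above.</p>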
<h2>Breaking subset-sum with Lightning</h2>
<p><img alt="" src="../../../../../../20200603010653im_/https:/joinmarket.me/static/media/uploads/.thumbnails/amtdecorr2.png/amtdecorr2-711x392.png" width="711" height="392"></p>
<p>Here's one way of addressing the fact that A can do subset-sum on such
a privacy-enhancing CoinJoinXT instantiation. The PTG is unspecified but
you can imagine it as something similar to the previous example.</p>
<p>Marked in blue is what the adversary A doesn't know, even if he has
identified the specific transaction/graph set (as we've said, that in
itself is already hard). Subset-sum analysis won't work here to
identify which output belongs to Alice and which to Bob; since 5.5 + 1.5
!= 6.6, nor does 5.4 fit, nor does such an equation fit with Alice's
input 5.8 on the right hand side of the equation.</p>
<p>The trick is that the 1.5 output is actually a <strong>dual funded Lightning
channel</strong> between Alice and Bob. The actual channel balance is shown in
blue again because it is hidden from A: (0.3, 1.2). If the channel is then
immediately closed we have fallen back to a case where subset sum works,
as the reader can easily verify.</p>
<p>But if, as is usually the intent, the channel gets used, the balance
will shift over time, due to payments over HTLC hops to other
participants in the Lightning network. This will mean that the final
closing balance of the channel will be something else; for example,
(0.1, 1.4), and then subset-sum will still not reveal which of the 2
outputs (5.4, 5.5) belong to Alice or Bob.</p>
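<p>The three situations just described can be checked mechanically. This Python sketch uses the amounts from the text, reading 5.8 as Alice's input and 6.6 as Bob's (the latter is my reading of the figure), ignores fees, and simply asks which combinations of visible outputs add up to each input.</p>

```python
from itertools import combinations


def explaining_subsets(input_amount, outputs):
    """All combinations of `outputs` whose sum equals `input_amount`."""
    return [
        combo
        for size in range(1, len(outputs) + 1)
        for combo in combinations(outputs, size)
        if sum(combo) == input_amount
    ]


# Amounts in satoshis.
ALICE_IN, BOB_IN = 580_000_000, 660_000_000

scenarios = {
    "channel still open": [540_000_000, 550_000_000, 150_000_000],
    "closed at once (0.3, 1.2)": [540_000_000, 550_000_000, 30_000_000, 120_000_000],
    "closed after use (0.1, 1.4)": [540_000_000, 550_000_000, 10_000_000, 140_000_000],
}

results = {
    name: (explaining_subsets(ALICE_IN, outs), explaining_subsets(BOB_IN, outs))
    for name, outs in scenarios.items()
}
```

<p>Only the immediate-close scenario yields a match for each party (5.5 + 0.3 for Alice, 5.4 + 1.2 for Bob); in the other two, no combination of outputs explains either input.</p>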
<p>At a high level, you can understand this as a <strong>bleed-through and
amplification of off-chain privacy to on-chain.</strong></p>
<p>It's worth noting that you clearly get a significant part of this
effect from just the dual-funded Lightning channel; if you consider
change outputs in such a single funding transaction, you see the same
effect:</p>
<div class="highlight"><pre><span></span><code><span class="err">Alice</span>
<span class="err">2.46</span>
<span class="err"> -> Lightning funding 0.1</span>
<span class="err"> -> Change 2.41</span>
<span class="err"> -> Change 2.37</span>
<span class="err">2.42</span>
<span class="err">Bob</span>
</code></pre></div>
<p>It's easy to see that there is no delinking effect on the change-outs
<em>if</em> we know that the funding is equal on both sides. However, there's
no need for that to be the case; if the initial channel balance is
(Alice: 0.09, Bob: 0.01) then the change-outs are going to the opposite
parties compared to if the channel funding is (Alice: 0.05, Bob: 0.05).
So this concrete example should help you to understand a crucial aspect
of this:</p>
<ul>
<li>Such a fungibility effect is only achieved if the difference between
the two parties' initial inputs is small enough compared to the
size of the dual-funded Lightning channel</li>
<li>If the size of the inputs is very large compared to the Lightning
channel overall size, which currently at maximum is 2**24 satoshis
(about 0.16btc), then, in order to achieve this obfuscation effect,
we "converge" to the case of something like a 2-in and 2-out
equal-sized coinjoin. It's hard for 2 parties to arrange to have
inputs of equal sizes, and it somewhat loses the deniability feature
we were going for. (You can easily confirm for yourself that there
will be no ambiguity if Alice and Bob's inputs are of completely
different sizes).</li>
</ul>
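<p>The change-output example can be turned into a small check: given the two inputs, the channel size and the two change outputs, which assignments of change to party are arithmetically possible? This is a sketch under the assumption of zero fees; the function name and the "lopsided" counter-example amounts are mine.</p>

```python
def consistent_assignments(alice_in, bob_in, channel_size, change_outputs):
    """Return every way of assigning the two change outputs that adds up.

    Each result records who gets which change and the implied channel funding.
    Amounts are in satoshis.
    """
    first, second = change_outputs
    options = []
    for alice_change, bob_change in ((first, second), (second, first)):
        alice_funding = alice_in - alice_change
        bob_funding = bob_in - bob_change
        funds_channel = alice_funding + bob_funding == channel_size
        if funds_channel and min(alice_funding, bob_funding) >= 0:
            options.append({
                "alice_change": alice_change,
                "bob_change": bob_change,
                "funding": (alice_funding, bob_funding),
            })
    return options


# The example from the text: inputs 2.46 and 2.42, channel 0.1, change 2.41 and 2.37.
ambiguous = consistent_assignments(
    246_000_000, 242_000_000, 10_000_000, (241_000_000, 237_000_000)
)

# Inputs of very different sizes (hypothetical): only one assignment survives.
lopsided = consistent_assignments(
    500_000_000, 100_000_000, 10_000_000, (495_000_000, 95_000_000)
)
```

<p>The first call finds both the (0.05, 0.05) and the (0.09, 0.01) funding splits, so A cannot tell which change belongs to whom; the second finds only one, because the swapped assignment would need a negative funding contribution.</p>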
<p>So how does the picture change if instead of just doing a single
dual-funded Lightning channel, we include it as an output in a
CoinJoinXT structure?</p>
<p>The answer again is deniability. Any contiguous subset of the entire
blockchain has the property of sum preservation, modulo fees: the input
total is ~= the output total. So no particular contiguous subset on the
blockchain flags itself as being such a CoinJoinXT structure - unless
subset sum works for some N subsets (2, as in our examples, or higher).
But with the dual funded Lightning output of the type shown here, at
least for the 2 of 2 case, this doesn't work.</p>
<h2>Remove all traces?</h2>
<p>What's been described up to now doesn't quite achieve the desired goal
of "deniability"; there are still what we might call "fingerprints"
in such a CoinJoinXT structure:</p>
<ul>
<li>Timing correlation: if we don't use nLockTime on these
transactions, then one party might choose to broadcast them all at
once. This is at the least a big clue, although not unambiguous. To
avoid it, have the pre-signed transactions in the PTG all be given
specific timelocks.</li>
<li>Shared control utxos. If we use 2 of 2, or N of N, multisig outputs,
of the current normal p2sh type, then they are observable as such,
and this could easily help A to find the "skeleton" of such a
CoinJoinXT structure. Of course, let's not forget that we can do
CoinJoinXT with various equal sized outputs too, mixing the
"intrinsic fungibility" and "deniability" approaches together,
as discussed, so it's not that CoinJoinXT with p2sh multisig
connecting utxos is useless. But we may want to focus on less
detectable forms, like Schnorr/MuSig based multisig with key
aggregation so that N of N is indistinguishable from 1 of 1, or the
new
<a href="https://eprint.iacr.org/2018/472">construction</a>
that allows an ECDSA pubkey to be effectively a 2 of 2 multisig.</li>
</ul>
<h2>Conclusion</h2>
<p><strong>Proof of Concept</strong> - I put together some very simple <a href="https://github.com/AdamISZ/CoinJoinXT-POC">PoC
code</a>;
it only covers something like the above first "Example" with 2
parties. Going through such an exercise in practice at least allows one
to see concretely that (a) the interaction between the parties is very
minimal (sub-second), which is great of course, but (b) it gets a little
hairy when you think about how to set up a template of such a
transaction chain that 2 parties can agree on using whatever utxos they
have available as inputs. A substantial chunk of that PoC code was
devoted to that - there is a general <code>Template</code> class for specifying a
graph of transactions, with parametrized input/output sizes.</p>
<p><strong>Practicality today</strong> - Although it can be done today (see previous),
there are barriers to making this work well. Ideally we'd have Schnorr
key aggregation for multisig, and support for dual funded Lightning
channels for the amount decorrelation trick mentioned. Without either of
those, such a transaction graph on the blockchain will be <em>somewhat</em>
identifiable, but I still think there can be a lot of use doing it as an
alternative to large sets of clearly identifiable CoinJoins.</p>
<p><strong>Cost tradeoffs</strong> - left open here are the tradeoffs in terms of
blockchain space usage for each "unit of fungibility", i.e. how much
it costs to gain privacy/fungibility this way. I think it's almost
impossible to come up with definitive mathematical models of such
things, but my feeling is that, exactly to the extent any
"deniability" is achieved, it's cost-effective, and to the extent
it's not, it's not cost-effective.</p>
<p><strong>Coordination model</strong> - Currently we have "in play" at least two
models of coordination for CoinJoin - Joinmarket's market-based model,
and the Chaumian server model currently championed by
<a href="https://github.com/nopara73/ZeroLink">ZeroLink</a>.
<strong>CoinJoinXT as an idea is orthogonal to the coordination mechanism</strong>.
The only "non-orthogonal" aspect, perhaps, is that I think the
CoinJoinXT approach may still be pretty useful with only 2 parties (or
3), more so than CoinJoin with only 2/3.</p>
<p>Finally, where should this fit in one's fungibility "toolchest"?
Lightning is <em>hopefully</em> going to emerge as a principal way that people
gain fungibility for their everyday payments. The area it can't help
with now, and probably not in the future due to its properties, is with
larger amounts of money. So you might naturally want to ensure that in,
say, sending funds to an exchange, making a large-ish payment, or
perhaps funding a channel, you don't reveal the size of your cold
storage wallet. I would see the technique described in this blog post as
fitting into that medium-large sized funds transfer situation. CoinJoin
of the pure "intrinsic fungibility" type, done in repeated rounds or
at least in very large anonymity sets, is the other alternative (and
perhaps the best) for large sizes.</p>The Steganographic Principle2018-04-15T00:00:00+02:002018-04-15T00:00:00+02:00Adam Gibsontag:joinmarket.me,2018-04-15:/blog/blog/the-steganographic-principle/<p>a framework for thinking about blockchain privacy issues</p><h3>The steganographic principle</h3>
<h1>The Steganographic Principle</h1>
<p>Some time ago I wrote
<a href="https://gist.github.com/AdamISZ/83a17befd84992a7ad74">this</a>
gist, which is an ill-formed technical concept about a way you could do
steganography leveraging randomness in existing network protocols; but I
also called it a "manifesto", jokingly, because I realised the thinking
behind it is inherently political.</p>
<h2>Cryptography is for terrorists, too</h2>
<p>There are a few reasons why the phrase "If you have nothing to hide, you
have nothing to fear" is wrong and insidiously so. One of the main ones
is simply this: my threat model is <strong>not only my government</strong>, even if
my government is perfect and totally legitimate (to me). But no
government is perfect, and some of them are literally monstrous.</p>
<p>So while it's true that there are uses of cryptography harmonious with a
PG13 version of the world - simply protecting obviously sensitive data
<em>within</em> the control of authorities - there are plenty where it is
entirely ethically right and necessary to make that protection
<strong>absolute</strong>.</p>
<p>The question then arises, as was raised in the above gist, what are the
properties of algorithms that satisfy the requirement of defence even
against hostile authorities?</p>
<p>The modern tradition of cryptography uses Kerckhoffs's Law as one of its
axioms, and steganography does not fit into this model. But that's
because the tradition is built by people in industry who are fine with
people <strong>knowing they are using cryptography</strong>. In an environment where
that is not acceptable, steganography is not on a list of options - it's
more like the sine qua non.</p>
<h2>Steganography on blockchains</h2>
<p>On a blockchain, we have already understood this "freedom fighter"
model. It's an essential part of how the thing was even created, and why
it exists. And there are essentially two principal complaints about
Bitcoin and its blockchain, both of which are somewhat related to this:</p>
<ul>
<li>Privacy</li>
<li>Scalability</li>
</ul>
<p>The first is obvious - if we don't create "steganographic" transactions,
then governments, and everyone else, may get to know at least
<em>something</em> about our transactions. The second is less so - but in the
absence of scale we have a small anonymity set. Smaller payment network
effects and smaller anonymity sets obviously hamper use of these systems
by a "freedom fighter". But remember the scale limitations come directly
out of the design of the system with censorship resistance and
independent verification in mind.</p>
<p>Attempts to improve the privacy by altering the <em>way</em> in which
transactions are done have a tendency to make the scalability worse -
the obvious example being CoinJoin, which with unblinded amounts
inevitably involves larger numbers of outputs and larger numbers of
transactions even.</p>
<p>A less obvious example is Confidential Transactions; when we blind
outputs we need to use up more space to create the necessary guarantees
about the properties of the amounts - see the range proof, which with
Borromean ring signatures or bulletproofs needs a lot of extra space. The
same is true of ring signature approaches generally to confidentiality.</p>
<p>You can trade off space usage for computation though - e.g. zkSNARKs
which are quite compact in space but take a lot of CPU time to create
(and in a way they take a lot of space in a different sense - memory
usage for proof creation).</p>
<h2>Localised trust</h2>
<p>You can improve this situation by localising trust in space or time.
There are obvious models - the bank of the type set up by digicash. See
the concept of <a href="https://en.wikipedia.org/wiki/Blind_signature">Chaumian
tokens</a>
generally. One project that looked into creating such things was
<a href="https://github.com/Open-Transactions/">OpenTransactions</a>,
another was Loom, also see Truledger.</p>
<p>Trust can be localised in time as well - and the aforementioned zkSNARKs
are an example; they use a trusted setup as a bootstrap. This trust can
be ameliorated with a multiparty computation protocol such that trust is
reduced by requiring all participants to be corrupt for the final result
to be corrupt; but it is still trust.</p>
<h2>The tension between privacy and security</h2>
<p>For any attribute which is perfectly (or computationally) hidden, we
have a corresponding security downgrade. If attribute A is required to
satisfy condition C by the rules of protocol P, and attribute A is
blinded to A* by a privacy mechanism M, in such a way that we use the
fact that C* is guaranteed by A*, then we can say that P's security is
"downgraded" by M in the specific sense that the C-guarantee has been
changed to the C*-guarantee, where (inevitably) the C* guarantee is
not as strong, since it requires the soundness of M as well as whatever
assumptions already existed for the soundness of C.</p>
<p>However, the situation is worse - precisely because M is a privacy
mechanism, it reduces public verifiability, and specifically
verifiability of the condition C, meaning that if the C* guarantee
(which we <em>can</em> publicly verify) fails to provide C, there will be no
public knowledge of that failure.</p>
<p>To give a concrete example of the above template, consider what happens
to Bitcoin under Confidential Transactions with Pedersen commitments
(set aside the range proof for a moment). Since Pedersen commitments are
perfectly hiding but only computationally binding, we have:</p>
<p>P = Bitcoin</p>
<p>A = Bitcoin amounts of outputs</p>
<p>C = amount balance in transactions</p>
<p>M = CT with Pedersen commitments</p>
<p>A* = Pedersen commitments of outputs</p>
<p>C* = Pedersen commitment balance in transactions</p>
<p>Here the downgrade in security is specifically the computational binding
of Pedersen commitments (note: that's assuming both ECDLP intractability
<em>and</em> NUMS-ness of a curve point). Without Pedersen/CT, there are
<em>no</em> assumptions about amount balance, since integers are "perfectly
binding" :) With it, any failure of the computational binding is
catastrophic, since we won't see it.</p>
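<p>To see both halves of this (the C* balance check, and why a binding failure is invisible), here is a toy Python sketch. It uses a tiny multiplicative group modulo a prime rather than an elliptic curve, so it is in no way secure, and the "trapdoor" <code>X</code> is known here only so the failure can be demonstrated; in a sound setup nobody knows it. All names and numbers are assumptions for illustration.</p>

```python
# Toy group: P = 2*Q + 1 with P, Q prime; G and H generate the order-Q subgroup.
P = 2039
Q = 1019
G = 4
X = 777              # trapdoor log_G(H); must be unknown for binding to hold
H = pow(G, X, P)


def commit(value, blind):
    """Pedersen commitment: G^value * H^blind mod P."""
    return (pow(G, value, P) * pow(H, blind, P)) % P


def product(commitments):
    acc = 1
    for c in commitments:
        acc = (acc * c) % P
    return acc


def balances(in_commits, out_commits):
    """The C* check: input and output commitments combine to the same element."""
    return product(in_commits) == product(out_commits)


def forge_opening(value, blind, new_value):
    """With the trapdoor, find a blind that opens the same commitment to new_value."""
    return (blind + (value - new_value) * pow(X, -1, Q)) % Q


# Honest transaction: 5 + 7 in, 4 + 8 out; blinding factors also sum equally.
ins = [commit(5, 100), commit(7, 200)]
outs = [commit(4, 250), commit(8, 50)]
honest_ok = balances(ins, outs)

# Inflation: the very same commitment as commit(8, 50), but opened as 500.
forged_blind = forge_opening(8, 50, 500)
same_commitment = commit(500, forged_blind) == commit(8, 50)
```

<p>The public check passes in both cases, so if the trapdoor were ever found the resulting inflation would leave no trace on chain. (The modular inverse via <code>pow(X, -1, Q)</code> needs Python 3.8 or later.)</p>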
<h2>The tension between privacy and scalability</h2>
<p>For any attribute A which is obfuscated by a privacy mechanism M in
protocol P (note: I'm choosing the word "obfuscation" here to indicate
that the hiding is not perfect - note the contrast with the previous
section), we have a corresponding scalability failure. M may obfuscate
an attribute A by expanding the set of possible values/states from A to
A[N]. To commit to the obfuscation soundly it must publish data of
order ~ N x size(A). Also note that it is <em>possible</em> for the
obfuscation goal to be achieved without an increase in space usage, if
multiple parties can coordinate their transactions, but here we ignore
this possibility because it requires all parties to agree that all
attributes except A be identical (example: multiple participants must
accept their newly created outputs are equal value). This is not really
a "transaction" in the normal sense.</p>
<p>A concrete example: equal-sized Coinjoin in Bitcoin:</p>
<p>P = Bitcoin</p>
<p>A = receiver of funds in a transaction</p>
<p>A[N] = set of N outputs of equal size</p>
<p>M = Coinjoin</p>
<p>A less obvious example, but fitting the same pattern: ElGamal commitment
based Confidential Transactions (as opposed to Pedersen commitment
based):</p>
<p>P = Bitcoin</p>
<p>A = output amount in a transaction</p>
<p>A[N] = ElGamal commitment to amount, here 2 curve points, N=2</p>
<p>M = ElGamal commitments</p>
<p>Here N=2 requires some explaining. An ElGamal commitment is perfectly
binding, and to achieve that goal the commitment must have 2 points, as
the input has two values (scalars), one for blinding and the other for
binding the amount. So we see in this case the expansion in practice is
more than just a single integer, it's from a single bitcoin-encoded
integer to two curve points. But the details obviously vary; the general
concept is to whatever extent we obfuscate, without throwing in extra
security assumptions, we require more data.</p>
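<p>The "2 points versus 1" tradeoff can be checked exhaustively in a group small enough to enumerate. This Python sketch uses the order-11 subgroup modulo 23 as a stand-in for a curve (a toy assumption, as is the choice of generators) and counts how many distinct commitments each scheme produces over all (amount, blinding) pairs.</p>

```python
# Tiny group: 2 has order 11 modulo 23.
P, Q, G = 23, 11, 2
H = pow(G, 3, P)   # second generator; its discrete log is trivially known here


def pedersen(value, blind):
    """One group element: G^value * H^blind."""
    return (pow(G, value, P) * pow(H, blind, P)) % P


def elgamal(value, blind):
    """Two group elements: (G^blind, G^value * H^blind)."""
    return (pow(G, blind, P), (pow(G, value, P) * pow(H, blind, P)) % P)


pairs = [(value, blind) for value in range(Q) for blind in range(Q)]
pedersen_images = {pedersen(value, blind) for value, blind in pairs}
elgamal_images = {elgamal(value, blind) for value, blind in pairs}
```

<p>There are 121 (amount, blinding) pairs. The ElGamal commitments are all distinct, so each one can only ever be opened one way (perfect binding), at the price of publishing two elements. The Pedersen commitments collapse onto just 11 values, so every commitment is consistent with every amount (perfect hiding), and binding rests only on nobody being able to find the collisions.</p>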
<h2>Verification - public or private?</h2>
<p>The structure above is trying to make an argument, which I believe is
pretty strong - that this represents searching for privacy, in a
blockchain context, in slightly the wrong way.</p>
<p>If we try to make the <em>blockchain</em> itself private, we are slightly
pushing against its inherent nature. Its crucial feature is
<strong>public verifiability</strong>, and
while it's true that this does not require all attributes to
be "unblinded" nor "unobfuscated", we see above that introducing
blinding or obfuscation is problematic; you either degrade security in a
way that's not acceptable because it introduces invisible breaks, or you
degrade scalability (such as using a perfectly binding commitment
requiring no compression, or a zero knowledge proof taking up a lot of
space or computation time), or you degrade trustlessness (see: trusted
setup zkps). I have no absolute theorem that says that you cannot get
rid of all of these problems simultaneously; but it certainly seems
hard!</p>
<p>This is where the idea of a "steganographic blockchain" comes in:
instead of trying to hide attributes of transactions, we try to make the
<em>meaning</em> of transactions be something not explicit to the chain, but
agreed upon by arbitrary participants using mechanisms outside it. This
allows one to leverage the blockchain's principal feature - censorship
resistant proof of state changes, in public, without inheriting its main
bugs - lack of privacy and scalability, and without degrading its own
security.</p>
<p>Examples:</p>
<ul>
<li>Colored coins</li>
<li>Crude example: atomic swaps</li>
<li>Lightning and second-layer</li>
<li>Chaumian tokens</li>
<li>Client-side validation (single use seals)</li>
<li>Scriptless scripts</li>
</ul>
<h2>High bandwidth steganography</h2>
<p>The biggest practical problem with steganography has always been
bandwidth; if you use non-random data such as images or videos, which
are often using compression algorithms to maximise their signal to noise
ratio, you have the problem of getting sufficient "cover traffic" over
your hidden message.</p>
<p>Note that this problem does not occur <strong>at all</strong> in cases where your
hidden message is embedded into another message which is random. This is
the case with digital signatures; ECDSA and Schnorr for example are both
published as two random values, each of which is about 32 bytes.</p>
<p>To go back to the previously mentioned example of scriptless scripts, we
can see that the atomic swap protocol based on it as described in my
<a href="https://web.archive.org/web/20200603112526/https://joinmarket.me/blog/blog/flipping-the-scriptless-script-on-schnorr/">blog
post</a>,
exploits this directly. On chain we see two (not obviously related)
transactions with Schnorr signatures that are, to the outside observer,
in no way related; the hiding of the connection is perfect, but the
binding/atomicity of the two payments is still secure, just not
perfectly so (it's based on the ECDLP hardness assumption, but then so
are ordinary payments).</p>
<p>Note how this is a different philosophy/approach to hiding/privacy:
since such a swap leaves no fingerprint on-chain, the concept of
anonymity set blurs; it's strictly all transactions (assuming Schnorr in
future, or ECDSA-2PC now), even if most people do not use the technique.
To get that same effect with an enforced privacy overlay mechanism M for
all participants, we have to accept the security or scalability
tradeoffs mentioned above.</p>
<p>This is the reason for my slightly click-baity subtitle "High
Bandwidth Steganography". A big chunk of the Bitcoin blockchain is
random (as those who've tried to compress it have learned to their
chagrin), and so it's not quite as hard as usual to hide transaction
semantics (the ideal case will be inside signatures using scriptless
script type constructs), so in a sense we can get a very high bandwidth
of data communicated client to client without using any extra space on
chain, and without "polluting" the chain with extra security
assumptions.</p>Flipping the scriptless script on Schnorr2018-03-15T00:00:00+01:002018-03-15T00:00:00+01:00Adam Gibsontag:joinmarket.me,2018-03-15:/blog/blog/flipping-the-scriptless-script-on-schnorr/<p>using scriptless scripts for atomic swaps</p><h3>Flipping the scriptless script on Schnorr</h3>
<h2>Outline</h2>
<p>It's by now very well known in the community of Bitcoin enthusiasts that
the <a href="https://en.wikipedia.org/wiki/Schnorr_signature">Schnorr
signature</a>
may have great significance; and "everyone knows" that its significance
is that it will enable signatures to be aggregated, which could be
<strong>great</strong> for scalability, and nice for privacy too. This has been
elucidated quite nicely in a Bitcoin Core <a href="https://bitcoincore.org/en/2017/03/23/schnorr-signature-aggregation/">blog
post</a>.</p>
<p>This is very true.</p>
<p>There are more fundamental reasons to like Schnorr too; it can be shown
with a simple proof that Schnorr signatures are secure if the elliptic
curve crypto that prevents someone stealing your coins (basically the
"Elliptic Curve Discrete Logarithm Problem" or ECDLP for short) is
secure, and assuming the hash function you're using is secure (see <a href="https://blog.cryptographyengineering.com/2011/09/29/what-is-random-oracle-model-and-why-3/">this
deep dive into the random oracle
model</a>
if you're interested in such things). ECDSA doesn't have the same level
of mathematical surety.</p>
<p>Perhaps most importantly of all Schnorr signatures are <strong>linear</strong> in the
keys you're using (while ECDSA is not).</p>
<p>Which brings me to my lame pun-title : another way that Schnorr
signatures may matter is to do with, in a sense, the <strong>opposite</strong> of
Schnorr aggregation - Schnorr subtraction. The rest of this very long
blog post is intended to lead you through the steps to showing how
clever use of signature subtraction can lead to <span
style="text-decoration: underline;">one</span> very excellent outcome
(there are others!) - a private Coinswap that's simpler and better than
the private Coinswap outlined in my <a href="https://web.archive.org/web/20200506162002/https://joinmarket.me/blog/blog/coinswaps">previous blog
post</a>.</p>
<p>The ideas being laid out in the rest of this post are an attempt to
concretize work that, as far as I know, is primarily that of Andrew
Poelstra, who has coined the term "<strong>scriptless scripts</strong>" to describe a
whole set of applications, usually but not exclusively leveraging the
linearity of Schnorr signatures to achieve goals that otherwise are not
possible without a system like Bitcoin's
<a href="https://en.bitcoin.it/wiki/Script">Script</a>.
This was partly motivated by Mimblewimble (another separate, huge
topic), but it certainly isn't limited to that. The broad overview of
these ideas can be found in these
<a href="https://download.wpsoftware.net/bitcoin/wizardry/mw-slides/2017-05-milan-meetup/slides.pdf">slides</a>
from Poelstra's Milan presentation last May.</p>
<p>So what follows is a series of constructions, starting with Schnorr
itself, that will (hopefully) achieve a goal: an on-chain atomic
coinswap where the swap of a secret occurs, on chain, inside the
signatures - but the secret remains entirely invisible to outside
observers; only the two parties can see it.</p>
<p>If you and I agree between ourselves that the number to subtract is 7,
you can publish "100" on the blockchain and nobody except me will know
that our secret is "93". Something similar (but more powerful) is
happening here; remember signatures are actually just numbers; the
reason it's "more powerful" is that we can enforce the revealing of the
secret by the other party if the signature is valid, and coins
successfully spent.</p>
<p>Before we therefore dive into how it works, I wanted to mention why this
idea struck me as so important; after talking to Andrew and seeing the
slides and talk referenced above, I
<a href="https://twitter.com/waxwing__/status/862724170802761728">tweeted</a>
about it:</p>
<p><strong>If we can take the <em>semantics</em> of transactions off-chain in this kind
of way, it will more and more improve what Bitcoin (or any other
blockchain) can do - we can transact securely without exposing our
contracts to the world, and we can reduce blockchain bloat by using
secrets embedded in data that is already present. The long term vision
would be to allow the blockchain itself to be a <em>very</em> lean contract
enforcement mechanism, with all the "rich statefulness" .. client-side
;)</strong></p>
<h4>Preliminaries: the Schnorr signature itself</h4>
<p><em>(Notation: We'll use <code>||</code> for concatenation and capitals for elliptic
curve points and lower case letters for scalars.)</em></p>
<p>If you want to understand the construction of a Schnorr signature well,
I can recommend Oleg Andreev's compact and clear
<a href="http://blog.oleganza.com/post/162861219668/eli5-how-digital-signatures-actually-work">description</a>
; also nice is Section 1 in the Maxwell/Poelstra Borromean Ring
Signatures
<a href="https://github.com/Blockstream/borromean_paper">paper</a>,
although there are of course tons of other descriptions out there. We'll
write it in basic form as:</p>
<div class="highlight"><pre><span></span><code><span class="err">s = r + e * x</span>
<span class="err">e = H(P||R||m)</span>
</code></pre></div>
<p>(Note: we can hash, as "challenge" a la <a href="https://en.wikipedia.org/wiki/Proof_of_knowledge#Sigma_protocols">sigma
protocol</a>,
just <code>R||m</code> in some cases, and more complex things than just <code>P||R||m</code>,
too; this is just the most fundamental case, fixing the signature to a
specific pubkey; the nonce point <code>R</code> is always required.)</p>
<p>For clarity, in the above, <code>x</code> is the private key, <code>m</code> is the message,
<code>r</code> is the "nonce" and <code>s</code> is the signature. The signature is published
as either <code>(s, R)</code> or <code>(s, e)</code>, the former will be used here if
necessary.</p>
<p>Apologies if people are more used to <code>s = r - ex</code>, for some reason it's
always <code>+</code> to me!</p>
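<p>As a rough illustration of the two equations above, here is a minimal Python sketch. It is only a sketch under stated assumptions: instead of a real elliptic curve it uses a tiny, insecure toy group (the order-1019 subgroup of the integers mod 2039), and the helper names <code>mul</code>, <code>add</code>, <code>H</code>, <code>sign</code> and <code>verify</code>, as well as the way the hash inputs are joined, are made up for this example.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only):
# the subgroup of prime order q in the integers mod p, generated by g.
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P (exponentiation in the toy group)."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q (multiplication in the toy group)."""
    return P * Q % p

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def sign(x, m):
    """Return (s, R) with s = r + e*x and e = H(P||R||m)."""
    P = mul(x)
    r = secrets.randbelow(q - 1) + 1   # fresh nonce in [1, q-1]
    R = mul(r)
    e = H(P, R, m)
    return (r + e * x) % q, R

def verify(P, m, sig):
    """Check s*G == R + e*P."""
    s, R = sig
    e = H(P, R, m)
    return mul(s) == add(R, mul(e, P))

x = secrets.randbelow(q - 1) + 1       # private key
P = mul(x)                             # public key P = x*G
sig = sign(x, "pay 1 coin to Bob")
print(verify(P, "pay 1 coin to Bob", sig))
```

<p>Verification works because <code>s*G = r*G + e*x*G = R + e*P</code>; the verifier never needs <code>x</code> or <code>r</code>.</p>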
<p>Note the linearity, in hand-wavy terms we can say:</p>
<div class="highlight"><pre><span></span><code><span class="err">s_1 = r_1 + e * x_1</span>
<span class="err">s_2 = r_2 + e * x_2</span>
<span class="err">e = H(P_1 + P_2 || R_1 + R_2 || m)</span>
<span class="err">=></span>
<span class="err">s_1 + s_2 is a valid signature for public key (P_1 + P_2) on m.</span>
</code></pre></div>
<p>But this is <strong>NOT</strong> a useable construction as-is: we'll discuss how
aggregation of signatures is achieved properly later, briefly.</p>
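<p>The linearity claim, and one well-known reason the naive version is not usable, can be sketched as follows. This is an illustration only, using an insecure toy group in place of the curve; the helper names and the "attacker" key <code>x_E</code> are assumptions made up for the example. The second half shows a key-cancellation trick: a party who announces its key last can choose it so that the "joint" key is one it controls alone.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

m = "joint spend"

# Honest case: two signers, shared challenge e, partial signatures simply add.
x1, x2, r1, r2 = rand(), rand(), rand(), rand()
P1, P2, R1, R2 = mul(x1), mul(x2), mul(r1), mul(r2)
e = H(add(P1, P2), add(R1, R2), m)
s1 = (r1 + e * x1) % q
s2 = (r2 + e * x2) % q
s = (s1 + s2) % q
honest_ok = mul(s) == add(add(R1, R2), mul(e, add(P1, P2)))

# Key cancellation: the second party announces P2_fake = x_E*G - P1,
# so P1 + P2_fake = x_E*G and the attacker can sign for the "joint" key alone.
x_E, r_E = rand(), rand()
P2_fake = add(mul(x_E), mul(-1, P1))
joint = add(P1, P2_fake)
R_E = mul(r_E)
e_E = H(joint, R_E, m)
s_E = (r_E + e_E * x_E) % q
attack_ok = mul(s_E) == add(R_E, mul(e_E, joint))

print(honest_ok, attack_ok)
```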
<h4>Construction of an "adaptor" signature</h4>
<p>This is the particular aspect of Poelstra's "scriptless script" concept
that gets us started leveraging the Schnorr signature's linearity to do
fun things. In words, an "adaptor signature" is not a full, valid
signature on a message with your key, but functions as a kind of
"promise" that a signature you agree to publish will reveal a secret, or
equivalently, allows creation of a valid signature on your key for
anyone possessing that secret.</p>
<p>Since this is the core idea, it's worth taking a step back here to see
how the idea arises: you want to do a similar trick to what's already
been done in atomic swaps: to enforce the atomicity of (spending a coin:
revealing a secret); but without Script, you can't just appeal to
something like <code>OP_HASH160</code>; if you're stuck in ECC land, all you have
is scalar multiplication of elliptic curve points; but luckily that
function operates similarly to a hash function in being one-way; so you
simply share an elliptic curve point (in this case it will be <code>T</code>), and
the secret will be its corresponding private key. The beautiful thing is,
it <em>is</em> possible to achieve that goal directly in the ECC Schnorr
signing operation.</p>
<p>Here's how Alice would give such an adaptor signature to Bob:</p>
<p>Alice (<code>P = xG</code>), constructs for Bob:</p>
<ul>
<li>Calculate <code>T = tG</code>, <code>R = rG</code></li>
<li>Calculate <code>s = r + t + H(P || R+T || m) * x</code></li>
<li>Publish (to Bob, others): <code>(s', R, T)</code> with <code>s' = s - t</code> (so <code>s'</code>
denotes the "adaptor signature"; this notation is retained for the
rest of the document).</li>
</ul>
<p>Bob can verify the adaptor sig <code>s'</code> for <code>T,m</code>:</p>
<div class="highlight"><pre><span></span><code><span class="err">s' * G ?= R + H(P || R+T || m) * P</span>
</code></pre></div>
<p>This is not a valid sig: hashed nonce point is <code>R+T</code> not <code>R</code>;</p>
<p>Bob cannot retrieve a valid sig : to recover <code>s'+t</code> requires ECDLP
solving.</p>
<p>After validation of adaptor sig by Bob, though, he knows:</p>
<p>Receipt of <code>t</code> <=> receipt of valid sig <code>s = s' + t</code></p>
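<p>The adaptor construction and Bob's check can be sketched like this. Again this is a hedged illustration, not a real implementation: the group is a tiny insecure stand-in for the curve, and every helper name is invented for the example.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

m = "message"

# Alice's side
x, t, r = rand(), rand(), rand()
P, T, R = mul(x), mul(t), mul(r)
e = H(P, add(R, T), m)          # challenge commits to R+T, not R
s = (r + t + e * x) % q         # the full signature (kept back for now)
s_adaptor = (s - t) % q         # s', sent to Bob along with R and T

# Bob's side: the adaptor verification equation s'*G ?= R + e*P
adaptor_ok = mul(s_adaptor) == add(R, mul(e, P))

# s' itself is not a valid signature, because the hashed nonce point is R+T
valid_without_t = mul(s_adaptor) == add(add(R, T), mul(e, P))

# Learning t gives the valid signature, and seeing the valid signature gives t
valid_with_t = mul((s_adaptor + t) % q) == add(add(R, T), mul(e, P))
t_from_sig = (s - s_adaptor) % q

print(adaptor_ok, valid_without_t, valid_with_t, t_from_sig == t)
```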
<h4>Deniability:</h4>
<p>This is a way of concretizing the concept that all of this will be
indistinguishable to an observer of the blockchain, that is to say, an
observer only of the final fully valid signatures:</p>
<p>Given any <code>(s, R)</code> on chain, create <code>(t, T)</code>, and assert that the
adaptor signature was: <code>s' = s - t</code>, with <code>R' = R - T</code>, so adaptor
verify eqn was: <code>s'G = R' + H(P || R'+T || m)P</code></p>
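<p>To make the deniability argument concrete, the sketch below takes an ordinary signature and retroactively "explains" it as the completion of an adaptor signature for a freshly invented <code>t</code>. It uses an insecure toy group as a stand-in for the curve, and the helper names are assumptions for this example only.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P (k may be negative)."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

m = "an ordinary payment"

# A perfectly ordinary signature (s, R) that ended up on chain.
x, r = rand(), rand()
P, R = mul(x), mul(r)
s = (r + H(P, R, m) * x) % q

# Anyone can now invent (t, T) after the fact ...
t = rand()
T = mul(t)
s_claimed = (s - t) % q              # claimed adaptor signature s'
R_claimed = add(R, mul(-1, T))       # claimed nonce point R' = R - T

# ... and the adaptor verification equation holds for the invented values.
e = H(P, add(R_claimed, T), m)       # R' + T is just R again
claim_ok = mul(s_claimed) == add(R_claimed, mul(e, P))
print(claim_ok)
```

<p>Since such a "transcript" can be produced for any signature, a real adaptor exchange proves nothing to a third party.</p>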
<h4>Moving to the 2-of-2 case, with Schnorr</h4>
<p>For the remainder, we're considering the matter of signing off
transactions from outpoints jointly owned (2 of 2) by Alice and Bob.</p>
<p>Start by assuming Alice has keypair <code>(x_A, P_A)</code>, and Bob <code>(x_B, P_B)</code>.
Each chooses a random nonce, <code>r_A</code> and <code>r_B</code> respectively, and they exchange the curve
points with each other (<code>P_A, R_A, P_B, R_B</code>) to create a
scriptPubKey/destination address.</p>
<h4>2-of-2 Schnorr without adaptor sig</h4>
<p>To avoid related-key attacks (if you don't know what that means see e.g.
the "Cancelation" section in
<a href="https://diyhpl.us/wiki/transcripts/scalingbitcoin/milan/schnorr-signatures/">https://diyhpl.us/wiki/transcripts/scalingbitcoin/milan/schnorr-signatures/</a>),
the "hash challenge" is made more complex here, as was noted in the
first section on Schnorr signatures. The two parties Alice and Bob,
starting with pubkeys <code>P_A</code>, <code>P_B</code>, construct for themselves a "joint
key" thusly:</p>
<div class="highlight"><pre><span></span><code><span class="err">P_A' = H(H(P_A||P_B) || P_A) * P_A ,</span>
<span class="err">P_B' = H(H(P_A||P_B) || P_B) * P_B ,</span>
<span class="err">joint_key = P_A' + P_B'</span>
</code></pre></div>
<p>Note that Alice possesses the private key for <code>P_A'</code> (it's
<code>H(H(P_A||P_B) || P_A) * x_A</code>, we call it <code>x_A'</code> for brevity), and
likewise does Bob. From now on, we'll call this "joint_key" <code>J(A, B)</code>
to save space.</p>
<p>Common hash challenge:</p>
<div class="highlight"><pre><span></span><code><span class="err">H(J(A, B) || R_A + R_B || m) = e</span>
<span class="err">s_agg = r_A + r_B + e(x_A' + x_B')</span>
<span class="err">-> s_agg * G = R_A + R_B + e * J(A, B)</span>
</code></pre></div>
<p>Alice's sig: <code>s_A = r_A + e * x_A'</code>, Bob's sig: <code>s_B = r_B + e * x_B'</code>
and of course: <code>s_agg = s_A + s_B</code>.</p>
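<p>The joint key and aggregate signature above can be sketched as follows. As before, this is an illustration under assumptions: a small insecure toy group replaces the curve, and the helper names and hash encoding are invented for the example.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

m = "spend from the 2-of-2"

x_A, x_B, r_A, r_B = rand(), rand(), rand(), rand()
P_A, P_B, R_A, R_B = mul(x_A), mul(x_B), mul(r_A), mul(r_B)

# Each key is multiplied by a hash that depends on both keys, so neither
# party can pick its key to cancel out the other's.
L = H(P_A, P_B)
x_A2 = H(L, P_A) * x_A % q          # x_A'
x_B2 = H(L, P_B) * x_B % q          # x_B'
J = add(mul(x_A2), mul(x_B2))       # joint_key = P_A' + P_B'

e = H(J, add(R_A, R_B), m)          # common hash challenge
s_A = (r_A + e * x_A2) % q          # Alice's partial signature
s_B = (r_B + e * x_B2) % q          # Bob's partial signature
s_agg = (s_A + s_B) % q

agg_ok = mul(s_agg) == add(add(R_A, R_B), mul(e, J))
print(agg_ok)
```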
<p>There is, as I understand it, more to say on this topic, see
e.g. <a href="http://diyhpl.us/wiki/transcripts/bitcoin-core-dev-tech/2017-09-06-signature-aggregation/">here</a>,
but it's outside my zone of knowledge, and is somewhat orthogonal to the
topic here.</p>
<h4>2-of-2 with adaptor sig</h4>
<p>Now suppose Bob chooses <code>t</code> s.t. <code>T = t * G</code>, and Bob is going to
provide an adaptor signature for his half of the 2-of-2.</p>
<p>Then:</p>
<ol>
<li>Alice, Bob share <code>P_A, P_B, R_A, R_B</code> as above; Bob gives <code>T</code> to
Alice</li>
<li>Alice and Bob therefore agree on
<code>e = H(J(A, B) || R_A + R_B + T || m)</code> (note difference, <code>T</code>)</li>
<li>Bob provides adaptor <code>s' = r_B + e * x_B'</code> (as in previous section,
not a valid signature, but verifiable)</li>
<li>Alice verifies: <code>s' * G ?= R_B + e * P_B'</code></li>
<li>If OK, Alice sends to Bob her sig: <code>s_A = r_A + e * x_A'</code></li>
<li>Bob completes, atomically releasing <code>t</code>: first, construct
<code>s_B = r_B + t + e * x_B'</code>, then combine: <code>s_agg = s_A + s_B</code> and
broadcast, then Alice sees <code>s_agg</code></li>
<li>Alice subtracts:
<code>s_agg - s_A - s' = (r_B + t + e * x_B') - (r_B + e * x_B') = t</code></li>
</ol>
<p>Thus the desired property is achieved: <code>t</code> is revealed by a validating
"completion" of the adaptor signature.</p>
<p><strong>Note</strong>, however, that this has no timing control: Bob can jam the
protocol indefinitely at step 6, forcing Alice to wait (assuming that
what we're signing here is a transaction out of a shared-control
outpoint); this is addressed in the fleshed out protocol in the next
section, though.</p>
<p>For the remainder, we'll call the above 7 steps the 22AS protocol, so
<code>22AS(Bob,t, Alice)</code> for Bob, secret <code>t</code>, and Alice. Bob is listed first
because he holds <code>t</code>.</p>
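<p>The seven steps of <code>22AS(Bob, t, Alice)</code> can be walked through in a short sketch. It is only meant to show the algebra: the group is an insecure toy stand-in for the curve, both parties' secrets live in one script for brevity, and all helper and variable names are assumptions made for this example.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

m = "spend from the shared outpoint"

# Keys, nonces and joint key, as in the 2-of-2 section.
x_A, x_B, r_A, r_B = rand(), rand(), rand(), rand()
P_A, P_B, R_A, R_B = mul(x_A), mul(x_B), mul(r_A), mul(r_B)
L = H(P_A, P_B)
x_A2, x_B2 = H(L, P_A) * x_A % q, H(L, P_B) * x_B % q   # x_A', x_B'
P_B2 = mul(x_B2)                                        # P_B'
J = add(mul(x_A2), P_B2)

# Step 1: points are shared; Bob picks t and gives T to Alice.
t = rand()
T = mul(t)
# Step 2: both agree on the challenge, which now includes T.
R_total = add(add(R_A, R_B), T)
e = H(J, R_total, m)
# Step 3: Bob provides his adaptor signature s'.
s_adaptor = (r_B + e * x_B2) % q
# Step 4: Alice verifies s'*G ?= R_B + e*P_B'.
adaptor_ok = mul(s_adaptor) == add(R_B, mul(e, P_B2))
# Step 5: Alice sends her partial signature.
s_A = (r_A + e * x_A2) % q
# Step 6: Bob completes his half with t, aggregates and broadcasts.
s_B = (r_B + t + e * x_B2) % q
s_agg = (s_A + s_B) % q
sig_ok = mul(s_agg) == add(R_total, mul(e, J))
# Step 7: Alice subtracts what she already knows and is left with t.
t_recovered = (s_agg - s_A - s_adaptor) % q

print(adaptor_ok, sig_ok, t_recovered == t)
```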
<p>Since this is the most important part of the construction, we'll
summarize it with a schematic diagram:</p>
<p><img src="/web/20200506162002im_/https://joinmarket.me/static/media/uploads/.thumbnails/22AS.jpg/22AS-1056x816.jpg" width="1056" height="816" alt="22AS protocol" /></p>
<p>So this <code>22AS</code> was a protocol to swap a coin for a secret; to do atomic
swaps we need to extend it slightly: have two transactions atomic via
the same secret <code>t</code>.</p>
<h3>The Atomic Swap construct, using 2-of-2 Schnorr + adaptor signatures</h3>
<p>This is now <em>fairly</em> straightforward, inheriting the main design from
the existing "atomic swap" protocol.</p>
<p>A. Alice and Bob agree on a pair of scriptPubkeys which are based on 2
of 2 pubkeys using Schnorr, let's name them using <code>D</code> for destination
address (<code>A</code> is taken by Alice): <code>D_1</code> being 2-2 on (<code>P_A1</code>, <code>P_B1</code>) and
<code>D_2</code> being 2-2 on (<code>P_A2</code>, <code>P_B2</code>). Note that these pubkeys, and
therefore destination addresses, are not dependent in any way on
"adaptor" feature (which is a property only of nonces/sigs, not keys).</p>
<p>B. Alice prepares a transaction TX1 paying 1 coin into <code>D_1</code>, shares
txid_1, and requires backout transaction signature from Bob. Backout
transaction pays from txid_1 to Alice's destination but has locktime
<code>L1</code>.</p>
<p>C. Bob does the (nearly) exact mirror image of the above: prepares TX2
paying 1 coin into <code>D_2</code>, shares txid_2, requires backout transaction
signature from Alice. Backout transaction pays from txid_2 to Bob's
destination with locktime <code>L2</code> which is <em>significantly later</em> than <code>L1</code>.</p>
<p>D. Then Alice and Bob broadcast TX1 and TX2 respectively and both sides
wait until both confirmed. If one party fails to broadcast, the other
uses their backout to refund.</p>
<p>E. If both txs confirmed (N blocks), Alice and Bob follow steps 1-4 of
<code>22AS(Bob, t, Alice)</code> (described in previous section) for some <code>t</code>, for
both the scriptPubkeys <code>D_1</code> and <code>D_2</code>, in parallel, but with the same
secret <code>t</code> in each case (a fact which Alice verifies by ensuring use of
same <code>T</code> in both cases). For the first (<code>D_1</code>) case, they are signing a
transaction spending 1 coin to Bob. For the second, <code>D_2</code>, they are
signing a transaction spending 1 coin to Alice. Note that at the end of
these steps Alice will possess a verified adaptor sig <code>s'</code> for <em>both</em> of
the spend-outs from <code>D_1, D_2</code>.</p>
<p>E(a). On any communication or verification failure in those 1-4 steps (x2),
both sides must fall back to timelocked refunds.</p>
<p>F. The parties then complete (steps 5-7) the first <code>22AS(Bob, t, Alice)</code>
for the spend out of <code>D_1</code> (the output funded by TX1), giving Bob 1 coin.
Alice receives <code>t</code> as per step 7.</p>
<p>F(a). As was mentioned in the previous section, Bob can jam the above
protocol at step 6: if he does, Alice can extract her coins from her
timelocked refund from <code>D_1</code> in the period between <code>L1</code> and <code>L2</code>. The
fact that <code>L2</code> is (significantly) later is what prevents Bob from
backing out his own spend into <code>D_2</code> <em>and</em> claiming Alice's coins from
<code>D_1</code> using the signature provided in step 5. (Note this time asymmetry
is common to all atomic swap variants).</p>
<p>G. (Optionally Bob may transmit <code>t</code> directly over the private channel,
else Alice has to read it from the blockchain (as per above <code>22AS</code>
protocol) when Bob publishes his spend out of <code>D_1</code>).</p>
<p>H. Alice can now complete the equivalent of steps 5-7 without Bob's
involvement for the second parallel run for <code>D_2</code>: she has <code>t</code>, and adds
it to the already provided <code>s'</code> adaptor sig for the transaction paying
her 1 coin from <code>D_2</code> as per first 4 steps. This <code>s' + t</code> is guaranteed
to be a valid <code>s_B</code>, so she adds it to her own <code>s_A</code> to get a valid
<code>s_agg</code> for this spend to her of 1 coin, and broadcasts.</p>
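<p>The signature side of steps E to H can be sketched as two parallel 22AS runs sharing one <code>T</code>. This ignores transactions, timelocks and backouts entirely and only shows why Bob claiming from <code>D_1</code> hands Alice what she needs for <code>D_2</code>. The toy group is insecure and stands in for the curve; the <code>Run</code> class and its method names are hypothetical, and for brevity each <code>Run</code> object holds both parties' secrets, which of course would not happen in practice.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

class Run:
    """One 22AS(Bob, t, Alice) run for one destination (holds both sides' secrets)."""
    def __init__(self, m, T):
        x_A, x_B, self.r_A, self.r_B = rand(), rand(), rand(), rand()
        P_A, P_B = mul(x_A), mul(x_B)
        L = H(P_A, P_B)
        self.x_A2 = H(L, P_A) * x_A % q                  # x_A'
        self.x_B2 = H(L, P_B) * x_B % q                  # x_B'
        self.P_B2 = mul(self.x_B2)                       # P_B'
        self.J = add(mul(self.x_A2), self.P_B2)          # J(A, B)
        self.R_B = mul(self.r_B)
        self.R = add(add(mul(self.r_A), self.R_B), T)    # R_A + R_B + T
        self.e = H(self.J, self.R, m)

    def bob_adaptor(self):
        """Step 3: Bob's adaptor signature s'."""
        return (self.r_B + self.e * self.x_B2) % q

    def adaptor_ok(self, s_adaptor):
        """Step 4: Alice's check s'*G ?= R_B + e*P_B'."""
        return mul(s_adaptor) == add(self.R_B, mul(self.e, self.P_B2))

    def alice_sig(self):
        """Step 5: Alice's partial signature s_A."""
        return (self.r_A + self.e * self.x_A2) % q

    def valid(self, s_agg):
        """What a verifier on the network would check."""
        return mul(s_agg) == add(self.R, mul(self.e, self.J))

t = rand()                      # Bob's secret
T = mul(t)                      # the same T is used in both runs
d1 = Run("spend D_1 to Bob", T)
d2 = Run("spend D_2 to Alice", T)

# Step E: Alice receives and verifies adaptor sigs for both spends.
a1, a2 = d1.bob_adaptor(), d2.bob_adaptor()
adaptors_ok = d1.adaptor_ok(a1) and d2.adaptor_ok(a2)

# Step F: Alice hands over s_A for D_1; Bob completes with t and broadcasts.
s_A1 = d1.alice_sig()
s_agg1 = (s_A1 + a1 + t) % q

# Steps G/H: Alice reads s_agg1, recovers t, and completes D_2 on her own.
t_seen = (s_agg1 - s_A1 - a1) % q
s_agg2 = (d2.alice_sig() + a2 + t_seen) % q

print(adaptors_ok, d1.valid(s_agg1), d2.valid(s_agg2))
```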
<h2>Summing up</h2>
<h4>Privacy implications</h4>
<p>In absence of backouts being published (i.e. in cooperative case), these
scriptPubkeys will be the same as any other Schnorr type ones (N of N
multisig will not be distinguishable from 1 of 1). The signatures will
not reveal anything about the shared secret <code>t</code>, or the protocol carried
out, so the 2 transaction pairs (pay-in to <code>D_1,D_2</code>, pay out from same)
will not be tied together in that regard.</p>
<p>This construction, then, will (at least attempt to) gain the anonymity
set of all Schnorr sig based transactions. The nice thing about
Schnorr's aggregation win is, even perhaps more than segwit, the
economic incentive to use it will be strong due to the size compaction,
so this anonymity set should be big (although this is all a bit pie in
the sky for now; we're a way off from it being concrete).</p>
<p>The issue of amount correlation, however, has <strong>not</strong> been in any way
addressed by this, of course. It's a sidebar, but one interesting idea
about amount correlation breaking was brought up by Chris Belcher
<a href="https://github.com/AdamISZ/CoinSwapCS/issues/47">here</a>
; this may be a fruitful avenue whatever the flavour of Coinswap we're
discussing.</p>
<h4>Comparison with other swaps</h4>
<p>Since we've now, in this blog post and the previous, seen 3 distinct
ways to do an atomic coin swap, the reader is forgiven for being
confused. This table summarizes the 3 different cases:</p>
<table>
<thead>
<tr>
<th><strong>Type</strong></th>
<th><strong>Privacy on-chain</strong></th>
<th><strong>Separate "backout/refund" transactions for non-cooperation</strong></th>
<th><strong>Requires segwit</strong></th>
<th><strong>Requires Schnorr</strong></th>
<th><strong>Number of transactions in cooperative case</strong></th>
<th><strong>Number of transactions in non-cooperative case</strong></th>
<th><strong>Space on chain</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Atomic swap</td>
<td>None; trivially linkable</td>
<td>None; backout is directly in script</td>
<td>No</td>
<td>No</td>
<td>2 + 2</td>
<td>2 + 2</td>
<td>Medium</td>
</tr>
<tr>
<td>CoinSwap</td>
<td>Anonymity set: 2 of 2 transactions (+2 of 3 depending on setup)</td>
<td>Presigned backouts using H(X) and CLTV, break privacy if used</td>
<td>Yes</td>
<td>No</td>
<td>2 + 2</td>
<td>3 + 3</td>
<td>Large-ish</td>
</tr>
<tr>
<td>Scriptless script</td>
<td>Anonymity set: all Schnorr transactions</td>
<td>Presigned backouts using locktime; semi-break privacy (other txs may use locktime)</td>
<td>Yes</td>
<td>Yes</td>
<td>2 + 2</td>
<td>2 + 2</td>
<td>Small</td>
</tr>
</tbody>
</table>
<p>The reason that there are "3 + 3" transactions in the non-cooperative
case for CoinSwap is, in that case, both sides pay into a 2-of-2, then
in non-cooperation, they must both spend into the custom "HTLC" (IF
hash, pub, ELSE CLTV, pub), and then redeem <em>out</em> of it.</p>
<p>A fundamental difference for the latter 2 cases, compared with the
first, is they must pay into shared-ownership 2 of 2 outputs in the
pay-in transaction; this is to allow backout transactions to be arranged
(a two-party multi-transaction contract requires this; see e.g.
Lightning for the same thing). The first, bare atomic swap is a single
transaction contract, with the contract conditions embedded entirely in
that one transaction's scriptPubKey (for each side).</p>
<p>Finally, size on-chain of the transactions is boiled down to
hand-waving, because it's a bit of a complex analysis; the first type
always uses a large redeem script but one signature on the pay-out,
whether cooperative or non-cooperative; the second uses 2 or 3
signatures (assuming something about how we attack the anonymity set
problem) but no big redeem script in the cooperative case, while taking up a
<em>lot</em> of room in the non-cooperative case; the third is always compact
(even non-cooperative backouts take no extra room).
Schnorr-sig-scriptless-scripts are the big winner on space.</p>
<h4>Extending to multi-hop; Lightning, Mimblewimble</h4>
<p>The first time I think this was discussed was in the mailing list post
<a href="https://lists.launchpad.net/mimblewimble/msg00086.html">here</a>,
which discusses how conceivably one could achieve the same setup as HTLC
for Mimblewimble lightning, using this scriptless-script-atomic-swap.
Doubtless these ideas are a long way from being fleshed out, and I
certainly haven't kept up with what's going on there :)</p>
<h4>Other applications of the scriptless script concept</h4>
<p>As a reminder, this document was just about fleshing out how the atomic
swap gets done in a Schnorr-signature-scriptless-script world; the
<a href="https://download.wpsoftware.net/bitcoin/wizardry/mw-slides/2017-05-milan-meetup/slides.pdf">slides</a>
give several other ideas that are related. Multisignature via
aggregation is of course part of it, and is already included even in the
above protocol (for 2 of 2 as a subset of N of N); earlier ideas like
pay-to-contract-hash and sign-to-contract-hash already exist, and don't
require Schnorr, but share a conceptual basis; same for ZKCP, etc.</p>
<h4>Cross chain swap</h4>
<p>I admit to not sharing <em>quite</em> the same breathless excitement about
cross-chain swaps as some people, but it is no doubt very interesting,
if somewhat more challenging (not least because of different "clocks"
(block arrivals) affecting any locktime analysis and confirmation
depth). Poelstra has however also made the very intriguing point that it
is <strong>not</strong> actually required for the two blockchains to be operating on
the same elliptic curve group for the construction to work.</p>SNICKER2017-09-15T00:00:00+02:002017-09-15T00:00:00+02:00Adam Gibsontag:joinmarket.me,2017-09-15:/blog/blog/snicker/<p>a proposal for non-interactive coinjoins.</p><h3>SNICKER</h3>
<h2>SNICKER - Simple Non-Interactive Coinjoin with Keys for Encryption Reused</h2>
<p>I'm going to do this backwards - start with the end goal user
experience, and then work backwards to the technical design. This way,
those not wanting to get lost in technical details can still get the
gist.</p>
<h3><img alt="Me misusing a meme as a symbol and not adding any text." height="330" src="../../../../../../20200510162733im_/https:/joinmarket.me/static/media/uploads/.thumbnails/evilplanbaby.jpg/evilplanbaby-400x330.jpg" width="400"></h3>
<p><em>Pictured above: me misusing a meme as a symbol and deliberately not
adding any text to it.</em></p>
<h3><strong>Scenario</strong></h3>
<p><strong>Alisa</strong> lives in Moscow; she is a tech-savvy Bitcoin user, uses Linux
and the command line, and runs a fully verifying Bitcoin Core node. She
doesn't have indexing enabled, but she (sometimes, or long-running)
runs a tool called <code>snicker-scan</code> on the blocks received by her node. It
scans recent Bitcoin blocks looking for transactions with a particular
pattern, and returns to her in a file a list of candidate transactions.
She pipes this list into another tool which uses her own Bitcoin wallet
and constructs proposals: new transactions involving her own utxos and
utxos from these newly found transactions, which she signs herself.
Then, for each one, she makes up a secret random number and sends (the
proposed transactions + the secrets), encrypted to a certain public key,
in each case, so no one but the owner can read it, to a Tor hidden
service which accepts such submissions. For now, her job is done and she
gets on with her day.</p>
<p><strong>Bob</strong> lives in New York. He's a Bitcoin enthusiast who uses it a lot,
and likes to test out new features, but has never written code and
isn't tech-savvy like that. A few hours after Alisa went to bed he
opens one of his mobile wallets and a message pops up:
<code>New coinjoin proposals found. Check?</code>. He heard about this, and heard
that you can improve your privacy with this option, and even sometimes
gain a few satoshis in the process. So he clicks <code>Yes</code>. In the
background his mobile wallet downloads a file of some 5-10MB (more on
this later!). Bob did this once before and was curious about the file;
when he opened it he saw it was text with lots of unintelligible
encrypted stuff like this:</p>
<p><code>QklFMQOVXvpqgjaJFm00QhuJ1iWsnYYV4yJLjE0LaXa8N8c34Hzg5CeQduV.....</code><br>
<code>QklFMQI2JR50dOGEQdDdmeX0BwMH4c+yEW1v5/IyT900WBGdYRA/T5mqBMc.....</code></p>
<p>Now his mobile does some processing on this file; it takes a little
while, some seconds perhaps, processing in the background. At the end it
pops up a new message:
<code>Coinjoin transaction found. Would you like to broadcast it?</code> and
underneath it shows the transaction spending 0.2433 BTC out of his
wallet and returning 0.2434 BTC in one of the outputs. It shows that the
other inputs and outputs are not his, although one of them is also for
0.2434 BTC. Does he want to accept? Sure! Free money even if it's only
cents. Even with no free money, he knows that coinjoin makes his privacy
better. So he clicks <code>Yes</code> and it's broadcast. Done.</p>
<h3>The NIC in SNICKER</h3>
<p>Non-interactivity is a hugely desirable property in protocols; this is
particularly the case where privacy is a priority. Firstly, it avoids
the need to synchronize (<strong>Alisa</strong>, and her computer, had gone to sleep
when <strong>Bob</strong> performed his step). Second, to avoid malicious
interruption of an interactive protocol, it can help to identify the
participants, but that is very damaging to the whole point of a protocol
whose goal is privacy. Non-interactivity cuts this particular Gordian
knot; one side can send the message anonymously and the other
participant simply uses the data, but this has the limitation of the
sender finding the receiver, which means some weak identification of the
latter. Even better is if the request can be sent encrypted to the
receiver, then it can be broadcast anywhere for the receiver to notice.
That latter model is the most powerful, and is used here, but it does
have practicality drawbacks as we'll discuss.</p>
<p>So, note that in the above scenario <strong>Alisa</strong> and <strong>Bob</strong> do not meet,
do not synchronize, and need never meet or find out who each other are
in future either. Their "meeting" is entirely abstracted out to one
side publishing an encrypted message and the other side receiving <em>all</em>
such encrypted messages and only reading the one(s) encrypted to his
pubkey. The <em>all</em> part helps preserve Bob's privacy, if he finds a way
to broadcast the final transaction with a reasonable anonymity defence
(see e.g.
<a href="https://github.com/gfanti/bips/blob/master/bip-dandelion.mediawiki">Dandelion</a>;
I'm of the opinion that that battle - making Bitcoin transaction
broadcast anonymous - is something we <em>will</em> win, there is a massive
asymmetry in favour of the privacy defender there).</p>
<h3>Quick background - how to do a Coinjoin</h3>
<p>Here's the obligatory
<a href="https://bitcointalk.org/index.php?topic=279249.0">link</a>
to the Coinjoin OP. You can skip this section if you know Coinjoin well.</p>
<p>Otherwise, I'll give you a quick intro here, one that naturally leads
into the SNICKER concept:</p>
<p>Each input to a transaction requires (for the transaction to be valid) a
signature by the owner of the private key (using singular deliberately,
restricting consideration to p2pkh or segwit equivalent here) over a
message which is ~ the transaction. Each of these signatures can be
constructed separately, by separate parties if indeed the private keys
for the inputs are owned by separate parties. The "normal" coinjoining
process thus involves the following steps (for now, not specifying <em>who</em>
carries out each step):</p>
<ul>
<li>Gather all of the inputs - the utxos that will be spent</li>
<li>Gather all of the destination addresses to various parties, and the
amounts to be paid</li>
<li>Distribute a "template" of the transaction to all parties (i.e.
the transaction without any signatures)</li>
<li>In some order all of the parties sign the transaction; whoever has
a transaction with all signatures complete, can broadcast it to the
Bitcoin network</li>
</ul>
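<p>The four steps above can be sketched in a few lines. This is a heavily simplified model, not Bitcoin's actual transaction or sighash format: a "transaction" is just a Python tuple, signatures are Schnorr signatures in an insecure toy group, the names <code>utxo_alisa</code>, <code>addr_bob</code> etc. are placeholders, and every party signs the same serialized template.</p>

```python
import hashlib
import secrets

# Toy group standing in for the elliptic curve (NOT secure; illustration only).
p, q, g = 2039, 1019, 4

def mul(k, P=g):
    """Stand-in for scalar multiplication k*P."""
    return pow(P, k % q, p)

def add(P, Q):
    """Stand-in for point addition P+Q."""
    return P * Q % p

def rand():
    """Random nonzero scalar."""
    return secrets.randbelow(q - 1) + 1

def H(*parts):
    """Hash the '||'-joined parts to a scalar mod q (hypothetical encoding)."""
    data = "||".join(str(x) for x in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def sign(x, m):
    """Toy Schnorr signature (s, R) on message m with private key x."""
    r = rand()
    R = mul(r)
    return (r + H(mul(x), R, m) * x) % q, R

def verify(P, m, sig):
    """Check s*G == R + e*P."""
    s, R = sig
    return mul(s) == add(R, mul(H(P, R, m), P))

# Private keys, each known only to its owner.
keys = {"alisa": rand(), "bob": rand()}

# Steps 1-2: gather inputs (utxo label, owner pubkey) and outputs (address, amount).
inputs = [("utxo_alisa", mul(keys["alisa"])), ("utxo_bob", mul(keys["bob"]))]
outputs = [("addr_alisa", 100_000), ("addr_bob", 100_000)]

# Step 3: the template is the transaction without any signatures.
template = repr((inputs, outputs))

# Step 4: each party independently signs the inputs it owns.
sigs = {}
for x in keys.values():
    for utxo, P in inputs:
        if P == mul(x):
            sigs[utxo] = sign(x, template)

# Whoever holds all signatures can check completeness and broadcast.
complete = all(u in sigs and verify(P, template, sigs[u]) for u, P in inputs)
print(complete)
```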
<p>There are different protocols one can choose to get all these steps
done, ranging from simple to complex. A server can be the coordinating
party; blinding can be used to prevent the server knowing input-output
mapping.
<a href="http://crypsys.mmci.uni-saarland.de/projects/CoinShuffle/">Coinshuffle</a>
can be used, creating a kind of onion-routing approach to prevent
parties involved knowing the linkages (doesn't require a server to
coordinate, but requires more complex interactivity). One of the parties
in the join can be the "server", thus that party gains privacy that
the others don't (Joinmarket). Etc.</p>
<p>The difficulties created by any interactivity are considerably
ameliorated in a client-server model (see e.g. the old blockchain.info
<a href="https://en.bitcoin.it/wiki/Shared_coin">SharedCoin</a> (link
outdated) model); the serious tradeoff is the server knowing too much,
and/or a coordination/waiting problem (which may be considered
tolerable; see both SharedCoin and
<a href="https://github.com/darkwallet/darkwallet">DarkWallet</a>;
with a sufficient liquidity pool the waiting may be acceptable).</p>
<p>There are a lot of details to discuss here, but there is always <em>some</em>
interactivity (you can only sign once you know the full transaction,
assuming no custom sighashing<sup>1</sup>), and a model with a server is
basically always going to be more problematic, especially at scale.</p>
<p>Hence we try to construct a way of doing at least simple Coinjoins,
in at least some scenarios, without any server requirement or
coordination. Now I'll present the basic technical concept of how to do
this in SNICKER, in 2 versions.</p>
<h3>First version - snicKER = Keys for Encryption Reused</h3>
<p>To make the Coinjoin non-interactive, we need it to be the case that
Alice can post a message for Bob, without explicitly requesting to
create a private message channel with him. This requires encrypting a
message that can then be broadcast (e.g. over a p2p network or on a
bulletin board).</p>
<p><em>(In case it isn't clear that either encryption or a private message
channel is required, consider that Alice must pass to Bob a secret which
identifies Bob's output address (explained below), critically, and also
her signature, which is on only her inputs; if these are seen in public,
the input-output linkages are obvious to anyone watching, defeating the
usual purpose of Coinjoin.)</em></p>
<h4>Encryption</h4>
<p>To achieve this we need a public key to encrypt a message to Bob. This
is the same kind of idea as is used in tools like PGP/gpg - only the
holder of the private key corresponding to the public key can read the message.</p>
<p>In this "First version" we will assume something naughty on Bob's
part: that he has <strong>reused an address</strong>! Thus, a public key will exist
on the blockchain which we assume (not guaranteed but likely; nothing
dangerous if he doesn't) he still holds the private key for.</p>
<p>Given this admittedly unfortunate assumption, we can use a simple and
established encryption protocol such as
<a href="https://en.wikipedia.org/wiki/Integrated_Encryption_Scheme">ECIES</a>
to encrypt a message to the holder of that public key.</p>
<p>Alice, upon finding such a pubkey, call it <code>PB</code>, and noting the
corresponding utxo <code>UB</code>, will need to send, ECIES encrypted to <code>PB</code>,
several items (mostly wrapped up in a transaction) to Bob to give him
enough material to construct a valid coinjoin without any interaction
with herself:</p>
<ul>
<li>Her own utxos (just <code>UA</code> for simplicity)</li>
<li>Her proposed destination address(es)</li>
<li>Her proposed amounts for output</li>
<li>Her proposed bitcoin transaction fee</li>
<li>The full proposed transaction template using <code>UA</code> and <code>UB</code> as inputs
(the above 4 can be implied from this)</li>
<li>Her own signature on the transaction using the key for <code>UA</code></li>
<li>Her proposed destination address <strong>for Bob</strong>.</li>
</ul>
<h4>Destination</h4>
<p>The last point in the above list is of course at first glance not
possible, unless you made some ultra dubious assumptions about shared
ownership, i.e. if Alice somehow tried to deduce other addresses that
Bob already owns (involving <em>more</em> address reuse). I don't dismiss this
approach <em>completely</em> but it certainly looks like a bit of an ugly mess
to build a system based on that. Instead, we can use a very well known
construct in ECC; in English something like "you can tweak a
counterparty's pubkey by adding a point that <em>you</em> know the private key
for, but you still won't know the private key of the sum". Thus in
this case, Alice, given Bob's existing pubkey <code>PB</code>, which is the one
she is using to encrypt the message, can construct a new pubkey:</p>
<div class="highlight"><pre><span></span><code><span class="err">PB2 = PB + k*G</span>
</code></pre></div>
<p>for some 32 byte random value <code>k</code>.</p>
<p>Alice will include the value of <code>k</code> in the encrypted message, so Bob can
verify that the newly proposed destination is under his control (again
we'll just assume a standard p2pkh address based on <code>PB2</code>, or a segwit
equivalent).</p>
<p>Assuming Bob somehow finds this message and successfully ECIES-decrypts
it using the private key of <code>PB</code>, he now has everything he needs to (if
he chooses), sign and broadcast the coinjoin transaction.</p>
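<p>The tweak can be checked numerically. What follows is a minimal sketch in pure Python (3.8+), not the proof-of-concept code: the helper names <code>ec_add</code> and <code>ec_mul</code> are illustrative, the arithmetic is slow and not constant-time, and it should never be used with real keys. It shows that Alice can compute <code>PB2</code> from public data plus her own <code>k</code>, while only Bob, who knows <code>x</code>, can form the matching private key <code>x + k</code>.</p>

```python
import secrets

# secp256k1 domain parameters
P_FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P_FIELD, -1, P_FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - a[0] - b[0]) % P_FIELD
    return (x3, (lam * (a[0] - x3) - a[1]) % P_FIELD)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication (illustration only)."""
    result = None
    k %= N
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result

# Bob's existing key pair (random here; in practice PB is read off the chain)
x = secrets.randbelow(N - 1) + 1
PB = ec_mul(x, G)

# Alice: picks a random 32-byte k and derives Bob's proposed destination key
k = secrets.randbelow(N - 1) + 1
PB2 = ec_add(PB, ec_mul(k, G))

# Bob: after decrypting k, confirms that he controls PB2
bob_new_privkey = (x + k) % N
print(ec_mul(bob_new_privkey, G) == PB2)
```

<p>Alice never learns <code>x</code>, so although she chose <code>k</code> she cannot spend from the address derived from <code>PB2</code>.</p>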
<h4>A protocol for the most naive version, in broad strokes:</h4>
<ol>
<li>Alice must have the ability to scan the blockchain to some extent;
she must find scriptSigs or witnesses containing pubkeys which were
later reused in new addresses/scriptPubKeys.</li>
<li>Alice will use some kind of filtering mechanism to decide which are
interesting. The most obvious two examples are: amounts under
control in Bob's utxos matching her desired range, and perhaps age
of utxos (so likely level of activity of user) or some watermarking
not yet considered.</li>
<li>Having found a set of potential candidates, for each case <code>PB, UB</code>:
Construct a standard formatted message; here is a simple suggestion
although in no way definitive:</li>
</ol>
<div class="highlight"><pre><span></span><code><span class="err"> 8(?) magic bytes and 2 version bytes for the message type</span>
<span class="err"> k-value 32 bytes</span>
<span class="err"> Partially signed transaction in standard Bitcoin serialization</span>
<span class="err"> (optionally padding to some fixed length)</span>
</code></pre></div>
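<p>As a rough illustration of that layout, here is a sketch of packing and parsing the plaintext (before ECIES encryption, after decryption). The magic value <code>SNICKER\x00</code>, the version number and the function names are placeholders invented for this example, not a specification; padding is left out so that everything after the 42-byte header is treated as the transaction.</p>

```python
import struct

MAGIC = b"SNICKER\x00"   # hypothetical 8 magic bytes
VERSION = 1              # hypothetical 2-byte version
HEADER = struct.Struct(">8sH32s")   # magic, version, k-value (42 bytes, big-endian)

def pack_proposal(k, partially_signed_tx):
    """Serialize the plaintext that would then be ECIES-encrypted to PB."""
    return HEADER.pack(MAGIC, VERSION, k.to_bytes(32, "big")) + partially_signed_tx

def unpack_proposal(plaintext):
    """Return (version, k, tx_bytes), or None if the magic bytes do not match."""
    if len(plaintext) < HEADER.size:
        return None
    magic, version, k_bytes = HEADER.unpack_from(plaintext)
    if magic != MAGIC:
        return None   # decrypted with the wrong key, or not a proposal at all
    return version, int.from_bytes(k_bytes, "big"), plaintext[HEADER.size:]

blob = pack_proposal(12345, b"\x02\x00\x00\x00rawtx")
print(unpack_proposal(blob))
```

<p>The magic-byte check is what lets a receiver discard, cheaply, the many messages that were encrypted to somebody else's key.</p>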
<p>We defer until later the discussion of how, in practice, Bob gets access to the message;
but note that if he has done so, he already knows the value of
<code>PB</code> and will thus also know <code>UB</code>. He ECIES-decrypts it, and
recognizes it's for him through correct magic bytes (other messages
encrypted to other pubkeys will come out random).</p>
<p>This format gives Bob enough information to evaluate the proposal easily.
First, he can verify that <code>UB</code> is in the inputs. Then he can verify
that one of the 2 outputs (simple model) has a scriptPubKey
corresponding to <code>PB2 = PB + k*G</code>. He can then verify the output amounts
fit his requirements. Finally he can verify the ECDSA signature provided
on <code>UA</code> (hence "partially signed transaction"). Given this he can, if
he chooses, sign on <code>UB</code> using the private key of <code>PB</code> and broadcast. He must of course
keep a permanent record of either <code>k</code> itself or, more likely, the
private key <code>k + x</code> (assuming <code>PB = x * G</code>).</p>
<h3>A proof-of-concept</h3>
<p>Before going further into details, and discussing the second (probably
superior but not as obviously workable) version of SNICKER, I want to
mention that I very quickly put together some proof of concept code in
<a href="https://github.com/AdamISZ/SNICKER-POC">this github
repo</a>;
it uses
<a href="https://github.com/Joinmarket-Org/joinmarket-clientserver">Joinmarket-clientserver</a>
as a dependency, implements ECIES in a compatible form to that used by
<a href="https://electrum.org/">Electrum</a>,
and allows testing on regtest or testnet, admittedly with a bunch of
manual steps, using the python script <code>snicker-tool.py</code>. The workflow
for testing is in the README. To extend the testing to more wallets
requires some way to do ECIES as well as some way to construct the
destination addresses as per <code>PB2 = PB + kG</code> above. I did note that,
usefully, the partially signed transactions can be signed directly in
Bitcoin Core using <code>signrawtransaction</code> and then <code>sendrawtransaction</code>
for broadcast, but note that somehow you'll have to recover the
destination address, as receiver, too. Note that there was no attempt at
all to construct a scanning tool for any reused-key transactions here,
and I don't intend to do that (at least, in that codebase).</p>
<h2>Practical issues</h2>
<p>This section is a set of small subsections describing various
issues that will have to be addressed to make this work.</p>
<h3>Wallet integration</h3>
<p>One reason this model is interesting is because it's much more
plausible to integrate into an existing wallet than something like
Joinmarket - which requires dealing with long term interactivity with
other participants, communicating on a custom messaging channel,
handling protocol negotiation failures etc. To do SNICKER as a receiver,
a wallet needs the following elements:</p>
<ul>
<li>ECIES - this is really simple if you have the underlying secp256k1
and HMAC dependencies; see
<a href="https://github.com/spesmilo/electrum/blob/master/lib/bitcoin.py#L774-L817">here</a>
and
<a href="https://github.com/AdamISZ/SNICKER-POC/blob/master/ecies/ecies.py#L10-L50">here</a>;
note that the root construction in ECIES is ECDH.</li>
<li>The ability to calculate <strong>and store</strong> the newly derived keys of the
form <code>P' = P + kG</code> where <code>k</code> is what is passed to you, and <code>P</code> is
the pubkey of your existing key controlling the output to be spent.
I would presume that you would have to treat <code>k+x</code>, where <code>P=xG</code>, as
a newly imported private key. Note that we <em>cannot</em> use a
deterministic scheme for this from <code>P</code>, since that would be
calculable by an external observer; it must be based on a secret
generated by "Alice". This could be a bit annoying for a wallet,
although of course it's easy in a naive sense.</li>
<li>Ability to parse files containing encrypted coinjoin proposals in
the format outlined above - this is trivial.</li>
<li>Ability to finish the signing of a partially signed transaction.
Most wallets have this out of the box (Core does for example); there
might be a problem for a wallet if it tacitly assumes complete
ownership of all inputs.</li>
</ul>
<p>If a wallet only wanted to implement the receiver side (what we called
"Bob" above), that's it.</p>
<h4>Compatibility/consensus between different wallets</h4>
<p>The only "consensus" part of the protocol is the format of the
encrypted coinjoin proposals (and the ECIES algorithm used to encrypt
them). We could deal with different transaction types being proposed
(i.e. different templates, e.g. 3 outputs or 4, segwit or not), although
obviously it'll be saner if there are a certain set of templates that
everyone knows is acceptable to others.</p>
<h3>Notes on scanning for candidates</h3>
<p>There is no real need for each individual "Alice" to scan, although
she might wish to if she has a Bitcoin node with indexing enabled. This
is a job that can be done by any public block explorer and anyone can
retrieve the data, albeit there are privacy concerns just from you
choosing to download this data. The data could be replicated on Tor
hidden services for example for better privacy. So for now I'm assuming
that scanning, itself, is not an issue.</p>
<p>A much bigger issue might be finding <strong>plausible</strong> candidates. Even in
this version 1 model of looking only for reused keys, which are
hopefully not a huge subset of the total utxo set, there are tons of
potential candidates and, to start with, none of them at all are
plausible. How to filter them?</p>
<ul>
<li>Filter on amount - if Alice has X coins to join, she'll want to
work with outputs &lt; X.</li>
<li>Filter on age - this is more debatable, but very old utxos are less
likely to be candidates for usage.</li>
<li>An "active" filter - this is more likely to be how things work.
Are certain transactions intrinsically watermarked in a way that
indicates that the "Bob" in question is actually interested in
this function? One way this can happen is if we know that the
transaction is from a certain type of wallet, which already has this
feature enabled.</li>
</ul>
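<p>The first two (passive) filters are simple enough to sketch. The <code>Candidate</code> record, its field names and the thresholds below are all hypothetical; a real scanner would fill such records from its own index of reused-key utxos.</p>

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    pubkey_hex: str      # the reused pubkey PB
    utxo: str            # "txid:vout" of UB
    amount_sats: int     # value of UB
    confirmations: int   # age of UB in blocks

def filter_candidates(candidates, my_amount_sats, max_confirmations):
    """Keep utxos smaller than the proposer's amount and not too old."""
    return [c for c in candidates
            if c.amount_sats < my_amount_sats
            and c.confirmations <= max_confirmations]

pool = [
    Candidate("02aa", "tx1:0", 40_000_000, 1_200),
    Candidate("03bb", "tx2:1", 250_000_000, 300),     # too large
    Candidate("02cc", "tx3:0", 10_000_000, 400_000),  # very old
]
kept = filter_candidates(pool, my_amount_sats=100_000_000, max_confirmations=50_000)
print([c.utxo for c in kept])
```

<p>An "active" filter cannot be expressed this way, since it depends on recognising a wallet's transaction fingerprint rather than on fields of a single utxo.</p>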
<h4>Bootstrapping</h4>
<p>If a set of users were using a particular wallet or service (preferably
a <em>large</em> set), it might be possible to identify their transactions
as "Acme wallet transactions". Funnily enough, Joinmarket, because it
uses a set and unusual coinjoin pattern, satisfies this property in a
very obvious way; but there might be other cases too. See the notes in
"second version", below, on how Joinmarket might work specifically in
that case.</p>
<p>Better of course, is if we achieved that goal with a more user-friendly
wallet with a much bigger user-base; I'd ask wallet developers to
consider how this might be achieved.</p>
<p>Another aspect of bootstrapping is the Joinmarket concept - i.e. make a
financial incentive to help bootstrap. If creators/proposers are
sufficiently motivated they may offer a small financial incentive to
"sweeten the pot", as was suggested in the scenario at the start of
this post. This will help a lot if you want the user-set to grow
reasonably large.</p>
<h3>Scalability</h3>
<p>This is of course filed under "problems you really want to have", but
it's nevertheless a very real problem, arguably the biggest one here.</p>
<p>Imagine 10,000 utxo candidates that are plausible and 1000 active
proposers. If they could all make proposals for a large-ish subset
of the total candidates, we could easily imagine 1,000,000 proposals at
a particular time. Each encrypted record takes 500-800 bytes of space,
let's say. Just the data transfer starts to get huge - hundreds of
megabytes? Perhaps this is not as bad as it looks, <em>if</em> the data is
being received in small amounts over long periods.</p>
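<p>The back-of-envelope arithmetic, spelled out. The only number not taken from the paragraph above is <code>proposals_per_proposer</code>: it assumes each proposer targets 10% of the candidates, which is one way of arriving at a million outstanding records.</p>

```python
candidates = 10_000
proposers = 1_000
proposals_per_proposer = candidates // 10   # assumption: a "large-ish subset" = 10%

total_proposals = proposers * proposals_per_proposer
low_mb = total_proposals * 500 // 1_000_000    # 500 bytes per encrypted record
high_mb = total_proposals * 800 // 1_000_000   # 800 bytes per encrypted record

print(total_proposals, low_mb, high_mb)
```

<p>Under these assumptions every receiver would need to fetch somewhere between 500 and 800 MB to see all outstanding proposals.</p>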
<p>And let's say we can find a way to get the data out to everybody - they
still have to try to decrypt <strong>every</strong> proposal with <strong>every</strong> pubkey
they have that is a valid candidate (in version 1, that's reused keys,
let's say, or some subset of them). The computational requirement of
that is huge, even if some cleverness could reduce it (decrypt only one
AES block; use high performance C code e.g. based on libsecp256k1).
Again, perhaps if this is happening slowly, streamed over time, or in
chunks at regular intervals, it's not as bad. Still.</p>
<p>It's true that these problems don't arise at small scale, but then the
real value of this would be if it scaled up to large anonymity sets.</p>
<p>Even if this is addressed, there is another problem arising out of the
anonymous submission - any repository of proposals could be filled with
junk, to waste everyone's time. Apart from a
<a href="https://en.wikipedia.org/wiki/Hashcash">hashcash</a>-like
solution (not too implausible but may impose too much cost on the
proposer), I'm not sure how one could address that while keeping
submission anonymity.</p>
<p>At least we have the nice concept that this kind of protocol can improve
privacy on Bitcoin's blockchain without blowing up bandwidth and
computation for the Bitcoin network itself - it's "off-band", unlike
things like <a href="https://www.elementsproject.org/elements/confidential-transactions/investigation.html">Confidential
Transactions</a>
(although, of course, the effect of that is much more powerful). I think
ideas that take semantics and computation off chain are particularly
interesting.</p>
<h3>Conflicting proposals</h3>
<p>This is not really a problem: if Alice proposes a coinjoin to Bob1 and
Bob2, and Bob1 accepts, then when Bob2 checks, he will find one of the
inputs for his proposed coinjoin is already spent, so it's not valid.
Especially in cases where there is a financial incentive, it just
incentivises Bobs to be more proactive, or just be out of luck.</p>
<h3>Transaction structure and 2 party joins</h3>
<p>We have thus far talked only about 2 party coinjoins, which <em>ceteris
paribus</em> are an inferior privacy model compared to any larger number
(consider that in a 2 party coinjoin, the <em>other</em> party necessarily
knows which output is yours). The SNICKER model is not easily extendable
to N parties, although it's not impossible. But DarkWallet used 2 of 2
joins, and it's still in my opinion valuable. Costs are kept lower, and
over time these joins heavily damage blockchain analysis. A larger
number of joins, and larger anonymity set could greatly outweigh the
negatives<em>.</em></p>
<p>Structure: the model used in the aforementioned
<a href="https://github.com/AdamISZ/SNICKER-POC">POC</a>,
although stupid simple, is still viable: 2 inputs, one from each party
(easily extendable to 1+N), 3 outputs, with the receiver getting back
exactly one output of ~ the same size as the one he started with. The
proposer then has 1 output of exactly that size (so 2 equal outputs) and
one change. Just as in Joinmarket, the concept is that fungibility is
gained specifically in the equal outputs (the "coinjoin outputs"); the
change output is of course trivially linked back to its originating
input(s).</p>
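<p>The amounts in that 2-input, 3-output template can be worked out as below. This is a sketch with assumptions of my own for the example: amounts are in satoshis, the proposer pays the whole transaction fee, and an optional <code>sweetener</code> paid to the receiver is what makes his output only "roughly" the size of his input. The function name is illustrative.</p>

```python
def snicker_outputs(proposer_in, receiver_in, tx_fee, sweetener=0):
    """Return the three output amounts of the simple template.

    The two equal "coinjoin outputs" are receiver_in + sweetener; the
    proposer's change absorbs the sweetener and the fee.
    """
    cj_amount = receiver_in + sweetener
    change = proposer_in + receiver_in - 2 * cj_amount - tx_fee
    if change <= 0:
        raise ValueError("proposer input too small for this template")
    return {"receiver": cj_amount, "proposer": cj_amount, "proposer_change": change}

outs = snicker_outputs(proposer_in=150_000_000, receiver_in=100_000_000,
                       tx_fee=1_000, sweetener=500)
print(outs)
```

<p>Inputs minus outputs equals the fee, and the change output (here 49,998,000) is the one an observer can link back to the proposer's input.</p>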
<p>But there's no need for us to be limited to just one transaction
structure; we could imagine many, perhaps some templates that various
wallets could choose to support; and it'll always be up to the receiver
to decide if he likes the structure or not. Even the stupid X->X, Y->Y
"coinjoin" I mused about in my Milan presentation
<a href="https://youtu.be/IKSSWUBqMCM?t=47m21s">here</a> (warning: YouTube)
might be fun to do (for some reason!). What a particularly good or
"best" structure is, I'll leave open for others to discuss.</p>
<h3>Second version - snicKER = Keys Encrypted to R</h3>
<p>We've been discussing all kinds of weird and whacky "Non-Interactive
Coinjoin" models on IRC for years; and perhaps there will still be
other variants. But arubi was mentioning to me yesterday that he was
looking for a way to achieve this goal <em>without</em> the nasty requirement
of reused keys, and between us we figured out that it is a fairly
trivial extension, <em>if</em> you can find a way to get confidence that a
particular existing utxo is co-owned with an input (or any input).
That's because if you have an input, you have not only a pubkey, but
also a <strong>signature</strong> (both will either be stored in the scriptSig, or in
the case of segwit, in the witness section of the transaction). An
<a href="https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm">ECDSA</a>
signature is published on the blockchain as a pair: <code>(r, s)</code>, where <code>r</code>
is the x-coordinate of a point <code>R</code> on the secp256k1 curve. Now, any
elliptic curve point can be treated as a pubkey, assuming someone knows
the private key for it; in the case of ECDSA, we call the private key
for <code>R</code>, <code>k</code>, that is: <code>R = kG</code>. <code>k</code> is called the nonce ("number used
once"), and today it is usually calculated using the algorithm
<a href="https://tools.ietf.org/html/rfc6979">RFC6979</a>,
which derives its value deterministically from the private key
you're signing with, and the message. But what matters here is that the
signer either already knows <code>k</code>, or can calculate it trivially from the
signing key and the transaction. This provides us with exactly the same
scenario as in the first version; Bob knows the private key of <code>R</code>, so
Alice can send a proposal encrypted to that public key, and can derive a
new address for Bob's destination using the same formula:</p>
<div class="highlight"><pre><span></span><code><span class="err">PB2 = R + k'G</span>
</code></pre></div>
<p>Here I used <code>k'</code> to disambiguate from the signature nonce <code>k</code>, but it's
exactly the same as before. As before, Bob, in order to spend the output
from the coinjoin, will need to store the new private key <code>k+k'</code>. For a
wallet it's a bit more work because you'll have to keep a record of
past transaction <code>k</code> values, or perhaps keep the transactions and
retrieve <code>k</code> as and when. Apart from that, the whole protocol is
identical.</p>
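<p>A sketch of the version 2 derivation, again in slow, illustrative pure Python (3.8+) with made-up helper names. One detail the formula glosses over: the signature only publishes <code>r</code>, the x-coordinate of <code>R</code>, so Alice has to pick one of the two points with that x-coordinate. Here she is assumed to take the one with even y; that convention is an assumption of this example, not part of the proposal. If she picks the "other" point she has chosen <code>-R</code>, whose private key <code>N - k</code> Bob also knows, so he can still derive the new key.</p>

```python
import secrets

P_FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P_FIELD, -1, P_FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - a[0] - b[0]) % P_FIELD
    return (x3, (lam * (a[0] - x3) - a[1]) % P_FIELD)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication (illustration only)."""
    result = None
    k %= N
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result

def lift_x(x):
    """Return the curve point with this x-coordinate and even y, or None."""
    y_sq = (pow(x, 3, P_FIELD) + 7) % P_FIELD
    y = pow(y_sq, (P_FIELD + 1) // 4, P_FIELD)
    if y * y % P_FIELD != y_sq:
        return None
    return (x, y if y % 2 == 0 else P_FIELD - y)

# Bob, earlier: signed with nonce k (normally from RFC6979; random here)
k = secrets.randbelow(N - 1) + 1
R = ec_mul(k, G)
r = R[0] % N            # first half of the published (r, s); equals R's x
                        # except with negligible probability

# Alice: recovers a point from r and tweaks it with her own k'
R_lifted = lift_x(r)
k_prime = secrets.randbelow(N - 1) + 1
PB2 = ec_add(R_lifted, ec_mul(k_prime, G))

# Bob: knows k, so he knows the private key of whichever point Alice lifted
k_for_lifted = k if R[1] % 2 == 0 else N - k
new_privkey = (k_for_lifted + k_prime) % N
print(ec_mul(new_privkey, G) == PB2)
```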
<h4>Finding candidates in the second version</h4>
<p>In version 2, we no longer need Bob to do something dubious (reusing
addresses). But now the proposer (Alice) has a different and arguably
harder problem than before; she has to find transactions where she has
some reasonable presumption that a specific output and a specific input
are co-owned. You could argue that this is good, because now Alice is
proposing coinjoins where linkages <em>are</em> known, so she's improving
privacy exactly where it's needed :) (only half true, but amusing). In
a typical Bitcoin transaction there are two outputs - one to
destination, one change; if you can unambiguously identify the change,
even with say 90% likelihood not 100%, you could make proposals on this
basis. This vastly expands the set of <em>possible</em> candidates, if not
necessarily plausible ones (see above on bootstrapping).</p>
<p>Additionally paradoxical is the fact that Joinmarket transactions <em>do</em>
have that property! The change outputs are unambiguously linkable to
their corresponding inputs through subset-sum analysis, see e.g.
<a href="https://github.com/AdamISZ/JMPrivacyAnalysis/blob/master/tumbler_privacy.md#jmsudoku-coinjoin-sudoku-for-jmtxs">here</a>.</p>
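<p>To give a feel for what subset-sum linking means, here is a deliberately simplified brute-force sketch. It is not the method of the linked analysis: it assumes that each participant's inputs sum to roughly the equal coinjoin amount plus that participant's change, with a single <code>tolerance</code> standing in for coinjoin fees and miner fees, and the amounts are invented.</p>

```python
from itertools import combinations

def link_change_outputs(input_amounts, cj_amount, change_amounts, tolerance):
    """For each change amount, list the input subsets (as index tuples)
    whose sum is within `tolerance` of cj_amount + change."""
    links = {}
    indices = range(len(input_amounts))
    for change in change_amounts:
        target = cj_amount + change
        matches = []
        for size in range(1, len(input_amounts) + 1):
            for subset in combinations(indices, size):
                total = sum(input_amounts[i] for i in subset)
                if abs(total - target) <= tolerance:
                    matches.append(subset)
        links[change] = matches
    return links

# Invented example: one party spends inputs 0 and 1, the other spends input 2
links = link_change_outputs(
    input_amounts=[60_000, 45_000, 250_000],
    cj_amount=100_000,
    change_amounts=[5_010, 149_000],
    tolerance=2_000,
)
print(links)
```

<p>When each change output matches exactly one subset, as here, the observer has the input/change co-ownership that version 2 needs. The search is exponential in the number of inputs, which is tolerable only because typical coinjoins have few of them.</p>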
<p>Thus, Adlai Chandrasekhar's
<a href="http://adlai.uncommon-lisp.org:5000/">cjhunt</a>
tool (appears down as of writing),
<a href="https://github.com/adlai/cjhunt">code</a>,
identifies all very-likely-to-be Joinmarket transactions through
blockchain scanning, and its output could be used to generate candidates
(the proposed joins could be with those change outputs, using the <code>R</code>
values from one of the identified-as-co-owned inputs). See also
<a href="https://citp.github.io/BlockSci/chain/blockchain.html">BlockSci</a>.
Then if Joinmarket had both proposer- and receiver- side code
integrated, it would create a scenario where this type of coinjoin
would most likely be quite plausible to achieve.</p>
<h3>Conclusion</h3>
<p>I think this idea might well be viable. It's simple enough that
crypto vulnerabilities are unlikely. The short version of the pros and
cons:</p>
<h4>Pros</h4>
<ul>
<li>No interactivity (the point), has many positive consequences, and
high anonymity standard</li>
<li>Relative ease of wallet integration (esp. compared to e.g.
Joinmarket), consensus requirement between them is limited.</li>
<li>Potentially huge anonymity set (different for version 1 vs version
2, but both very large)</li>
</ul>
<h4>Cons</h4>
<ul>
<li>For now only 2 parties and probably stuck there; limited coinjoin
model (although many transaction patterns possible).</li>
<li>Finding plausible candidates is hard, needs a bootstrap</li>
<li>Sybil attack on the encrypted messages; how to avoid the "junk
mail" problem</li>
</ul>
<p>Lastly, it should be fine with Schnorr (to investigate: aggregation in
this model), in version 1 and version 2 forms.</p>
<h3>Footnotes</h3>
<p>1. Sighashing - attempting a non-interactive coinjoin with some
interesting use of <code>SIGHASH_SINGLE</code> and <code>SIGHASH_ANYONECANPAY</code> seems at
least plausible (see
<a href="https://en.bitcoin.it/wiki/OP_CHECKSIG#Procedure_for_Hashtype_SIGHASH_SINGLE">here</a>),
although it's not exactly heartening that no one ever uses
<code>SIGHASH_SINGLE</code> (and its rules are arcane and restrictive), not to even
speak of watermarking. Hopefully the idea expressed here is better.</p>P(o)ODLE2016-06-15T00:00:00+02:002016-06-15T00:00:00+02:00Adam Gibsontag:joinmarket.me,2016-06-15:/blog/blog/poodle/<p>DLEQ proofs as tokens for anti-snooping in coinjoin</p><h3>P(o)ODLE</h3>
<blockquote>
<p><em>Here is a purse of monies ... which I am not going to give to you.</em></p>
</blockquote>
<p>- <a href="https://en.wikipedia.org/wiki/Bells_(Blackadder)">Edmund
Blackadder</a></p>
<p><img src="/web/20200712194227im_/https://joinmarket.me/static/media/uploads/.thumbnails/poodle.jpeg/poodle-225x308.jpeg" width="225" height="308" /></p>
<h3>P(o)ODLE, not POODLE</h3>
<p>This post, fortunately, has nothing to do with faintly ridiculous <a href="https://en.wikipedia.org/wiki/POODLE">SSL 3
downgrade
attacks</a>.
Irritatingly, our usage here has no made-up need for the parenthetical
(o), but on the other hand "podle" is not actually a word.</p>
<h3>The problem</h3>
<p>You're engaging in a protocol (like Joinmarket) where you're using
bitcoin utxos regularly. We want to enforce some scarcity; you can't use
the same utxo more than once, let's say. Utxos can be created all the
time, but at some cost of time and money; so it can be seen as a kind of
rate limiting.</p>
<p>So: you have a bitcoin utxo. You'd like someone else to know that you
have it, <strong>and that you haven't used it before, with them or anyone
else</strong>, <strong>in this protocol,</strong> but you don't want to show it to them. For
that second property (hiding), you want to make a <em>commitment</em> to the
utxo. Later on in the protocol you will open the commitment and reveal
the utxo.</p>
<p>Now, a <a href="https://en.wikipedia.org/wiki/Commitment_scheme">cryptographic
commitment</a>
is a standard kind of protocol, usually it works something like:</p>
<div class="highlight"><pre><span></span><code><span class="err">Alice->Bob: commit: h := hash(secret, nonce)</span>
<span class="err">(do stuff)</span>
<span class="err">Alice->Bob: open: reveal secret, nonce</span>
<span class="c">Bob: verify: h =?= hash(secret, nonce)</span>
</code></pre></div>
<p>Hashing a secret is <em>not</em> enough to keep it secret, at least in general:
because the verifier might be able to guess, especially if the data is
from a small-ish set (utxos in bitcoin being certainly a small enough
set; and that list is public). So usually, this protocol, with a
large-enough random nonce, would be enough for the purposes of proving
you own a bitcoin utxo without revealing it.</p>
<p>But in our case it doesn't suffice - because of the bolded sentence in
the problem description. You could pretend to commit to <em>different</em>
utxos at different times, simply by using different nonces. If you tried
to do that <em>just with me</em>, well, no big deal - I'll just block your
second use. But you <em>could</em> use the same utxos with different
counterparties, and they would be none the wiser, unless they all shared
all private information with each other. Which we certainly don't want.</p>
<p>Contrariwise, if you ditch the nonce and just use Hash(utxo) every time
to every counterparty, you have the failure-to-hide-the-secret problem
mentioned above.</p>
<p>In case you didn't get that: Alice wants to prove to Bob and Carol and
... that she owns utxo \(U\), and that she never used it before. Bob and
Carol etc. are keeping a public list of all previously used commitments
(which shouldn't give away what the utxo is, for privacy). If she just
makes a commitment Hash(\(U\) + nonce) and sends it to Bob and Carol,
they will check that it isn't already on the public list of commitments;
if it isn't, OK, she can open the commitment later and prove honest action.
But her conversations with Bob and Carol are separate, on private
messaging channels. How can Bob know she didn't use <em>the same utxo as
previously used with Carol, but with a different nonce</em>?</p>
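<p>Both failure modes are easy to reproduce. In this sketch the commitment is SHA256 over a fixed-length nonce followed by the utxo; that encoding, the function name and the fake utxo strings are choices made for the example only.</p>

```python
import hashlib
import secrets

def commit(utxo, nonce=b""):
    """Hash commitment to a utxo; the nonce is 32 bytes, or empty for none."""
    return hashlib.sha256(nonce + utxo).digest()

utxo = b"aa" * 32 + b":0"          # fake "txid:vout"

# With nonces: hiding, but one utxo yields unlinkable commitments
to_bob = commit(utxo, secrets.token_bytes(32))
to_carol = commit(utxo, secrets.token_bytes(32))
print(to_bob != to_carol)          # Bob and Carol cannot tell it is the same utxo

# Without a nonce: reuse is detectable, but the utxo is not hidden,
# because anyone can hash every entry of the public utxo set and compare
public_utxo_set = [b"bb" * 32 + b":1", utxo, b"cc" * 32 + b":0"]
bare = commit(utxo)
recovered = [u for u in public_utxo_set if commit(u) == bare]
print(recovered == [utxo])
```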
<h3>The solution</h3>
<p>This is a bit of a headscratcher; after several IRC discussions, Greg
Maxwell suggested the idea of <strong>proof of discrete logarithm
equivalence</strong> (hence the title), and pointed me at <a href="http://crypto.stackexchange.com/questions/15758/how-can-we-prove-that-two-discrete-logarithms-are-equal">this
crypto.stackexchange
thread</a>.
It's a cool idea (although note that that description is based on DL
rather than ECDL seen here): "shift" the EC point to a new
base/generator point, so that nobody else can read (crudely put), then
append a Schnorr signature acting as proof that the two points have the
same discrete logarithm (= private key) with respect to the two base
points. In detail, consider a Bitcoin private, public keypair \((x,
P)\) for the usual base point/generator \(G\), and consider a
<a href="https://en.wikipedia.org/wiki/Nothing_up_my_sleeve_number">NUMS</a>
alternative generator \(J\) (a little more on this later).</p>
<p>$$P = xG$$</p>
<p>$$P_2 = xJ$$</p>
<p>Next, Alice will provide her commitment as \(H(P_2)\) in the
handshake initiation stage of the protocol. Then, when it comes time for
Alice to request private information from Bob, on their private message
channel, she will have to open her commitment with this data:</p>
<p>$$P, U, P_2, s, e$$</p>
<p>Here \(s,e\) are a Schnorr signature proving equivalence of the
private key (we called it \(x\) above) with respect to \(G,J\), but
of course without revealing that private key. It is constructed, after
choosing a random nonce \(k\), like this:</p>
<p>$$K_G = kG$$</p>
<p>$$K_J = kJ$$</p>
<p>$$e = H(K_G || K_J || P || P_2)$$</p>
<p>$$s = k + xe$$</p>
<p>Then Bob, receiving this authorisation information, proceeds to verify
the commitment before exchanging private information:</p>
<ol>
<li>Does \(H(P_2)\) equal the previously provided commitment? If yes:</li>
<li>Check that the commitment is not repeated on the public list (or
whatever the policy is)</li>
<li>Verify via the blockchain that \(P\) matches the utxo \(U\)</li>
<li>\(K_G = sG - eP\)</li>
<li>\(K_J = sJ - eP_2\)</li>
<li>Schnorr sig verify operation: Does \(H(K_G || K_J || P ||
P_2) = e\) ?</li>
</ol>
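<p>The proof and checks 1 and 4 to 6 above can be sketched end to end. This is illustrative pure Python (3.8+), not the Joinmarket module: the generator <code>J</code> is derived here by hashing a made-up seed string and is <em>not</em> Joinmarket's actual value, points are hashed as raw 64-byte x||y, SHA256 stands in for \(H\), and checks 2 and 3 (the public list and the blockchain lookup) are left out.</p>

```python
import hashlib
import secrets

P_FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P_FIELD, -1, P_FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - a[0] - b[0]) % P_FIELD
    return (x3, (lam * (a[0] - x3) - a[1]) % P_FIELD)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication (illustration only)."""
    result = None
    k %= N
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result

def nums_point(seed):
    """Hash-to-curve by try-and-increment: nobody knows the result's discrete log."""
    for counter in range(256):
        x = int.from_bytes(hashlib.sha256(seed + bytes([counter])).digest(), "big")
        y_sq = (pow(x, 3, P_FIELD) + 7) % P_FIELD
        y = pow(y_sq, (P_FIELD + 1) // 4, P_FIELD)
        if x < P_FIELD and y * y % P_FIELD == y_sq:
            return (x, y)
    raise ValueError("no curve point found for this seed")

J = nums_point(b"hypothetical PoDLE sketch generator")

def ser(pt):
    return pt[0].to_bytes(32, "big") + pt[1].to_bytes(32, "big")

def challenge(KG, KJ, P, P2):
    digest = hashlib.sha256(ser(KG) + ser(KJ) + ser(P) + ser(P2)).digest()
    return int.from_bytes(digest, "big") % N

def podle_prove(x):
    """Alice: returns (P, P2, s, e) for private key x."""
    P, P2 = ec_mul(x, G), ec_mul(x, J)
    k = secrets.randbelow(N - 1) + 1
    e = challenge(ec_mul(k, G), ec_mul(k, J), P, P2)
    return P, P2, (k + x * e) % N, e

def podle_verify(P, P2, s, e, commitment):
    """Bob: commitment check, then K_G = sG - eP, K_J = sJ - eP2, then the hash."""
    if hashlib.sha256(ser(P2)).digest() != commitment:
        return False
    KG = ec_add(ec_mul(s, G), ec_mul(N - e, P))
    KJ = ec_add(ec_mul(s, J), ec_mul(N - e, P2))
    if KG is None or KJ is None:
        return False
    return challenge(KG, KJ, P, P2) == e

P, P2, s, e = podle_prove(12345)
print(podle_verify(P, P2, s, e, hashlib.sha256(ser(P2)).digest()))
```

<p>Verification works because \(sG - eP = kG + xeG - exG = kG\), and likewise for \(J\), but only if the same \(x\) sits behind both \(P\) and \(P_2\).</p>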
<p>Bob now knows that the utxo \(U\) has not been repeated (the simplest
policy) but Alice has not been exposed to a potential public leakage of
information about the utxo. (It should be noted, of course, that Bob knows the
utxo from now on, but that's for another discussion about Coinjoin
generally...)</p>
<h3>Why an alternate generator point \(J\)?</h3>
<p>Publishing \(H(P_2)\) gives no information about \(P\), the actual
Bitcoin pubkey that Alice wants to use; in that sense it's the same as
using a nonce in the commitment. But it also gives her no degree of
freedom, as a nonce does, to create different public values for the same
hidden pubkey. No one not possessing \(x\) can deduce \(P\) from
\(P_2\) (or vice versa, for that matter) - <strong>unless</strong> they have the
private key/discrete log of \(J\) with respect to \(G\). If anyone
had this number \(x^*\) such that \(J = x^{*}G\), then it would be
easy to make the shift from one to the other:</p>
<p>$$P_2 = xJ = x(x^{*}G) = x^{*}(xG) = x^{*}P$$</p>
<p>and apply a modular inverse if necessary.</p>
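<p>To see the danger concretely, the sketch below builds a deliberately <em>bad</em> generator whose discrete log \(x^*\) is known, and shows the holder of \(x^*\) moving between \(P\) and \(P_2\) in both directions without ever knowing \(x\). Helper names are illustrative and the arithmetic is slow pure Python (3.8+).</p>

```python
import secrets

P_FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P_FIELD, -1, P_FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P_FIELD, -1, P_FIELD)
    lam %= P_FIELD
    x3 = (lam * lam - a[0] - b[0]) % P_FIELD
    return (x3, (lam * (a[0] - x3) - a[1]) % P_FIELD)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication (illustration only)."""
    result = None
    k %= N
    while k:
        if k & 1:
            result = ec_add(result, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return result

# A NON-NUMS generator: whoever created it kept its discrete log x_star
x_star = secrets.randbelow(N - 1) + 1
J_bad = ec_mul(x_star, G)

# Alice's key pair and her shifted point
x = secrets.randbelow(N - 1) + 1
P = ec_mul(x, G)
P2 = ec_mul(x, J_bad)

# The attacker uses only x_star and public points
print(ec_mul(x_star, P) == P2)              # P  to P2
print(ec_mul(pow(x_star, -1, N), P2) == P)  # P2 to P, via the modular inverse
```

<p>With a properly constructed NUMS \(J\) no such \(x^*\) is known to anyone, so this mapping is unavailable.</p>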
<p>This is why the concept of NUMS is critical. The construction of a NUMS
alternate generator is discussed in <a href="https://elementsproject.org/elements/confidential-transactions/">the same CT doc as
above</a>,
and also in <a href="https://github.com/AdamISZ/ConfidentialTransactionsDoc/blob/master/essayonCT.pdf">my CT
overview</a>,
at the end of section 2.2. Note I use \(J\) here in place of \(H\)
to avoid confusion with hash functions.</p>
<h3>Code and thoughts on implementation</h3>
<p>I did an abbreviated write up of the concept of this post in <a href="https://gist.github.com/AdamISZ/9cbba5e9408d23813ca8#defence-2-committing-to-a-utxo-in-publicplaintext-at-the-start-of-the-handshake">this
gist</a>,
as one of three possible ways of attacking the problem in Joinmarket:
<a href="https://github.com/JoinMarket-Org/joinmarket/issues/156">how can we prevent people initiating transactions over and over again
to collect information on
utxos</a>?
This algorithm is not intended as a <em>complete</em> solution to that issue,
but it's very interesting in its own right and may have a variety of
applications, perhaps.</p>
<p>The algorithm was fairly simple to code, at least in a naive way, and I
did it some time ago using Bitcoin's
<a href="https://github.com/bitcoin-core/secp256k1">libsecp256k1</a>
with the <a href="https://github.com/ludbb/secp256k1-py">Python binding by
ludbb</a>.
An initial version of my Python "podle" module is
<a href="https://github.com/JoinMarket-Org/joinmarket/blob/90ec05329e06beed0fbc09528ef6fb3d2c5d03ba/lib/bitcoin/podle.py">here</a>.</p>
<p>There are lots of tricky things to think about in implementing this; I
think the most obvious issue is how would publishing/maintaining a
public list work? If we just want each utxo to be allowed only one use,
any kind of broadcast mechanism would be fine; other participants can
know as soon as any \(H(P_2)\) is used, or at least to a reasonable
approximation. Even in a multi-party protocol like Joinmarket, the utxo
would be broadcast as "used" only after its first usage by each party,
so it would from then on be on what is effectively a blacklist. But if
the policy were more like "only allow re-use 3 times" this doesn't seem
to work without some kind of unrealistic honesty assumption.</p>