Ideas were first discussed here. Thanks again to arubi on IRC for helping me flesh them out.
We assume that the reader is familiar with CoinJoin as a basic idea - collaboratively providing inputs to a transactions so that it may be made difficult or impossible to distinguish ownership/control of the outputs.
The way that CoinJoin is used in practice is (today mainly using JoinMarket, but others over Bitcoin's history) is to create large-ish transactions with multiple outputs of exactly the same amount. This can be called an "intrinsic fungibility" model - since, although the transactions created are unambiguously recognizable as CoinJoins, the indistinguishability of said equal outputs is kind of "absolute".
However, as partially discussed in the earlier blog post "the steganographic principle", there's at least an argument for creating fungibility in a less explicit way - that is to say, creating transactions that have a fungibility effect but aren't necessarily visible as such - they may look like ordinary payments. I'll call this the deniability model vs the intrinsic fungibility model. It's harder to make this work, but it has the possibility of being much more effective than the intrinsic fungibility model, since it gives the adversary (who we'll talk about in a minute) an additional, huge problem: he doesn't even know where to start.
In trying to create privacy, we treat the "blockchain analyst" as our adversary (henceforth just "A").
Blockchain analysis consists, perhaps, of two broad areas (not sure there is any canonical definition); we can call the first one "metadata", vaguely, and think of it is every kind of data that is not directly recorded on the blockchain, such as personally identifying information, exchange records etc, network info etc. In practice, it's probably the most important. The second is stuff recorded directly on the blockchain - pseudonyms (scriptPubKeys/addresses) and amount information (on non-amount-blinded blockchains as Bitcoin's is currently; for a discussion about that see this earlier blog post); note that amount information includes the implicit amount - network fee.
Timing information perhaps straddles the two categories, because while transactions are (loosely) timestamped, there is also the business of trying to pick up timing and perhaps geographic information from snooping the P2P network.
With regard to that second category, the main goal of A is to correlate ownership of different utxos. An old paper of Meiklejohn et al 2013 identified two Heuristics (let's call them probabilistic assumptions), of which the first was by far the most important:
The second is less important mainly because it had to be caveat-ed quite a bit and wasn't reliable in naive form; but, identification of change addresses generally is a plausible angle for A. The first has been, as far as I know, the bedrock of blockchain analysis and has been referred to in many other papers, was mentioned in Satoshi's whitepaper, and you can see one functional example at the long-existent website walletexplorer.
But I think it's important to observe that this list is incomplete.
I'll now add two more items to the list; the first is omitted because it's elementary, the other, because it's subtle (and indeed you might find it a bit dumb at first sight):
Heuristic/Assumption 0: All inputs controlled by only one pubkey are unilaterally controlled
Heuristic/Assumption 3: Transfer of ownership between parties in one transaction implies payment
So, "Heuristic/Assumption" because assumption is probably a better word for all of these generally, but I want to keep the existing nomenclature, the "?" for 2 is simply because, as mentioned, this one is problematic (although still worthy of consideration).
Assumption 0: basically, that if it's not multisig, was never fully safe; there was always Shamir's secret sharing to share shards of a key, albeit that's very rarely used, and you can argue pedantically that full reconstruction means unilateral control. But Assumption 0 is a lot less safe now due to the recent work by Moreno-Sanchez et al. which means, at the very least, that 2 parties can easily use a 2-party computation based on the Paillier encryption system to effectively use a single ECDSA pubkey as a 2-2 multisig. So this assumption is generally unspoken, but in my opinion is now generally important (i.e. not necessarily correct!).
Assumption 3: this is rather strange and looks tautological; I could have even written "transfer of ownership between parties in one transaction implies transfer of ownership" to be cheeky. The point, if it is not clear to you, will become clear when I explain what "CoinJoinXT" means.
Our purpose, now, is to make A's job harder by trying to invalidate all of the above assumptions at once.
This has been discussed in other blog posts about various types of "CoinSwap", so I won't dwell on it.
Segwit fixes transaction malleability (BIP141, along with BIP143,144 were the BIPs that specified segwit). One of the most important implications of this is explained directly in BIP 141 itself, to quote from it:
Two parties, Alice and Bob, may agree to send certain amount of Bitcoin to a 2-of-2 multisig output (the "funding transaction"). Without signing the funding transaction, they may create another transaction, time-locked in the future, spending the 2-of-2 multisig output to third account(s) (the "spending transaction"). Alice and Bob will sign the spending transaction and exchange the signatures. After examining the signatures, they will sign and commit the funding transaction to the blockchain. Without further action, the spending transaction will be confirmed after the lock-time and release the funding according to the original contract.
In short, if we agree a transaction, then we can fix its txid and sign transactions which use its output(s). The BIP specifically references the Lightning Network as an example of the application of this pattern, but of course it's not restricted to it. We can have Alice and Bob agree to any arbitrary set of transactions and pre-sign them, in advance, with all of them having the funding transaction as the root.
CoinJoin involves 2 or more parties contributing their utxos into 1 transaction, but using the above model they can do the same to a funding transaction, but then pre-sign a set of more than one spending transaction. Here's a simple schematic:
A 1btc ---> F (2,2,A,B) --+ B 1btc ---> | | +-->[Proposed transaction graph (PTG) e.g. ->TX1->TX2->TX3 ..]
In human terms, you can envisage that: Alice and Bob would like to start to negotiate a set of conditional contracts about what happens to their money. Then they go through these steps:
This does achieve one significant thing: one transaction such as TX2 can transfer coins to, say, Bob's wallet, giving Alice nothing; and yet we can still get the overall effect of a CoinJoin. In other words, we've opened up the possibility to violate Heuristic 3 as well as Heuristic 1, in the same (short) interaction.
This construction works fine if all inputs used in transactions in the PTG are descendants of F; but this makes the construction very limited. So we'll immediately add more details to allow a more general use-case, in the next section.
If we allowed any of the transactions (TX1, TX2, ...) in the PTG in our previous example to have an input which did not come from the funding transaction F, then we would have introduced a risk; if Alice added utxo UA to, say, TX2, then, before Bob attempted to broadcast TX2, she could double spend it. This would break the atomicity of the graph, which was what allowed the crucial additional interesting feature (in bold, above): that an individual transaction could transfer funds to one party, without risks to the other. To address this problem, we call these additional inputs promise utxos and make use of refund transactions.
A 1btc ---> F (2,2,A,B) --- B 1btc ---> | +--> external payout 0.5 btc to Bob | | +->[TX1 --> TX2 --> TX3 --> TX4] | ^ | | | | | +--- utxo A1 | +--> refund locktime M, pay out *remaining* funds to A: 1btc, B: 0.5btc
In words: if, between the negotiation time and the time of broadcast of TX3, Alice spends A1 in some other transaction, Bob will still be safe; after block M he can simply broadcast the presigned refund transaction to claim the exact number of coins he is owed at that point in the graph.
The above addresses the case of a single external input being included in a chain of transactions in the PTG (here, TX1,2,3,4). Extending this, and generalising to allowing external inputs in many transactions, is straightforward; we can add such in-PTG backouts at every step, redeeming all remaining funds to parties according to what they're owed.
To summarize this section and how it differs from the original, simpler construction:
Alice and Bob have a choice:
The tradeoff is: (2) is not perfectly atomic, but it allows the transaction graph to include utxos from outside of F's ancestory, particularly useful for privacy applications. In a sequence of 10 coinjoins, you may be happy to risk that TXs 6-10 don't end up happening, if it doesn't cost you money. Case (2) is more likely to be of interest.
There's a large design space here.
A mixture of these features may give different tradeoffs in terms of intrinsic fungibility vs deniability vs cost; the tradeoff discussed in the introduction.
Interactivity - unlike either a CoinSwap of types discussed earlier in this blog, or doing multiple CoinJoins (to get a better fungibility effect than just a single one), this only requires one "phase" of interactivity (in terms of rounds, it may be 3). The two parties connect, exchange data and signatures, and then immediately disconnect. (This is what I called no-XBI in the previous blog post).
Boundary - the adversary A, as was hinted at in the introduction, in this model, will not necessarily be able to easily see on the blockchain where the start and end points of this flow of transactions was. To the extent that this is true, it's an enormous win, but more on this later.
Here we are still restricting to 2 parties for simplicity of the diagram. There is still a chain of 4 TXs, but here we flesh out the inputs and outputs. About colors:
Blue txos are co-owned by the two parties, envisioned as 2 of 2 multisig (although as originally mentioned, the technical requirement is only that each transaction is signed by both parties).
Red inputs are promise utxos as described in the earlier section.
Each promise has a corresponding backout transaction pre-signed as output consuming the bitcoins of the previous transaction to the one consuming that promise.
Notice that this example contains two possible setups for each individual transaction in the chain; it can pay out only to one party (like TX3 which pays bob 0.6btc), or it can pay "CoinJoin-style" equal-sized outputs to 2 (or N) parties. Choosing this latter option means you are consciously deciding to blur the line between the intrinsic-fungibility model and the deniability model, which, by the way, is not necessarily a bad idea.
As mentioned, our adversary A has a very important problem - he may not know that the above negotiation has happened, unlike a simple CoinJoin where the transactions are watermarked as such (and this is particularly true if Alice and Bob do not use equal-sized outputs). The boundary may be unclear to A.
So, what strategy can A use to find the transaction graph/set? He can do subset sum analysis.
If Alice and Bob are just 'mixing' coins, so that they are paid out the same amount that they paid in, I'll assert that subset sum is likely to work. It's true that A's job is quite hard, since in general, he would have to do such subset-sum analysis on a huge array of different possible sets of (inputs, outputs) on chain; but nevertheless it's the kind of thing that can be done by a professional adversary, over time. The fact that subset sum analysis is theoretically exponential time and therefore not feasible for very large sets may not be relevant in practice.
In our example above it may not be hard to identify the two inputs from Alice (1btc, 0.3btc) as corresponding to 3 outputs (0.8btc, 0.2btc, 0.3btc), albeit that the latter two - 0.2, 0.3 were part of CoinJoins. Remember that this was a tradeoff - if we didn't make equal sized outputs, to improve deniability/hiding, we'd no longer have any ambiguity there.
Here's one way of addressing the fact that A can do subset-sum on such a privacy-enhancing CoinJoinXT instantiation. The PTG is unspecified but you can imagine it as something similar to the previous example.
Marked in blue is what the adversary A doesn't know, even if he has identified the specific transaction/graph set (as we've said, that in itself is already hard). Subset-sum analysis won't work here to identify which output belongs to Alice and which to Bob; since 5.5 + 1.5 != 6.6, nor does 5.4 fit, nor does such an equation fit with Alice's input 5.8 on the right hand side of the equation.
The trick is that the 1.5 output is actually a dual funded Lightning channel between Alice and Bob. The actual channel balance is shown in blue again because hidden from A: (0.3, 1.2). If the channel is then immediately closed we have fallen back to a case where subset sum works, as the reader can easily verify.
But if, as is usually the intent, the channel gets used, the balance will shift over time, due to payments over HTLC hops to other participants in the Lightning network. This will mean that the final closing balance of the channel will be something else; for example, (0.1, 1.4), and then subset-sum will still not reveal which of the 2 outputs (5.4, 5.5) belong to Alice or Bob.
At a high level, you can understand this as a bleed-through and amplification of off-chain privacy to on-chain.
It's worth noting that you clearly get a significant part of this effect from just the dual-funded Lightning channel; if you consider change outputs in such a single funding transaction, you see the same effect:
-> Lightning funding 0.1
-> Change 2.41
-> Change 2.37
It's easy to see that there is no delinking effect on the change-outs if we know that the funding is equal on both sides. However, there's no need for that to be the case; if the initial channel balance is (Alice: 0.09, Bob: 0.01) then the change-outs are going to the opposite parties compared to if the channel funding is (Alice: 0.05, Bob: 0.05). So this concrete example should help you to understand a crucial aspect of this:
So how does the picture change if instead of just doing a single dual-funded Lightning channel, we include it as an output in a CoinJoinXT structure?
The answer again is deniability. Any contiguous subset of the entire blockchain has the property of sum preservation, modulo fees: the input total is ~= the output total. So no particular contiguous subset on the blockchain flags itself as being such a CoinJoinXT structure - unless subset sum works for some N subsets (2, as in our examples, or higher). But with the dual funded Lightning output of the type shown here, at least for the 2 of 2 case, this doesn't work.
What's been described up to now doesn't quite achieve the desired goal of "deniability"; there are still what we might call "fingerprints" in such a CoinJoinXT structure:
Proof of Concept - I put together a some very simple PoC code; it only covers something like the above first "Example" with 2 parties. Going through such an exercise in practice at least allows one to see concretely that (a) the interaction between the parties is very minimal (sub-second) which is great of course, but it gets a little hairy when you think about how to set up a template of such a transaction chain that 2 parties can agree on using whatever utxos they have available as inputs. A substantial chunk of that PoC code was devoted to that - there is a general
Template class for specifying a graph of transactions, with parametrized input/output sizes.
Practicality today - Although it can be done today (see previous), there are barriers to making this work well. Ideally we'd have Schnorr key aggregation for multisig, and support for dual funded Lightning channels for the amount decorrelation trick mentioned. Without either of those, such a transaction graph on the blockchain will be somewhat identifiable, but I still think there can be a lot of use doing it as an alternative to large sets of clearly identifiable CoinJoins.
Cost tradeoffs - left open here is the tradeoffs in terms of blockchain space usage for each "unit of fungibility", i.e. how much it costs to gain privacy/fungibility this way. I think it's almost impossible to come up with definitive mathematical models of such things, but my feeling is that, exactly to the extent any "deniability" is achieved, it's cost-effective, and to the extent it's not, it's not cost-effective.
Coordination model - Currently we have "in play" at least two models of coordination for CoinJoin - Joinmarket's market-based model, and the Chaumian server model currently championed by ZeroLink. CoinJoinXT as an idea is orthogonal to the coordination mechanism. The only "non-orthogonal" aspect, perhaps, is that I think the CoinJoinXT approach may still be pretty useful with only 2 parties (or 3), more so that CoinJoin with only 2/3.
Finally, where should this fit in one's fungibility "toolchest"? Lightning is hopefully going to emerge as a principal way that people gain fungibility for their everyday payments. The area it can't help with now, and probably not in the future due to its properties, is with larger amounts of money. So you might naturally want to ensure that in, say, sending funds to an exchange, making a large-ish payment, or perhaps funding a channel, you don't reveal the size of your cold storage wallet. I would see the technique described on this blog post as fitting into that medium-large sized funds transfer situation. CoinJoin of the pure "intrinsic fungibility" type, done in repeated rounds or at least in very large anonymity sets, is the other alternative (and perhaps the best) for large sizes.Share on Twitter Share on Facebook
Adam Gibson (20)