CoinJoinXT - a more flexible, extended approach to CoinJoin

*Ideas were first discussed here. Thanks again to arubi on IRC for helping me flesh them out. *


We assume that the reader is familiar with CoinJoin as a basic idea - collaboratively providing inputs to a transactions so that it may be made difficult or impossible to distinguish ownership/control of the outputs.

The way that CoinJoin is used in practice is (today mainly using JoinMarket, but others over Bitcoin's history) is to create large-ish transactions with multiple outputs of exactly the same amount. This can be called an "intrinsic fungibility" model - since, although the transactions created are unambiguously recognizable as CoinJoins, the indistinguishability of said equal outputs is kind of "absolute".

However, as partially discussed in the earlier blog post "the steganographic principle", there's at least an argument for creating fungibility in a less explicit way - that is to say, creating transactions that have a fungibility effect but aren't necessarily visible as such - they may look like ordinary payments. I'll call this the deniability model vs the intrinsic fungibility model. It's harder to make this work, but it has the possibility of being much more effective than the intrinsic fungibility model, since it gives the adversary (who we'll talk about in a minute) an additional, huge problem: he doesn't even know where to start.

The adversary's assumptions

In trying to create privacy, we treat the "blockchain analyst" as our adversary (henceforth just "A").

Blockchain analysis consists, perhaps, of two broad areas (not sure there is any canonical definition); we can call the first one "metadata", vaguely, and think of it is every kind of data that is not directly recorded on the blockchain, such as personally identifying information, exchange records etc, network info etc. In practice, it's probably the most important. The second is stuff recorded directly on the blockchain - pseudonyms (scriptPubKeys/addresses) and amount information (on non-amount-blinded blockchains as Bitcoin's is currently; for a discussion about that see this earlier blog post); note that amount information includes the implicit amount - network fee.

Timing information perhaps straddles the two categories, because while transactions are (loosely) timestamped, there is also the business of trying to pick up timing and perhaps geographic information from snooping the P2P network.

With regard to that second category, the main goal of A is to correlate ownership of different utxos. An old paper of Meiklejohn et al 2013 identified two Heuristics (let's call them probabilistic assumptions), of which the first was by far the most important:

  • Heuristic 1 - All inputs to a transaction are owned by the same party
  • Heuristic 2 - One-time change addresses are owned by the same party as the inputs

The second is less important mainly because it had to be caveat-ed quite a bit and wasn't reliable in naive form; but, identification of change addresses generally is a plausible angle for A. The first has been, as far as I know, the bedrock of blockchain analysis and has been referred to in many other papers, was mentioned in Satoshi's whitepaper, and you can see one functional example at the long-existent website walletexplorer.

[But I think it's important to observe that this list is incomplete.]{style="text-decoration: underline;"}[]{style="text-decoration: underline;"}

I'll now add two more items to the list; the first is omitted because it's elementary, the other, because it's subtle (and indeed you might find it a bit dumb at first sight):

  • Heuristic/Assumption 0: All inputs controlled by only one pubkey are unilaterally controlled
  • Heuristic/Assumption 1: All inputs to a transaction are owned by the same party
  • Heuristic/Assumption 2(?): One-time change addresses are owned by the same party as the inputs
  • Heuristic/Assumption 3: Transfer of ownership between parties in one transaction implies payment

So, "Heuristic/Assumption" because assumption is probably a better word for all of these generally, but I want to keep the existing nomenclature, the "?" for 2 is simply because, as mentioned, this one is problematic (although still worthy of consideration).

Assumption 0: basically, that if it's not multisig, was never fully safe; there was always Shamir's secret sharing to share shards of a key, albeit that's very rarely used, and you can argue pedantically that full reconstruction means unilateral control. But Assumption 0 is a lot less safe now due to the recent work by Moreno-Sanchez et al. which means, at the very least, that 2 parties can easily use a 2-party computation based on the Paillier encryption system to effectively use a single ECDSA pubkey as a 2-2 multisig. So this assumption is generally unspoken, but in my opinion is now generally important (i.e. not necessarily correct!).

Assumption 3: this is rather strange and looks tautological; I could have even written "transfer of ownership between parties in one transaction implies transfer of ownership" to be cheeky. The point, if it is not clear to you, will become clear when I explain what "CoinJoinXT" means.

Our purpose, now, is to make A's job harder by trying to invalidate all of the above assumptions at once.

Quick refresher: BIP141

This has been discussed in other blog posts about various types of "CoinSwap", so I won't dwell on it.

Segwit fixes transaction malleability (BIP141, along with BIP143,144 were the BIPs that specified segwit). One of the most important implications of this is explained directly in BIP 141 itself, to quote from it:

Two parties, Alice and Bob, may agree to send certain amount of Bitcoin to a 2-of-2 multisig output (the "funding transaction"). Without signing the funding transaction, they may create another transaction, time-locked in the future, spending the 2-of-2 multisig output to third account(s) (the "spending transaction"). Alice and Bob will sign the spending transaction and exchange the signatures. After examining the signatures, they will sign and commit the funding transaction to the blockchain. Without further action, the spending transaction will be confirmed after the lock-time and release the funding according to the original contract.

In short, if we agree a transaction, then we can fix its txid and sign transactions which use its output(s). The BIP specifically references the Lightning Network as an example of the application of this pattern, but of course it's not restricted to it. We can have Alice and Bob agree to any arbitrary set of transactions and pre-sign them, in advance, with all of them having the funding transaction as the root.

CoinJoinXT - the basic case

CoinJoin involves 2 or more parties contributing their utxos into 1 transaction, but using the above model they can do the same to a funding transaction, but then pre-sign a set of more than one spending transaction. Here's a simple schematic:

A 1btc --->
              F (2,2,A,B) --+
B 1btc --->                 |
                            +-->[Proposed transaction graph (PTG) e.g. ->TX1->TX2->TX3 ..]

In human terms, you can envisage that: Alice and Bob would like to start to negotiate a set of conditional contracts about what happens to their money. Then they go through these steps:

  1. One side proposes F (the funding transaction) and a full graph of unsigned transactions to fill out the PTG above; e.g. Alice proposes, Bob and Alice share data (pubkeys, destination addresses). Note that the set doesn't have to be a chain (TX1->TX2->TX3...), it can be a tree, but each transaction must require sign-off of both parties (either, at least one 2-2 multisig utxo, or at least one utxo whose key is owned by each party).
  2. They exchange signatures on all transactions in the PTG, in either order. Of course, they abort if signatures don't validate.
  3. With this in place (i.e. only after valid completion of (2)), they both sign (in either order) F.
  4. Now both sides have a valid transaction set, starting with F. Either or both can broadcast them. [The transactions are all guaranteed to occur as long as at least one of them wants it]{style="text-decoration: underline;"}. Contrariwise, none of them is valid without F being broadcast.

This does achieve one significant thing: one transaction such as TX2 can transfer coins to, say, Bob's wallet, giving Alice nothing; and yet we can still get the overall effect of a CoinJoin. In other words, we've opened up the possibility to violate Heuristic 3 as well as Heuristic 1, in the same (short) interaction.

This construction works fine if all inputs used in transactions in the PTG are descendants of F; but this makes the construction very limited. So we'll immediately add more details to allow a more general use-case, in the next section.

Introducing Promises

If we allowed any of the transactions (TX1, TX2, ...) in the PTG in our previous example to have an input which did not come from the funding transaction F, then we would have introduced a risk; if Alice added utxo UA to, say, TX2, then, before Bob attempted to broadcast TX2, she could double spend it. This would break the atomicity of the graph, which was what allowed the crucial additional interesting feature (in bold, above): that an individual transaction could transfer funds to one party, without risks to the other. To address this problem, we call these additional inputs promise utxos and make use of refund transactions.

A 1btc --->
              F (2,2,A,B) ---
B 1btc --->                 |       +--> external payout 0.5 btc to Bob
                            |       |
                            +->[TX1 --> TX2 --> TX3 --> TX4]
                                         |       ^
                                         |       |
                                         |       |
                                         |       +--- utxo A1
                                         +--> refund locktime M, pay out *remaining* funds to A: 1btc, B: 0.5btc

In words: if, between the negotiation time and the time of broadcast of TX3, Alice spends A1 in some other transaction, Bob will still be safe; after block M he can simply broadcast the presigned refund transaction to claim the exact number of coins he is owed at that point in the graph.

The above addresses the case of a single external input being included in a chain of transactions in the PTG (here, TX1,2,3,4). Extending this, and generalising to allowing external inputs in many transactions, is straightforward; we can add such in-PTG backouts at every step, redeeming all remaining funds to parties according to what they're owed.

To summarize this section and how it differs from the original, simpler construction:

Alice and Bob have a choice:

  1. They can set up a fully trustless PTG, without promises. They are then guaranteed to achieve "all or nothing": either all cooperative signing works, then all transactions can be broadcast (as long as at least one of them wants to), or nothing (including F) is broadcast at all.
  2. They can set up a PTG including promises from one or both parties. Now they don't get "all or nothing" but only ensure that the transactions that complete are a subset, in order, from the start F. To achieve this they add presigned backouts at (probably every) step, so that if the chain "breaks" somewhere along, they will recover all the funds remaining that are owed to them.

The tradeoff is: (2) is not perfectly atomic, but it allows the transaction graph to include utxos from outside of F's ancestory, particularly useful for privacy applications. In a sequence of 10 coinjoins, you may be happy to risk that TXs 6-10 don't end up happening, if it doesn't cost you money. Case (2) is more likely to be of interest.

Interlude - overview of features of CoinJoinXT

There's a large design space here.

  • We can have N parties, not just 2.
  • We can have as many transactions as we like.
  • We can have a tree with F as root, rather than a chain.
  • We can have as many promise utxos from any of the N parties as we like.

A mixture of these features may give different tradeoffs in terms of intrinsic fungibility vs deniability vs cost; the tradeoff discussed in the introduction.

Interactivity - unlike either a CoinSwap of types discussed earlier in this blog, or doing multiple CoinJoins (to get a better fungibility effect than just a single one), this only requires one "phase" of interactivity (in terms of rounds, it may be 3). The two parties connect, exchange data and signatures, and then immediately disconnect. (This is what I called no-XBI in the previous blog post).

Boundary - the adversary A, as was hinted at in the introduction, in this model, will not necessarily be able to easily see on the blockchain where the start and end points of this flow of transactions was. To the extent that this is true, it's an enormous win, but more on this later.


Example CJXT

Here we are still restricting to 2 parties for simplicity of the diagram. There is still a chain of 4 TXs, but here we flesh out the inputs and outputs. About colors:

Blue txos are co-owned by the two parties, envisioned as 2 of 2 multisig (although as originally mentioned, the technical requirement is only that each transaction is signed by both parties).

Red inputs are promise utxos as described in the earlier section.

Each promise has a corresponding backout transaction pre-signed as output consuming the bitcoins of the previous transaction to the one consuming that promise.

Notice that this example contains two possible setups for each individual transaction in the chain; it can pay out only to one party (like TX3 which pays bob 0.6btc), or it can pay "CoinJoin-style" equal-sized outputs to 2 (or N) parties. Choosing this latter option means you are consciously deciding to blur the line between the intrinsic-fungibility model and the deniability model, which, by the way, is not necessarily a bad idea.

The return of A - amounts leak.

As mentioned, our adversary A has a very important problem - he may not know that the above negotiation has happened, unlike a simple CoinJoin where the transactions are watermarked as such (and this is particularly true if Alice and Bob do not use equal-sized outputs). The boundary may be unclear to A.

So, what strategy can A use to find the transaction graph/set? He can do subset sum analysis.

If Alice and Bob are just 'mixing' coins, so that they are paid out the same amount that they paid in, I'll assert that subset sum is likely to work. It's true that A's job is quite hard, since in general, he would have to do such subset-sum analysis on a huge array of different possible sets of (inputs, outputs) on chain; but nevertheless it's the kind of thing that can be done by a professional adversary, over time. The fact that subset sum analysis is theoretically exponential time and therefore not feasible for very large sets may not be relevant in practice.

In our example above it may not be hard to identify the two inputs from Alice (1btc, 0.3btc) as corresponding to 3 outputs (0.8btc, 0.2btc, 0.3btc), albeit that the latter two - 0.2, 0.3 were part of CoinJoins. Remember that this was a tradeoff - if we didn't make equal sized outputs, to improve deniability/hiding, we'd no longer have any ambiguity there.

Breaking subset-sum with Lightning

{width="711" height="392"}

Here's one way of addressing the fact that A can do subset-sum on such a privacy-enhancing CoinJoinXT instantiation. The PTG is unspecified but you can imagine it as something similar to the previous example.

Marked in blue is what the adversary A doesn't know, even if he has identified the specific transaction/graph set (as we've said, that in itself is already hard). Subset-sum analysis won't work here to identify which output belongs to Alice and which to Bob; since 5.5 + 1.5 != 6.6, nor does 5.4 fit, nor does such an equation fit with Alice's input 5.8 on the right hand side of the equation.

The trick is that the 1.5 output is actually a dual funded Lightning channel between Alice and Bob. The actual channel balance is shown in blue again because hidden from A: (0.3, 1.2). If the channel is then immediately closed we have fallen back to a case where subset sum works, as the reader can easily verify.

But if, as is usually the intent, the channel gets used, the balance will shift over time, due to payments over HTLC hops to other participants in the Lightning network. This will mean that the final closing balance of the channel will be something else; for example, (0.1, 1.4), and then subset-sum will still not reveal which of the 2 outputs (5.4, 5.5) belong to Alice or Bob.

At a high level, you can understand this as a bleed-through and amplification of off-chain privacy to on-chain.

It's worth noting that you clearly get a significant part of this effect from just the dual-funded Lightning channel; if you consider change outputs in such a single funding transaction, you see the same effect:

       -> Lightning funding 0.1
       -> Change 2.41
       -> Change 2.37

It's easy to see that there is no delinking effect on the change-outs if we know that the funding is equal on both sides. However, there's no need for that to be the case; if the initial channel balance is (Alice: 0.09, Bob: 0.01) then the change-outs are going to the opposite parties compared to if the channel funding is (Alice: 0.05, Bob: 0.05). So this concrete example should help you to understand a crucial aspect of this:

  • Such a fungibility effect is only achieved if the difference between the two parties' initial inputs is small enough compared to the size of the dual-funded Lightning channel
  • If the size of the inputs is very large compared to the Lightning channel overall size, which currently at maximum is 2**24 satoshis (about 0.16btc), then, in order to achieve this obfuscation effect, we "converge" to the case of something like a 2-in and 2-out equal-sized coinjoin. It's hard for 2 parties to arrange to have inputs of equal sizes, and it somewhat loses the deniability feature we were going for. (You can easily confirm for yourself that there will be no ambiguity if Alice and Bob's inputs are of completely different sizes).

So how does the picture change if instead of just doing a single dual-funded Lightning channel, we include it as an output in a CoinJoinXT structure?

The answer again is deniability. Any contiguous subset of the entire blockchain has the property of sum preservation, modulo fees: the input total is \~= the output total. So no particular contiguous subset on the blockchain flags itself as being such a CoinJoinXT structure - unless subset sum works for some N subsets (2, as in our examples, or higher). But with the dual funded Lightning output of the type shown here, at least for the 2 of 2 case, this doesn't work.

Remove all traces?

What's been described up to now doesn't quite achieve the desired goal of "deniability"; there are still what we might call "fingerprints" in such a CoinJoinXT structure:

  • Timing correlation: if we don't use nLockTime on these transactions, then one party might choose to broadcast them all at once. This is at the least a big clue, although not unambiguous. To avoid it, have the pre-signed transactions in the PTG all be given specific timelocks.
  • Shared control utxos. If we use 2 of 2, or N of N, multisig outputs, of the current normal p2sh type, then they are observable as such, and this could easily help A to find the "skeleton" of such a CoinJoinXT structure. Of course, let's not forget that we can do CoinJoinXT with various equal sized outputs too, mixing the "intrinsic fungibility" and "deniability" approaches together, as discussed, so it's not that CoinJoinXT with p2sh multisig connecting utxos is useless. But we may want to focus on less detectable forms, like Schnorr/MuSig based multisig with key aggregation so that N of N is indistinguishable from 1 of 1, or the new construction that allows an ECDSA pubkey to be effectively a 2 of 2 multisig.


Proof of Concept - I put together a some very simple PoC code; it only covers something like the above first "Example" with 2 parties. Going through such an exercise in practice at least allows one to see concretely that (a) the interaction between the parties is very minimal (sub-second) which is great of course, but it gets a little hairy when you think about how to set up a template of such a transaction chain that 2 parties can agree on using whatever utxos they have available as inputs. A substantial chunk of that PoC code was devoted to that - there is a general Template class for specifying a graph of transactions, with parametrized input/output sizes.

Practicality today - Although it can be done today (see previous), there are barriers to making this work well. Ideally we'd have Schnorr key aggregation for multisig, and support for dual funded Lightning channels for the amount decorrelation trick mentioned. Without either of those, such a transaction graph on the blockchain will be somewhat identifiable, but I still think there can be a lot of use doing it as an alternative to large sets of clearly identifiable CoinJoins.

Cost tradeoffs - left open here is the tradeoffs in terms of blockchain space usage for each "unit of fungibility", i.e. how much it costs to gain privacy/fungibility this way. I think it's almost impossible to come up with definitive mathematical models of such things, but my feeling is that, exactly to the extent any "deniability" is achieved, it's cost-effective, and to the extent it's not, it's not cost-effective.

Coordination model - Currently we have "in play" at least two models of coordination for CoinJoin - Joinmarket's market-based model, and the Chaumian server model currently championed by ZeroLink. CoinJoinXT as an idea is orthogonal to the coordination mechanism. The only "non-orthogonal" aspect, perhaps, is that I think the CoinJoinXT approach may still be pretty useful with only 2 parties (or 3), more so that CoinJoin with only 2/3.

Finally, where should this fit in one's fungibility "toolchest"? Lightning is hopefully going to emerge as a principal way that people gain fungibility for their everyday payments. The area it can't help with now, and probably not in the future due to its properties, is with larger amounts of money. So you might naturally want to ensure that in, say, sending funds to an exchange, making a large-ish payment, or perhaps funding a channel, you don't reveal the size of your cold storage wallet. I would see the technique described on this blog post as fitting into that medium-large sized funds transfer situation. CoinJoin of the pure "intrinsic fungibility" type, done in repeated rounds or at least in very large anonymity sets, is the other alternative (and perhaps the best) for large sizes.