Dr. Roee Sarel
One more thought on recording economic experiments using blockchain: privacy & Zero Knowledge Proofs
In a previous post, I discussed the advantages that blockchain can offer for the conduct of economic experiments. In particular, recording experimental data on an immutable ledger can address the concern that researchers will try to fake data or omit results from selected sessions.
In my review there of why a blockchain-based registry for economic experiments has yet to be found, I neglected one aspect that is more general and not unique to economic experiments: privacy concerns.
At the risk of sounding too pessimistic, I think it would be safe to argue that researchers are generally reluctant to voluntarily share their data publicly, on an immutable ledger, in real-time. There are several reasons for this. First, the implicit competition in the profession gives an incentive not to disclose your data until you are done utilizing it, based on the fear that others will publish something using the same data before you. Second, there may be rent-seeking going on (e.g. researchers who exploit their “market power” as the sole owner of the data to achieve credit in other projects by demanding to be named as a co-author in exchange for sharing their data). Third, there may be reputation concerns (e.g. one does not want to reveal potential mistakes that might have occurred during data collection). Fourth, data protection rules (e.g. the GDPR in Europe) may restrict the recording of personal data, so that researchers may be legally prohibited from recording the data publicly. Fifth, subjects may refuse to participate in experiments if the data is recorded publicly.
This raises an obvious question: can one find a solution that will allow the data to be recorded on-chain without revealing the content of the data? A potential remedy can be found in the concept of “zero knowledge proofs” (ZKP): an interesting cryptographic concept that allows parties to prove to each other that they possess a piece of knowledge without revealing what that knowledge is. Getting a good grasp of ZKP can be quite challenging, as it is usually explained using (somewhat over-simplified) examples that may or may not include a magical cave.
ZKP have already been used in conjunction with blockchain projects (e.g. in the crypto-currency Zcash) and seem to be a promising avenue for tackling problems such as the aforementioned one.
ZKP work roughly as follows:
· One entity – a “prover” – wants to prove something to another entity – a “verifier” – but without revealing the content of that something.
· To do this, the verifier poses a series of questions that can be easily solved using an answer sheet.
· A prover who has access to the answer sheet will always answer correctly, but how can the verifier be sure that the prover didn’t just guess? If there are enough questions, a person who falsely claims to hold the answer sheet will eventually guess wrong. Thus, enough questions should minimize the risk of lucky guesses to a negligible amount. ZKP thus rely on statistical certainty.
· Note that the prover never actually shows the verifier his answer sheet – and the verifier does not learn what is in that sheet. The verifier only concludes that the prover does have the answer sheet.
· Thus, the prover can prove he holds an answer sheet while providing the verifier with “zero knowledge” about the answer sheet itself.
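To see why enough questions make lucky guesses negligible, here is a minimal sketch in Python. It assumes, purely for illustration, that every question has two equally likely answers, so a prover without the answer sheet guesses each one correctly with probability 1/2. The function names are hypothetical and not taken from any ZKP library.

```python
import random


def cheating_probability(num_questions: int) -> float:
    """Chance that pure guessing survives all questions (assumes 50/50 questions)."""
    return 0.5 ** num_questions


def cheater_passes(num_questions: int, rng: random.Random) -> bool:
    """Simulate one prover who lacks the answer sheet and guesses every answer."""
    for _ in range(num_questions):
        correct_answer = rng.choice([True, False])
        guess = rng.choice([True, False])
        if guess != correct_answer:
            return False  # one wrong answer exposes the cheater
    return True


def estimate_by_simulation(num_questions: int, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated cheaters who get through all questions."""
    rng = random.Random(seed)
    passed = sum(cheater_passes(num_questions, rng) for _ in range(trials))
    return passed / trials


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, cheating_probability(n), estimate_by_simulation(n))
```

Under this assumption, a cheater passes a single question half of the time, but with 20 questions the chance falls below one in a million. Real protocols differ in how the questions are constructed, but the logic of repeating the challenge is the same.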
Applying this to the context of recording economic experiments might look something like this:
- Researchers who submit a paper for publication will be asked to prove that the data that is analyzed in the paper is identical to the data in the original file created in the experiment itself.
- To achieve this, the data file in each session is “hashed”: it is transformed into a fixed-length identifier that is, for practical purposes, unique to the file and cannot be reverse-engineered. Only this hash, not the data itself, is recorded on-chain.
- Journal submission systems will be designed such that the researcher needs to prove that the file he is holding is the one whose hash is identical to the one registered on the chain.
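The hashing step can be sketched in a few lines of Python. The choice of SHA-256 is my assumption (the idea works with any cryptographic hash function), and the helper names and sample data are hypothetical.

```python
import hashlib


def hash_bytes(data: bytes) -> str:
    """Return the SHA-256 hash of some data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def hash_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks, so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Made-up session data, for illustration only.
    original = b"subject,choice\n1,A\n2,B\n"
    tampered = b"subject,choice\n1,A\n2,A\n"  # a single changed entry
    print(hash_bytes(original))
    print(hash_bytes(tampered))
    print(hash_bytes(original) == hash_bytes(tampered))
```

Changing even a single entry in the data yields a completely different hash, which is why a matching hash is strong evidence that the file was not altered.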
Let me illustrate this process with an example. Suppose that in the (not so far) future, a hypothetical researcher, “Dr. Hasher”, will want to run an experiment about the effect of religion on risk aversion. Hasher is afraid that subjects will refuse to participate, because the data will include information on religious beliefs. However, in this example, journals will not publish papers unless they receive proof that the data wasn’t manipulated.
Hasher proposes the following: at the end of each experimental session, the Excel data files will be hashed so that “session1.xls” is recorded as a long string of numbers and letters (e.g. “JG23r0!kGsgdsjgj”), and “session2.xls” will be recorded as another string. In the immutable ledger, one could only see the hash and the time stamp, never the data file. Upon submission, the researcher will submit the file to a secure system that hashes the file and transmits the hash to the journal (i.e. the journal still does not have access to the file itself). The journal verifies that the hash is the same as the one on-chain, and can be confident that the data is original.
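Dr. Hasher's workflow could be sketched roughly as follows. This is a toy model under stated assumptions: `ToyLedger` is a hypothetical in-memory stand-in for an immutable ledger (not a real blockchain API), SHA-256 is my choice of hash function, and the function names are made up. Note that the check itself is a plain hash comparison, a simpler building block than a full ZKP protocol.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LedgerEntry:
    session_name: str
    file_hash: str    # only the hash is stored, never the data
    timestamp: float  # time of recording


class ToyLedger:
    """Hypothetical append-only record; a real chain would enforce immutability."""

    def __init__(self) -> None:
        self._entries: List[LedgerEntry] = []

    def record(self, session_name: str, data: bytes) -> LedgerEntry:
        entry = LedgerEntry(session_name, hashlib.sha256(data).hexdigest(), time.time())
        self._entries.append(entry)
        return entry

    def lookup(self, session_name: str) -> Optional[LedgerEntry]:
        for entry in self._entries:
            if entry.session_name == session_name:
                return entry
        return None


def secure_system_hash(data: bytes) -> str:
    """The 'secure system': hashes the submitted file and passes on only the hash."""
    return hashlib.sha256(data).hexdigest()


def journal_accepts(ledger: ToyLedger, session_name: str, submitted_hash: str) -> bool:
    """The journal compares the submitted hash with the one recorded on the ledger."""
    entry = ledger.lookup(session_name)
    return entry is not None and entry.file_hash == submitted_hash


if __name__ == "__main__":
    ledger = ToyLedger()
    session_data = b"subject,religion,risky_choice\n1,x,0\n2,y,1\n"  # made-up data
    ledger.record("session1.xls", session_data)

    print(journal_accepts(ledger, "session1.xls", secure_system_hash(session_data)))
    print(journal_accepts(ledger, "session1.xls", secure_system_hash(session_data + b"3,z,1\n")))
```

In this sketch the journal only ever handles hashes, so it learns whether the submitted file matches the registered one without seeing any subject's answers.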
For the sake of brevity, I do not wish to get into the disadvantages of this solution too deeply (one problem might be that if the researcher forgets his login details, the data will be lost). What seems important here is to raise the question: can we improve on current practices using blockchain? The answer to this is, at the very least, maybe.