Vault 7 ruined WikiLeaks’ “perfect record” – if it ever existed

WikiLeaks’ much-vaunted “perfect record” – both of “100% accuracy” and of source protection – is an important part of the organization’s identity. For years, WikiLeaks supporters and staff alike have boasted that the site’s record makes them more trustworthy than the mainstream media. In 2017 Julian Assange personally described WikiLeaks as “perfect” and claimed that only 2% of mainstream journalists were “credible.” Any truth to WikiLeaks’ claim of a perfect record ended on March 7, 2017 with the publication of Vault 7, which included non-existent pages that WikiLeaks had accidentally created while attempting to reconstruct the database. In doing so, WikiLeaks inadvertently helped lead the government to the now-convicted Joshua Schulte.

A sign at a protest for Julian Assange, claiming that WikiLeaks has a perfect record

On June 17, 2022 and again on the 21st, Patrick Thomas Leedom testified at Schulte’s retrial. Leedom had experience working at Microsoft and MITRE, and with the FBI in the cyber division’s technical analysis unit. He “primarily [worked on] digital computer forensics, malware analysis as well as working with the incident-response team on deployments.” Leedom assisted with the investigation, focusing his analysis mainly on the Confluence data that was released publicly as Vault 7 Year Zero.

Q. As part of your investigation, did you review the actual material that WikiLeaks posted on the internet?

A. Yes, I did.

Q. What did you review?

A. So I reviewed the actual web pages from WikiLeaks for the releases.

Q. I want to focus in particular on the first, the March 7, 2017 leak. I think you testified earlier that you reached some conclusions about where that information came from; is that right?

A. Yes, I did.

Q. Where did it come from on DevLAN?

A. So that March 7th leak, that all came from Confluence, specifically that March 3rd [2016] Confluence backup.

Q. How much of Confluence was disclosed on March 7, 2017?

A. All of it, or at least everything that was available in that March 3rd backup.

Schulte transcript June 21, 2022 pages 56-57

Leedom explained that he identified a backup file as the source of the Confluence data. Specifically, the backup had been made on March 3, 2016, and it was corrupted because the CIA used a faulty backup script that lacked a crucial argument. Without that argument, the backup process didn’t know how to handle some of the data and bailed out partway through, which meant not everything was backed up.

Q. Was this significant to that determination that the WikiLeaks material came from a backup file?

A. It is very significant.

Q. How?

[they pull up a new page in the exhibit before Leedom continues]

A. …So there is a command in here which I’m not going to go through every piece of it but this “my SQL dump” this just says hey, backup the database. That’s all it says. There was an issue with this command, it was missing what we call an argument. We will look at this — you see the little -u right after the my SQL dump command, we call that an argument. There was an argument that needed to be provided to this command to properly back up the type of data that was stored in this database. Essentially there was an error when the backup command hit a certain string of bytes that it didn’t understand and it kind of bailed out and only ended up backing up like three quarters of the whole database. So in technical terms we would call that a corrupted backup and the particular type of argument that is missing here is one that would correctly set the encoding for that database so that it would know, oh, I see something I don’t recognize, I’m supposed to treat it like this, and keep going.

Schulte transcript June 21, 2022 pages 58-59
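The failure mode Leedom describes can be illustrated with a small simulation. This is a sketch under stated assumptions, not the CIA’s script or a reproduction of mysqldump: it assumes a hypothetical `dump_tables` function that writes tables one after another, decodes each stored value with a configured text encoding, and gives up at the first value it cannot decode, keeping whatever it had already written. The table names and byte values are invented for illustration. (MySQL’s client tools do have an option governing this, `--default-character-set`, but the testimony does not say which argument was missing.)

```python
"""Toy model of a dump that bails out on bytes it cannot decode.

Hypothetical illustration only: table names, rows, and the abort-on-error
behavior are assumptions, not a reproduction of mysqldump or of DevLAN.
"""


def dump_tables(tables: dict[str, list[bytes]], encoding: str) -> tuple[list[str], bool]:
    """Decode every row of every table, in order.

    Returns (lines written before stopping, whether the dump finished).
    On the first row that cannot be decoded, the dump stops instead of
    being told how to treat the unfamiliar bytes and carrying on.
    """
    output: list[str] = []
    for name, rows in tables.items():
        output.append(f"-- table {name}")
        for raw in rows:
            try:
                output.append(raw.decode(encoding))
            except UnicodeDecodeError:
                return output, False  # bail out: everything after this is lost
    return output, True


# Hypothetical wiki database: the third table contains a non-ASCII byte sequence.
TABLES = {
    "pages": [b"Home", b"Build notes"],
    "attachments": [b"diagram.png"],
    "user_page_links": [b"user42 -> caf\xc3\xa9 notes"],
    "labels": [b"macos"],
}

if __name__ == "__main__":
    partial, ok = dump_tables(TABLES, encoding="ascii")
    print(ok, len(partial))
    full, ok = dump_tables(TABLES, encoding="utf-8")
    print(ok, len(full))
```

Decoding as ASCII stops inside the third table, so its rows and the entire `labels` table never reach the output; decoding as UTF-8 completes. The point is only that a dump which gives up on unfamiliar bytes produces a backup that looks plausible while silently missing whole tables.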

Leedom said the missing data included entire tables, among them tables recording the relationships between different pieces of data and how to find and use them. Because these pieces were missing, portions of the database – in some ways the connective tissue of a relational database – had to be reconstructed. The fingerprints of this reconstruction process allowed Leedom and other investigators to narrow their focus before ultimately identifying Schulte as the leaker.

Q. And what type of data was missing from the backup as a result of that error?

A. There were a few tables, like drawers, missing from that database. The most important one, there was a table that matched up essentially like what users and what pages were associated. So if, like, I had a page on Confluence, the table that had the information of saying, like, exactly what pages, my user name and stuff was associated and those edits were associated with, that was all missing.

Schulte transcript June 21, 2022 page 59

So the first step in re-constructing this Confluence database, it’s the same for any database honestly, you have to understand what the database looks like, you have to know what tables are there, where things are stored. This is what we call a relational database, that means there are relationships between those different drawers in the cabinet that you have to understand otherwise you don’t really know how to deal with what you have.

Schulte transcript June 21, 2022 page 61

Q. What would be different?

A. We will have, I think, some pictures, but the whole site would look different. There would be data missing, there would be, like, obvious gaps where, you know, you would have to re-interpret how some of these relationships worked and you might get them right in some parts, you might get them wrong in other parts. So I looked at those errors and inconsistencies to try to determine how this was done.

Schulte transcript June 21, 2022 page 63

As part of their reconstruction process, WikiLeaks also restored several deleted pages. Many of the other differences were purely visual, and a lot of the data could simply be copied out of the database as-is. As a result, much of the data that WikiLeaks released as part of Year Zero was faithful to the original content, if not its presentation.

Q. Were there other parts that reflected the errors you have been describing?

A. Yes. So, like, while the content for the pages was all there and intact, all of the other stuff that kind of enhances what would be on those pages was missing. A lot of user IDs weren’t available. A lot of pages were incorrectly associated with other page names. There were pages that were both completely missing as well as pages that if you, like, looked at it on DevLAN as it was, a page could have been completely deleted. WikiLeaks actually just restored it as it was so they actually recovered deleted pages to some extent for some of these pages. And from a, like, overall visual presentation perspective, the design elements and templates and fancy fonts and stuff, all of that is gone.

Schulte transcript June 21, 2022 page 64

Q. Does some of what was posted on WikiLeaks here appear as it would have appeared on Confluence on DevLAN?

A. Yes.

Q. Can you explain that?

A. So like I briefly mentioned earlier, all of the page content that is stored in the database, it is actually stored, we will say, pre-formatted. This is a web page, the kind of programming language for web pages is called HTML. All of that data is actually stored in the database so if you wanted to, you know, preserve like these numbered bullets that are indented, this kind of quote thing at the bottom here for the code block down there, that’s actually all in HTML and already formatted, so all you have to do to retain all of that is just copy it out and open it up in a web browser.

Schulte transcript June 21, 2022 page 66
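Why the page bodies survived intact, per Leedom’s explanation, can be sketched in a few lines. The example uses Python’s built-in `sqlite3` module as a stand-in for the MySQL database behind Confluence, and the table and column names (`page_bodies`, `page_id`, `body`) are hypothetical simplifications, not Confluence’s real schema. Because the body column already holds finished HTML, exporting it only requires writing the stored string to a file; no user tables, templates, or stylesheets are involved.

```python
"""Export pre-formatted page bodies straight from a database to HTML files.

sqlite3 stands in for MySQL here; the page_bodies schema is a hypothetical
simplification, not Confluence's real schema.
"""
import sqlite3
import tempfile
from pathlib import Path


def export_pages(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write each stored HTML body to <out_dir>/<page_id>.html unchanged.

    Only the body column is read: no user tables, templates, or stylesheets
    are needed, which is why this content survives when those are missing.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    rows = conn.execute("SELECT page_id, body FROM page_bodies ORDER BY page_id")
    for page_id, body in rows:
        path = out_dir / f"{page_id}.html"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_bodies (page_id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT INTO page_bodies VALUES (?, ?)",
        (1, "<ol><li>Step one</li><li>Step two</li></ol><pre>make all</pre>"),
    )
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_pages(conn, Path(tmp)):
            print(path.name, path.read_text(encoding="utf-8"))
```

Opening the exported file in a browser would show the numbered list and code block as stored, but nothing about who wrote the page or where it sat in the site, which mirrors Leedom’s distinction between intact content and lost context.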

It was the contrast between the correctly rendered content and the incorrect portions and ex nihilo pages that first helped narrow down the origin of the data and told investigators what to look for. Once Joshua Schulte had been identified as the likely source of the files, investigators found copious corroborating evidence, as well as CSAM once a search warrant was executed.

Q. What, if any conclusions, did you draw about the fact that the page content from the SQL database rendered correctly on WikiLeaks?

A. It certainly made it a lot easier and more feasible when we are thinking about how this data was stolen and like when it got posted, this is how they did it. They had the database.

Schulte transcript June 21, 2022 page 67

At this point, Leedom began describing the non-existent data that WikiLeaks’ reconstruction process introduced into Vault 7. As Leedom explained, while attempting to reconstruct missing portions of the database, WikiLeaks had “kind of created new pages to aggregate certain things like certain content from users because a lot of those previous relationships were broken.” Turning to an example page, Leedom said plainly that “this page doesn’t actually exist on DevLAN” and that “if you had like a correct, full backup, you would know that this actually isn’t a real page.” Leedom’s best guess was that WikiLeaks had mistaken one user’s Mac-related work for a dedicated space, thinking “this must be a separate space for MacOS projects and that’s why they labeled it as such.”

Q. How is this different from how the page would have looked in Confluence running on DevLAN?

A. So this page, like as is, actually doesn’t exist at all on DevLAN. One thing WikiLeaks did when they rebuilt a lot of these pages is, like, kind of created new pages to aggregate certain things like certain content from users because a lot of those previous relationships were broken so they had to have some way to try and put the pieces back together. So this is essentially, like, all of the pages or attachments that they could find that were related to this user ID string.

[they pull up a new WikiLeaks page in the exhibits before continuing]

Q. In what ways is Government Exhibit 7-1 different from how this page would have appeared on DevLAN?

A. So, this page doesn’t actually exist on DevLAN. Yeah.

Q. Explain a little more about that?

A. Sure. So this page, this says MacOSX. Essentially there was a user on DevLAN that did a lot of work on Mac projects and since that user page association table was gone, the best that WikiLeaks could do with this was they thought that, oh, well this must be like a separate space for just MacOS projects —

MR. SCHULTE: Objection.

THE COURT: Overruled.

A. – this must be a separate space for MacOS projects and that’s why they labeled it as such, and kind of binned all of these things together when, in reality, if you had like a correct, full backup, you would know that this actually isn’t a real page.

Schulte transcript June 21, 2022 pages 68-70
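Leedom’s point about the MacOSX page can be illustrated with a toy model. The sketch below assumes, hypothetically, that each surviving page or attachment still carries the ID string of the user who created it, but that the table linking users to pages is gone. A reconstruction that bins content by user ID then produces one aggregate page per user; comparing the rebuilt site’s titles against a complete backup flags the pages that never existed. All titles, IDs, and function names are invented for illustration and are not taken from the trial exhibits or from WikiLeaks’ actual process.

```python
"""Toy model: rebuilding pages without the user-page association table.

Hypothetical data and names only; this is not WikiLeaks' or the FBI's method.
"""
from collections import defaultdict


def aggregate_by_user(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (user_id, item_title) pairs into one synthetic page per user."""
    pages: dict[str, list[str]] = defaultdict(list)
    for user_id, title in items:
        pages[f"Content for {user_id}"].append(title)
    return dict(pages)


def find_fabricated(rebuilt_titles: set[str], full_backup_titles: set[str]) -> set[str]:
    """Titles present in the rebuilt site but absent from a complete backup."""
    return rebuilt_titles - full_backup_titles


# Surviving content: creator IDs remain, but the page relationships are lost.
SURVIVING = [
    ("user-a1", "Kernel module notes"),
    ("user-a1", "Installer tests"),
    ("user-b2", "Build server setup"),
]
# Page titles as they exist in a hypothetical complete backup.
FULL_BACKUP = {"Kernel module notes", "Installer tests", "Build server setup"}

if __name__ == "__main__":
    rebuilt = aggregate_by_user(SURVIVING)
    real_titles = {title for titles in rebuilt.values() for title in titles}
    all_titles = real_titles | set(rebuilt)
    print(sorted(find_fabricated(all_titles, FULL_BACKUP)))
```

Only the synthetic aggregate pages are flagged; the real pages they bundle together all appear in the complete backup. That asymmetry is the kind of fingerprint Leedom described: the aggregate page is plausible on its own, but it has no counterpart in a correct, full backup.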

Schulte’s pushback on this portion of Leedom’s testimony was largely limited to implying that WikiLeaks had obtained a more recent copy and reverted it to its March 3rd state to obscure their sourcing. In Schulte’s theory, WikiLeaks could have delayed publication and spent the time rolling the database back to an older state to protect the source. The theory was rejected; Schulte offered no evidence for it beyond pointing out that WikiLeaks takes steps to protect their sources and proposing alternate theories about how WikiLeaks could have obtained the data. Schulte did not dispute Leedom’s conclusion that the data came from the malformed backup command, but he suggested that the conclusion was equally compatible with a different backup.

Regardless of Schulte’s groundless theories, it was the information missing from and added to Vault 7 that helped investigators limit their focus to the malformed backups, leaving a trail that led back to Joshua Schulte despite efforts to destroy logs. While this appears to be the first time government experts have gone on record to definitively say that WikiLeaks published – and inadvertently created – pages that didn’t exist in the original data, this wasn’t the first time WikiLeaks published false information. Issues about the authenticity of WikiLeaks’ publications have persisted for as long as the organization has published.

In December 2006, WikiLeaks published their first leak – a document concerning Sheik Aweys and the Union of Islamic Courts. WikiLeaks was “uncertain of the authenticity” of it but “thought that readers, using Wikipedia-like features of the site, would help analyse it.” The document was never verified, and its authenticity was questioned both by the press and by WikiLeaks’ readers. The organization conceded in July 2007 that the analysis they had posted for the document wasn’t originally written for or by WikiLeaks.

In his book Inside WikiLeaks – My Time with Julian Assange, Daniel Domscheit-Berg described a number of problems with WikiLeaks’ “authenticity checks,” which he admitted he had misrepresented in hundreds of interviews. WikiLeaks was unable to make use of the hundreds of volunteers who had signed up for its list (except, as internal records show, through Sigurdur Thordarson’s coordination and manipulation). Instead, Julian Assange and Domscheit-Berg looked for signs of technical manipulation and used Google searches to see whether documents “struck [them] as genuine.”

Another issue was our “authenticity checks”—a deceit I had forced myself to practice in hundreds of interviews. Until late 2009, no one except Julian and I checked the vast majority of documents that had been submitted. Strictly speaking, we weren’t lying when we said we had a pool of around eight hundred volunteer experts at our disposal. But we neglected to mention that we had no mechanism in place for integrating them into our work flow. None of them were able to access the material we received. Instead, Julian and I usually checked whether documents had been manipulated technologically and did a few Google searches to see whether they struck us as genuine. We could only hope that things would turn out all right. Apparently we developed a pretty good sense for what was authentic and what wasn’t; at least as far as I know, we didn’t make any major mistakes. But we could have.

Inside WikiLeaks – My Time with Julian Assange

Bank Julius Bär, the case that helped catapult WikiLeaks to widespread media attention, was also afflicted by problems with verification. As Domscheit-Berg explained, WikiLeaks republished incorrect information from their source and, when challenged on it, issued statements containing “made up” information about the organization’s process.

Leaking the Julius Bär documents brought a certain Ralf Schneider* into our lives, a German citizen whose name was among those of the big tax evaders identified by the whistle-blower. At some point, Schneider sent us an e-mail, writing that, while he would love to have a few million to deposit in secret accounts in Switzerland, this was a case of mistaken identity. I was shocked.

The information about the individuals involved in the Julius Bär scandal came from our source. Whoever had provided us with the documents had wanted to help us categorize and understand them, so he had included some background information he had researched about the bank’s clients. In the case of Ralf Schneider, he’d made a mistake. He’d confused the German with a Swiss who had a similar name. So we published the information about a possible mistake just as we did with the material provided by our source. On the site we wrote, “According to three independent sources, this document, the summary and some of the commentary are false or misleading. WikiLeaks is investigating the matter.” Three independent sources? That sounded good. Unfortunately it was made up.

One might ask here why we didn’t simply delete the man’s name. We decided against that because it was common for people connected to something negative to demand that we immediately remove their names. We wanted to investigate these cases before making any corrections.

Schneider had legitimate reason for being upset. When people Googled “Ralf Schneider,” the first hit they saw was about him being involved in the tax evasion scandal. He was able to show, however, that other details from the documents didn’t match him at all. “I do not have, nor did I ever have an account with the Julius Bär bank,” he wrote to us. “I don’t own a house on Mallorca, nor do I maintain a bank account on the Cayman Islands, and I don’t live abroad. I have already instructed my attorney to file a charge of slander with the public prosecutor’s office.”

We didn’t want to change the original documents provided by our source, but preferred instead to use commentary and footnotes. But a year later, when Schneider again complained that a Google search of his name still directed users to us, I made sure that the pages in the search engine’s archive were updated.

Inside WikiLeaks – My Time with Julian Assange [*name changed by Domscheit-Berg]

In October 2009, Wired reported that a whistleblower had admitted to submitting forged documents to WikiLeaks. WikiLeaks had published the documents, flagging them as potential fakes. While Wired doesn’t identify the forged documents, the timing and description may match “Steve Jobs purported HIV medical status results, 2008,” posted in January 2009. WikiLeaks was widely criticized, and the organization pushed back with a post saying that they had only republished the documents and that, because they had said the images might be fakes, they had still never released a misattributed document. When Wired confronted Daniel Domscheit-Berg about it, he spun it as a positive, saying that “a fake document is a story in itself.” It doesn’t seem to be a story that WikiLeaks was interested in, however. (When WikiLeaks first published the documents, they said the documents had “been spreading around the Internet as email forwards the last few hours,” later updating the page to say they had been cited in multiple news reports. The page also initially described the documents’ legitimacy as “theoretically possible” before changing it to “plausible.”) Immediately after Steve Jobs’ death was reported, WikiLeaks tweeted a link to the documents with no commentary, warning or explanation beyond “Purported Steve Jobs medical records.”

In their 2012 statement about the Syria Files release, WikiLeaks said that “In such a large collection of information, it is not possible to verify every single email at once; however, WikiLeaks and its co-publishers have done so for all initial stories to be published. We are statistically confident that the vast majority of the data are what they purport to be.” After it was discovered in 2016 that WikiLeaks had omitted some documents from the Syria Files, including information about transferring funds from Syria to Russia, and that some of the hackers had briefly discussed mixing fakes in with the real documents, WikiLeaks released a statement saying that “All Syria files obtained by WikiLeaks have been published and are authentic.” In 2022, I confirmed that over one million emails from the Syria Files had never been published.

In the past, WikiLeaks has added caveats as a defense for inaccurate information, forged documents, misleading statements and conspiracy theories posted on Twitter. No such caveat can apply to their mistake with Vault 7 and their unintentional publication of fake information. Any caveat that could be added to address this would essentially negate the claim that precedes it.

Note: The transcripts were provided by the Calyx Institute with funding from the Wau Holland Foundation. For ease of reference, the page numbers provided are based on the PDF page count instead of Bates numbering.