By October 2016, WikiLeaks regularly said that they had published 10 million documents in 10 years. However, at the time of those claims, throughout the history of the Syria Files as recorded in the Wayback Machine, and ever since, over a million of those documents appear to have been missing entirely. Although WikiLeaks says they’ve published 2.4 million emails known as the Syria Files, only 1.4 million have been made available – leaving a million emails apparently unpublished and unaccounted for.
Looking in the right places on WikiLeaks’ site shows that their own index displays the correct number of published emails. That number has been repeatedly verified by scripts which check for each file individually (and recheck later if the server reports temporary errors). A copy of the scripts can be downloaded here (archive).* When viewing individual Syria emails, the WikiLeaks website displays the number of files released as 1,432,389. According to WikiLeaks, there were 2,434,899 emails in total. With 1,432,389 released, that leaves 1,002,510 missing – over 40% of the claimed total.
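The arithmetic behind those figures can be reproduced in a few lines. This is only a sketch that restates the two counts quoted above; the variable names are my own and are not taken from the scanning scripts.

```python
# Figures quoted in the text above (variable names are illustrative only).
TOTAL_CLAIMED = 2_434_899   # emails WikiLeaks says the Syria Files contain
TOTAL_RELEASED = 1_432_389  # count displayed when viewing individual emails

# Difference between what was claimed and what is actually available.
missing = TOTAL_CLAIMED - TOTAL_RELEASED

# Fraction of the claimed total that cannot be found.
missing_share = missing / TOTAL_CLAIMED

print(f"missing: {missing:,}")
print(f"share of claimed total: {missing_share:.1%}")
```

Run as written, this reports 1,002,510 missing emails, a little over 41% of the claimed total.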
Reviewing pages in the Wayback Machine shows that this is not a new development or a recent bug. It was true in 2020, in 2019, in 2016 and in September 2015. Sometime between May and September 2015, WikiLeaks’ index rose above the total of 215,517 files published that it had shown for several years. That earlier count is only 38 files more than the largest and most recent version of the Syria Files package posted onto WikiLeaks’ file server and torrent list.
This is also not the first time the issue of missing Syria Files has been raised, though it is several orders of magnitude larger than before. In 2016, The Daily Dot published evidence that the WikiLeaks release excluded evidence of a €2 billion transfer from Syria to Russia. Given a chance to review the situation and address the missing emails, WikiLeaks responded by saying, in part, that “all Syria files obtained by WikiLeaks have been published.”
Since then, WikiLeaks has continued to state that the full 2.4 million Syria Files have been published and are available. As recently as March 2021, WikiLeaks repeated the 2.4 million number on Twitter.
In addition to being promoted by WikiLeaks, the inaccurate numbers have been used by the Courage Foundation to help raise funds for WikiLeaks. In text that appears to have been either copied and pasted from, or directly based upon, the timeline and document count on WikiLeaks’ site, Courage twice repeats the claim on their page for WikiLeaks that over 2 million emails from the Syria Files have been released.
This leaves a number of questions:
What happened to the missing emails? Why does WikiLeaks seem to ignore or be unaware of the fact that they’re missing? Does this explain the missing bank transfer email, or is that a separate problem? Will the missing million emails ever be restored? Why is there apparently a similar discrepancy with the Saudi Cables*, and what happened to the promised third part of the Fishrot Files?
*Clarification on Saudi Cables: The raw database was published by WikiLeaks, but not indexed.
Answers are unfortunately unlikely, but at least we know where the hole in the data is.
*For the most accurate count when using the scripts (archive), be sure to also run the “results extra” or “scanner2” script, as there are 157 entries out of range. The second script will quickly find these and then stop finding any new entries. However, it will continue to run until it is manually stopped. It is therefore highly recommended that you only run the scanner2 script under active supervision and stop it as soon as it becomes clear that there are no more hits. Treat it like microwave popcorn. The primary scanner has a limited scope and does NOT have this issue.
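For anyone who would rather not stand over the microwave, one possible alternative is to stop automatically after a long run of consecutive misses. To be clear, this is not how the scanner2 script works; it is a minimal sketch of a different stopping rule. The function name, the `exists` callback, the `max_misses` threshold and the example IDs are all hypothetical, and the check runs against an in-memory set rather than any real server.

```python
from typing import Callable, List


def scan_until_quiet(
    exists: Callable[[int], bool],
    start: int,
    max_misses: int = 1000,
) -> List[int]:
    """Probe IDs upward from `start`, stopping after `max_misses` misses in a row.

    `exists` is a hypothetical callback that returns True when an entry
    with the given ID is present. How it checks is left to the caller.
    """
    hits: List[int] = []
    misses = 0
    current = start
    while misses < max_misses:
        if exists(current):
            hits.append(current)
            misses = 0  # a hit resets the quiet period
        else:
            misses += 1
        current += 1
    return hits


# Offline stand-in for a real check: a made-up set of IDs that "exist".
known_ids = {10, 11, 25}
found = scan_until_quiet(lambda i: i in known_ids, start=1, max_misses=50)
print(found)  # [10, 11, 25]
```

The trade-off is that the threshold has to be larger than the biggest gap between real entries, otherwise the scan stops early and undercounts. A cautious run would use a generous threshold and still compare the result against a supervised scan.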
Note: I’ve previously mentioned this on Twitter and been asked about it once or twice, but until now there hasn’t been a proper written record of the issue. It felt like it was time to change that.