Nature earns ire over lack of code availability for Google DeepMind protein folding paper

via Nature

A group of researchers is taking Nature to task for publishing a paper earlier this month about Google DeepMind’s protein folding prediction program without requiring the authors to publish the code behind the work.

Roland Dunbrack, of Fox Chase Cancer Center in Philadelphia, peer-reviewed the paper but was “not given access to code during the review,” the authors of a letter submitted today, May 14, to Nature – including Dunbrack – write, “despite repeated requests.”

A Nature podcast said AlphaFold3 – unlike AlphaFold2 – “can accurately predict protein-molecule complexes containing DNA, RNA and more. Although the new version is restricted to non-commercial use, researchers are excited by its greater range of predictive abilities and the prospect of speedier drug discovery.”

Not everyone was excited. The authors of the letter, which co-author Stephanie A. Wankowicz of the University of California, San Francisco told Retraction Watch was submitted today to Nature, write they “were disappointed with the lack of code, or even executables accompanying the publication of AlphaFold3 in Nature.” They continue:

Although AlphaFold3 expands AlphaFold2’s capacities to include small molecules, nucleic acids, and chemical modifications, it was released without the means to test and use the software in a high-throughput manner. This does not align with the principles of scientific progress, which rely on the ability of the community to evaluate, use, and build upon existing work. The high-profile publication advertises capabilities that remain locked behind the doors of the parent company.

The authors, who are circulating the letter for additional signatures, write that “the model’s limited availability on a hosted web server, capped at ten predictions per day, restricts the scientific community’s capacity to verify the broad claims of the findings or apply the predictions on a large scale. Specifically, the inability to make predictions on novel organic molecules akin to chemical probes and drugs, one of the central claims of the paper, makes it impossible to test or use this method.”

A May 8 news story by the independent team of journalists at Nature noted the restrictions. Nature editor in chief Magdalena Skipper told Retraction Watch:

Nature has a long-standing policy designed to facilitate the availability of data, materials and code upon reasonable request. While seeking to enhance transparency at every opportunity, Nature accepts that there may be circumstances under which research data or code are not openly available. When making a decision on data and code availability, we reflect on many different factors, including the potential implications for biosecurity and the ethical challenges this presents. In such cases we work with the authors to provide alternatives that will support reproducibility, for example through the provision of pseudocode, which is made available to the reviewers during peer review.

As noted in the code availability statement in the paper: AlphaFold 3 is available as a non-commercial usage only server at https://www.alphafoldserver.com, with restrictions on allowed ligands and covalent modifications. Pseudocode describing the algorithms is available in the Supplementary Information.

The pseudocode, however, “will require months of effort to turn into workable code that approximates the performance, wasting valuable time and resources,” the authors of the letter write. “Even if such a reimplementation is attempted, restricted access raises questions about whether the results could be fully validated.”

The authors of the letter continue:

When journals fail to enforce their written policies about making code available to reviewers and alongside publications, they demonstrate how these policies are applied inequitably and how editorial decisions do not align with the needs of the scientific community. While there is an ever-changing landscape of how science is performed and communicated, journals should uphold their role in the community by ensuring that science is reproducible upon dissemination, regardless of who the authors are.

Like Retraction Watch? You can make a tax-deductible contribution to support our work, subscribe to our free daily digest or paid weekly update, follow us on Twitter, like us on Facebook, or add us to your RSS reader. If you find a retraction that’s not in The Retraction Watch Database, you can let us know here. For comments or feedback, email us at [email protected].

11 thoughts on “Nature earns ire over lack of code availability for Google DeepMind protein folding paper”

  1. I think it’s pretty obvious that DeepMind wants to commercialize AlphaFold3, and that giving away the code would be like cloning the goose that lays the golden eggs. Maybe 10 models per day is not enough, but I understand DeepMind’s fiduciary responsibility to turn a profit.

    1. There’s nothing wrong with prioritizing profits, indeed. But not publishing in a Nature journal wouldn’t be wrong either. It’s a trade-off so many companies have dealt with in the past.

      There is an open-science policy and everybody, incl. DeepMind, should respect it.

    2. Nature is not and shouldn’t become an advertisement board for megacorporations. This sort of preferential treatment and leniency towards data sharing policies is a clear and straight path towards that.

      1. Nature and its parent company Springer have been an advertisement board for megacorporations for quite some time, e.g. reporting on drug development while salivating that “the weight-loss drug market is forecast to be worth up to US$100 billion by 2030.” https://www.nature.com/articles/d41586-024-01433-6 . Perhaps the OxyContin murderous saga should teach us something about profits vs. health.

  2. The code for the chess-playing adaptive algorithm AlphaZero was never released, nor was it ever made accessible for outsiders to try. All we ever got was a collection of 100 games it had played against a normal chess AI.

    The chess community then tried re-implementing it from scratch based on what was in the publication. The result, LeelaChess, is… one player described it as playing like an elderly grandmaster: impressive overall understanding, but it occasionally nods off and does something befuddled. Clearly not as strong as AlphaZero. So there wasn’t enough information in the publication to replicate the work. (Or they got lucky and we did not.)

    I agree that the journals should decline work that doesn’t have enough information to be replicated.

    1. LeelaChessZero surpassed AlphaZero probably a few years ago.
      They were even ahead of DeepMind’s “Chess without Search” paper and were able to produce better performance than the model trained by DeepMind.

  3. 6th paragraph: “the inability to make predictions on novel organic molecules…” should read “the ability to make predictions on novel organic molecules…”, I presume.

  4. Open science is best. But I would rather have private companies publish their advances, especially big ones, in widely read journals than squirrel them away without the scientific universe knowing about them. If that comes at the cost of restrictions on code and number of trials, then so be it. A side note: perhaps this encourages large biotech and pharmaceutical R&D companies to also publish some of the treasure troves of proprietary technology/information that are locked up in their vaults.

  5. It’s time to mandate that all code be made available, with step-by-step explanatory notes, when published in scientific journals. The code should be hosted on journal sites as supplementary files rather than deposited in repositories (so that it cannot be edited after the publication date). The same goes for the code behind NGS processing and analysis pipelines and RNA/protein fold prediction algorithms, even if it is standard and previously published (this helps reviewers, or readers after publication, evaluate whether the authors made any mistakes). I am saying this for general practice, not just this paper.

  6. This complaint is ridiculous. Google are letting us all use it for free – 20 times a day. Get a dozen students to log in and you can run 7000 structures a month. The authors of this letter claim this isn’t allowing the “community to evaluate, use, and build upon existing work”??? Google could have very easily kept everything secret and then made us all pay-per-use.
