GraphML News (March 23rd) - Neo-1 and Lila Sciences round
VantAI announced Neo-1 (https://www.vant.ai/neo-1), a foundation model for structure prediction and de novo generation that handles a range of protein design tasks (folding, co-folding, docking, all-atom molecule design, fragment linking, and more) in a single model rather than in separate modules. While we are waiting for the tech report, we could guesstimate that Neo-1 is an all-atom latent generative model (perhaps a Diffusion Transformer, as in competing models, since it's powered by a hefty cluster of H100s) with some advanced sampling techniques beyond standard guidance - the blog post talks about optimizing for non-differentiable properties with reward-like models, which sounds quite similar to the approach in https://arxiv.org/abs/2410.08134.
As impressive as the modeling advances are, true aficionados know that data diversity and distribution are even more important at scale - on that front VantAI introduces NeoLink, a massive data generation flywheel based on cross-linking mass spectrometry (XLMS). Reported experiments suggest it brings large improvements in quality, so it's likely to be the key innovation and the focus of further scaling. The graphics in the blog post are amazing and the graphic designer should get a raise.
Lila Sciences (https://www.lila.ai/news/join-our-mission) launched with $200M in seed funding. Lila will focus on materials discovery and automated self-driving labs while alluding to Superscience, an AI 4 Science equivalent of the Super Intelligence you often hear about from LLM folks, which would massively speed up exploration pipelines. Lila is part of the Flagship Pioneering ecosystem (you might know Generate Biomedicines, https://generatebiomedicines.com/, whose Chroma generative model made some noise last year) and attracted funding from General Catalyst, March Capital, ARK, and other famous VCs (even the Abu Dhabi Investment Authority). Given the recent announcement from the OpenAI VP of post-training (https://x.com/LiamFedus/status/1901740085416218672), the area is likely to attract even more VC funding in the near future.
Weekend reading:
https://arxiv.org/abs/2503.09008 by Huidong Liang and Oxford folks - introduces new long-range graph datasets extracted from road networks in OpenStreetMap. Good news: the graphs are quite large and sparse (100k nodes with a diameter of 100+). Less good news: GraphSAGE is still SOTA.
https://arxiv.org/abs/2502.02379 by Corinna Coupette, Jeremy Wayland, et al - studies the quality of 11 graph classification datasets. Only NCI1, MolHIV, and the LRGB datasets are OK; the others should be thrown in the garbage.
https://arxiv.org/pdf/2503.05771 by Keqiang Yan and a large Texas A&M collab - introduces HIENet, an ML potential rivaling MACE-MP0, Equiformer, and ORB on energy, force, and stress prediction.
https://arxiv.org/abs/2503.15650 by Antonis Vasileiou, Stefanie Jegelka, Ron Levie, and Christopher Morris - everything you wanted to know about GNNs linked to VC dimension, Rademacher complexity, PAC-Bayes, and learning theory. MATH ALERT