When Ali Ghodsi, the CEO of a $940 million data-crunching company with clients ranging from Netflix to Shell, first pitched his idea to companies as a free service, no one wanted in. They all thought it was too academic to have any real use.
To cross what’s known in Silicon Valley as the “Valley of Death,” where academic ideas fail to translate into real products, Ghodsi and co-founder Ion Stoica had to start their own company. Their platform, Databricks, plows through massive amounts of information to provide users with the highly customized experience they’ve come to expect of services like Netflix.
Four years and nearly a billion dollars in funding (from big-name backers like Andreessen Horowitz) later, Ghodsi and Stoica are unveiling an ambitious new initiative that tackles one of the biggest problems in medicine: many of the most promising drug candidates for diseases like cancer and Alzheimer’s ultimately fail because they don’t work the way researchers hoped they would.
The new initiative pulls from the tools embedded in Databricks’ existing platform, called Spark. But instead of crunching user data for companies like Netflix, the tool crunches genetics data for drug companies.
Pharmaceutical giant Regeneron is the first client. The New York-based company makes several popular drugs for skin and eye conditions like macular degeneration and dermatitis, and it’s exploring medicines for ailments like asthma, pain, and cancer. The company also maintains a sizable genetics database of anonymized information from more than 300,000 people.
That data, together with Databricks’ new platform, allows the company to speed through drug development in a way that hasn’t been possible before.
Software that breaks walls and shortens time
Databricks’ new tool, formally known as the Databricks Unified Analytics Platform for Genomics, tackles two major impediments to successful drug development that VCs have tended to overlook.
The first is that it can take scientists months to run the basic kind of analysis they need before pursuing advanced steps towards creating new drugs.
That type of analysis might include scanning millions of anonymized genomes to look for relationships between a tweak in one gene and the overall risk of developing a disease like liver cancer. If drugmakers want to create a new medicine that tackles liver cancer by acting on a genetic variant, they first need to be sure that changing the way that variant works actually influences the spread of the disease.
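To make that concrete, here is a minimal sketch of the kind of variant-to-disease check described above, scaled down to a single gene variant. It is an illustration only, not Databricks’ or Regeneron’s actual method: the function names are hypothetical and the counts are made-up numbers, not real patient data. It compares how often a disease shows up in people who carry a variant versus people who don’t, then asks whether the gap is bigger than chance would explain.

```python
import math

def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """Odds of disease among variant carriers divided by odds among non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Synthetic counts, invented for illustration (not real data):
# 1,000 carriers of the variant, 30 of whom have the disease;
# 10,000 non-carriers, 100 of whom have the disease.
carrier_cases, carrier_controls = 30, 970
noncarrier_cases, noncarrier_controls = 100, 9900

or_value = odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls)
chi2 = chi_square_2x2(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls)
p = p_value_1df(chi2)

print(f"odds ratio: {or_value:.2f}")
print(f"chi-square: {chi2:.2f}")
print(f"p-value:    {p:.2e}")
```

An odds ratio well above 1 with a very small p-value would suggest the variant is worth a closer look. A real study repeats a test like this across millions of variants and has to correct for that, which is where the months of computing time come from.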
Databricks’ powerful analytical platform disrupts that process, shrinking down the time it once took to run that kind of analysis from weeks or months to minutes or seconds. The pillars of Spark — which first enabled companies like Netflix and Shell to comb through massive and complex data sets in a fraction of the time it once took — are key to that speed.
A second obstacle faced by drugmakers is that data scientists who run those analyses are often working separately from the biologists who study the disease and its genetics. That means lots of time and effort gets wasted just trying to get the right people together to come up with the right pools of data — not to mention the money involved.
Today, the pharmaceutical industry spends on average $2.7 billion for every drug that makes it to market. Roughly half of drug programs that reach clinical trial stages ultimately fail.
“The biggest problem has been having the data live in different places and the fact that it gets pulled together independently by different experts,” Lukas Habegger, Regeneron’s associate director of bioinformatics, told Business Insider.
Critically, the new tool does its work in a central repository, so it isn’t thwarted by the academic silos that currently hold back a lot of progress in drug development.
“This breaks all the walls between these teams,” Databricks’ Ion Stoica told Business Insider.
Neither of those problems is intuitively appealing to potential funders, said Regeneron’s Habegger, which makes him all the more grateful for the work Databricks is doing.
“In many ways the biggest achievement here is the least sexy thing. Nobody went and pitched a VC some strategy … that makes it easier to get the data they want. They’ll get asked instead, ‘Oh but does it have blockchain?’ and the answer is, ‘Well no, but that’s irrelevant.’ It’s really the boring work of getting your shit together that’s being massively improved here,” Jeffrey Reid, Regeneron’s head of genome informatics, told Business Insider.
Databricks is far from the only company that’s looking to harness the power of machine learning and artificial intelligence to improve drug development.
Daphne Koller, the former head of Google’s life-extension spinoff, recently left to start her own AI-powered drug development company. The US Department of Defense backed a 2016 plan to use machine learning to pin down clues about the biology that drives tumors. Eric Horvitz, director of Microsoft Research Labs, recently called AI a “sleeping giant for healthcare.”
If Databricks is successful with Regeneron, it could hint at a bright future for the platform’s ambitions in the world of genomics.
“We’re setting up tools to make it trivial, or at least easy, to ask the kinds of questions that only data scientists could ask before,” Reid said.