The Magic of MMMs
For the past few years, the technology world has been overtaken by Large Language Models, or LLMs. The power of these models is indisputable. Spend time with ChatGPT 4o or Llama 3.1 or Claude 3.5 and you will realize that these models have matured and now provide almost mesmerizing utility. In many ways, they perform tasks better than some of our most advanced technologies; many people now turn to ChatGPT more often than they turn to Google search.
For those who aren’t following this closely, here is a summary of how an LLM works, in its own words; or, as people like to say, the TLDR.
LLMs are artificial intelligence systems that understand, generate, and predict text. They are trained on vast amounts of data from books, articles, websites, and so on. During training, the model learns statistical patterns and relationships between words, sentences, and paragraphs. The text is broken down into tokens (small chunks of text), and the model converts each token into a vector (a list of numbers) that represents the token’s meaning and its relationships in a high-dimensional space. Here is where it gets a little complicated. Transformers determine how much attention each token should pay to every other token: each token’s query is compared against the other tokens’ keys to produce a similarity score, and those scores become the weights that decide which other tokens matter most when producing a response. In other words, transformers turn text into numbers and place those numbers in a space where related words end up close to one another.
So, if I ask “why does a dog bark,” the model picks up on the key words “why,” “dog,” and “bark.” Based on its training, it has learned that words like “protection,” “excitement,” and “fear” are the most relevant in response, because of the weights that have been assigned to those words.
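To make the attention idea concrete, here is a minimal sketch in Python of scaled dot-product attention over that toy question. The vectors, their tiny dimensionality, and the random projection matrices are made-up stand-ins for illustration, not values from any real model.

import numpy as np

# Toy example: three tokens from the question "why does a dog bark".
# A real model uses learned vectors with thousands of dimensions;
# these 4-dimensional vectors are invented for illustration.
tokens = ["why", "dog", "bark"]
embeddings = np.array([
    [0.1, 0.3, 0.0, 0.5],   # "why"
    [0.7, 0.2, 0.9, 0.1],   # "dog"
    [0.6, 0.1, 0.8, 0.3],   # "bark"
])

# Learned projections turn each embedding into a query, key, and value.
# Random matrices stand in for the weights a transformer would learn.
rng = np.random.default_rng(0)
d = embeddings.shape[1]
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = embeddings @ W_q, embeddings @ W_k, embeddings @ W_v

# Scaled dot-product attention: each token's query is scored against every
# token's key; softmax turns the scores into attention weights.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
output = weights @ V  # each token's new representation, a blend of the others

for i, tok in enumerate(tokens):
    print(tok, "attends to", dict(zip(tokens, weights[i].round(2))))

In a trained model, the same mechanism is what lets words like “protection” or “fear” score highly against a query like this one and surface in the answer.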
One of the biggest surprises over the last few years is not how LLMs work, but how they scale. As we have poured more data into the top of the funnel and added more compute capacity, largely in the form of bigger GPU clusters, the models have improved in a remarkably predictable fashion. Pour in twice the data with twice the compute, and you get a roughly corresponding lift in the performance of the model. We don’t know if this scaling has a limit, but it has already produced models that are dazzling in their ability to query unstructured, messy data and produce harmonized and intelligent responses.
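As a rough illustration of what that predictable scaling looks like, the sketch below assumes a simple power-law relationship between compute and model loss. The constant and exponent are arbitrary placeholders, not measured values from any real model; the point is only that each doubling of compute buys a similar, predictable improvement.

# Illustrative only: a toy scaling curve, loss ~ A * compute ** -ALPHA.
# A and ALPHA are made-up placeholders, not measurements.
A = 10.0
ALPHA = 0.05

def predicted_loss(compute: float) -> float:
    """Predicted loss for a given compute budget under the assumed power law."""
    return A * compute ** -ALPHA

for doubling in range(5):
    compute = 2 ** doubling
    print(f"{compute:>2}x compute -> predicted loss {predicted_loss(compute):.3f}")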
This is where healthcare comes in, I promise.
As an AI system, LLMs offer a profound solution to our current healthcare challenges, especially as context windows grow, increasing the amount of content a model can retain and process. This coupled with improvements in computing capacity, an inevitable byproduct of the investments we are making in AI as a society, creates the necessary ingredients for a biological renaissance. Let’s call this the age of MMMs, or Massive Multimodal Models.
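To see why context windows matter here, a quick back-of-the-envelope sketch: it estimates how much of a single patient’s chart fits in a given context window using the common rule of thumb of roughly four characters per token. The record sizes and the window size are hypothetical assumptions for illustration only.

# Rough token-budget arithmetic; every number here is an illustrative assumption.
CHARS_PER_TOKEN = 4              # common rule of thumb for English text
CONTEXT_WINDOW_TOKENS = 128_000  # assumed context window size

# Hypothetical character counts for pieces of one patient chart.
record_chars = {
    "clinical notes": 600_000,
    "pathology reports": 80_000,
    "radiology reports": 120_000,
    "molecular report summary": 40_000,
}

total_tokens = sum(record_chars.values()) // CHARS_PER_TOKEN
print(f"Estimated tokens for one chart: {total_tokens:,}")
print(f"Fits in a {CONTEXT_WINDOW_TOKENS:,}-token window:",
      total_tokens <= CONTEXT_WINDOW_TOKENS)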
Tempus could not have existed 20 years ago. For a company like Tempus to “work,” it needed advances in several background technologies: low-cost cloud computing, high-throughput parallel molecular sequencing, and improvements in natural language processing and optical character recognition.
Tempus leveraged these tools to build an AI-enabled diagnostic platform that over the past nine years has amassed hundreds of petabytes of rich, multimodal, de-identified healthcare data from millions of records in oncology and other diseases. This real-world data set is derived from routine clinical care and includes vast amounts of outcome and response data. Note those words; they matter.
Over the next year or two, these models, which are already incredibly powerful, will continue to evolve, with improvements in their ability to handle multimodal data and to remember more, reason more, and process more. This will allow them to work with the different modalities of healthcare data at scale: pathology slides, radiology scans, molecular files, unstructured text, lab results, structured results, ultrasounds, electrocardiograms, and so on.
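To give a sense of what “multimodal” means in practice, here is a minimal sketch of how one de-identified patient record might be organized before it is fed to a model. The field names and structure are my own illustration, not an actual schema.

from dataclasses import dataclass, field
from typing import Optional

# Illustrative structure only; field names are hypothetical.
@dataclass
class PatientRecord:
    patient_id: str                                                  # de-identified identifier
    clinical_notes: list[str] = field(default_factory=list)          # unstructured text
    pathology_slide_paths: list[str] = field(default_factory=list)   # whole-slide images
    radiology_scan_paths: list[str] = field(default_factory=list)    # CT / MRI files
    dna_variants: list[str] = field(default_factory=list)            # e.g. "KRAS G12C"
    rna_expression: dict[str, float] = field(default_factory=dict)   # gene -> expression level
    lab_results: dict[str, float] = field(default_factory=dict)      # test -> value
    therapy: Optional[str] = None                                    # drug or regimen received
    response: Optional[str] = None                                   # outcome label

record = PatientRecord(
    patient_id="anon-0001",
    clinical_notes=["Stage III NSCLC, therapy started in March..."],
    dna_variants=["KRAS G12C"],
    therapy="hypothetical-drug-A",
    response="partial response",
)
print(record.therapy, "->", record.response)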
To derive value from these models, all that is needed is vast amounts of this data connected to vast amounts of outcome and response data. Note that phrase again.
If you have the answers, you can train the models; that’s how it works. Until now, no one has had this many answers (which drugs patients responded to) connected to this much multimodal data, especially rich molecular data (DNA, RNA, proteins, etc.).
If you use that data for training, leverage the power of these new models with sufficient compute (as in lots of GPUs), and apply deep domain expertise to evaluate and fine-tune the models, you will likely unlock a plethora of associations between routinely collected biological data elements and outcomes and responses. This is the holy grail.
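Here is a heavily simplified sketch of the kind of supervised training this describes: embeddings from different modalities are fused and used to predict whether a patient responded to a drug. The feature sizes, the tiny network, and the random stand-in data are all hypothetical; this is a minimal illustration of the technique, not how any production model is built.

import torch
import torch.nn as nn

# Hypothetical sizes for pre-computed embeddings from each modality.
MOLECULAR_DIM, IMAGING_DIM, TEXT_DIM = 128, 256, 384

class ResponsePredictor(nn.Module):
    """Fuses per-modality embeddings and predicts probability of drug response."""
    def __init__(self):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(MOLECULAR_DIM + IMAGING_DIM + TEXT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, molecular, imaging, text):
        fused = torch.cat([molecular, imaging, text], dim=-1)
        return self.fusion(fused).squeeze(-1)  # logits

# Random stand-in data: 32 "patients" with a binary responded / did-not-respond label.
n = 32
molecular = torch.randn(n, MOLECULAR_DIM)
imaging = torch.randn(n, IMAGING_DIM)
text = torch.randn(n, TEXT_DIM)
responded = torch.randint(0, 2, (n,)).float()  # the "answers" that make training possible

model = ResponsePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(molecular, imaging, text), responded)
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")

The outcome and response labels are what turn this from an interesting data set into a training signal; without the answers, there is nothing for the model to learn against.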
Imagine being able to sequence a patient and predict with high fidelity whether or not a drug will work. Imagine arming a physician with the knowledge of whether or not a patient will experience an adverse side effect or be at risk for some other condition based on a course of therapy. Imagine arming drug companies with perfect information as to which clinical trials will succeed and fail. Imagine giving them a road map for new drugs that are highly targeted, highly efficient, and highly effective.
In a healthcare system that spends roughly $4.8 trillion annually, MMMs could be just what the doctor ordered.
Now, we just need to bring all of the ingredients together – large amounts of multimodal healthcare data tied to outcomes and responses, large models, and large amounts of capital and GPU capacity. Add those three together along with a team that has sufficient domain expertise to make sense of the models, and you have what you need for MMMs to take shape.
Tada…