What We’re Reading (Week Ending 07 July 2024) - 07 Jul 2024
Reading helps us learn about the world, and it is a really important aspect of investing. The late Charlie Munger even went so far as to say, “I don’t think you can get to be a really good investor over a broad range without doing a massive amount of reading.” We (the co-founders of Compounder Fund) read widely across a range of topics, including investing, business, technology, and the world in general. We want to regularly share the best articles we’ve come across recently. Here they are (for the week ending 07 July 2024):
1. Etched is Making the Biggest Bet in AI – Etched
In 2022, we made a bet that transformers would take over the world.
We’ve spent the past two years building Sohu, the world’s first specialized chip (ASIC) for transformers (the “T” in ChatGPT).
By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either.
But for transformers, Sohu is the fastest chip of all time. It’s not even close.
With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs. Sohu is an order of magnitude faster and cheaper than even NVIDIA’s next-generation Blackwell (B200) GPUs…
…Feed AI models more compute and better data, and they get smarter. Scale is the only trick that’s continued to work for decades, and every large AI company (Google, OpenAI / Microsoft, Anthropic / Amazon, etc.) is spending more than $100 billion over the next few years to keep scaling. We are living in the largest infrastructure buildout of all time.
Scaling the next 1,000x will be very expensive. The next-generation data centers will cost more than the GDP of a small nation. At the current pace, our hardware, our power grids, and our pocketbooks can’t keep up…
…Santa Clara’s dirty little secret is that GPUs haven’t gotten better, they’ve gotten bigger. The compute (TFLOPS) per area of the chip has been nearly flat for four years…
…No one has ever built an algorithm-specific AI chip (ASIC). Chip projects cost $50-100M and take years to bring to production. When we started, there was no market.
Suddenly, that’s changed:
- Unprecedented Demand: Before ChatGPT, the market for transformer inference was ~$50M, and now it’s billions. All big tech companies use transformer models (OpenAI, Google, Amazon, Microsoft, Facebook, etc.).
- Convergence on Architecture: AI models used to change a lot. But since GPT-2, state-of-the-art model architectures have remained nearly identical! OpenAI’s GPT-family, Google’s PaLM, Facebook’s LLaMa, and even Tesla FSD are all transformers…
…We believe in the hardware lottery: the models that win are the ones that can run the fastest and cheapest on hardware. Transformers are powerful, useful, and profitable enough to dominate every major AI compute market before alternatives are ready…
- …As models scale from $1B to $10B to $100B training runs in the next few years, the risk of testing new architectures skyrockets. Instead of re-testing scaling laws and performance, time is better spent building features on top of transformers, such as multi-token prediction.
- Today’s software stack is optimized for transformers. Every popular library (TensorRT-LLM, vLLM, Huggingface TGI, etc.) has special kernels for running transformer models on GPUs. Many features built on top of transformers aren’t easily supported in alternatives (ex. speculative decoding, tree search).
- Tomorrow’s hardware stack will be optimized for transformers. NVIDIA’s GB200s have special support for transformers (TransformerEngine). ASICs like Sohu entering the market mark the point of no return. Transformer killers will need to run on GPUs faster than transformers run on Sohu. If that happens, we’ll build an ASIC for that too!…
…On GPUs and TPUs, software is a nightmare. Handling arbitrary CUDA and PyTorch code requires an incredibly complicated compiler. Third-party AI chips (AMD, Intel, AWS, etc.) have together spent billions on software to little avail.
But since Sohu only runs transformers, we only need to write software for transformers!
Most companies running open-source or internal models use a transformer-specific inference library like TensorRT-LLM, vLLM, or HuggingFace’s TGI. These frameworks are very rigid – while you can tweak model hyperparameters, changing the underlying model code is not really supported. But this is fine – since all transformer models are so similar (even text/image/video ones), tweaking the hyperparameters is all you really need.
2. Evolution of Databases in the World of AI Apps – Chips Ahoy Capital
Transactional Database vendors like MDB focus on storing and managing large volumes of transactional data. MDB also offers Keyword Search & rolled out Vector Search (albeit late vs competitors). Historically, MDB Keyword Search has not been as performant as ESTC in use cases utilizing large data sets or complex search queries & has less comprehensive Search features than ESTC…
…A vector database stores data as high-dimensional vectors rather than traditional rows and columns. These vectors represent items in a way that captures their semantic meaning, making it possible to find similar items based on proximity in vector space.
Real-World Example:
Imagine you have an online store with thousands of products. Each product can be converted into a vector that captures its attributes, like color, size, and category. When a customer views a product, the vector database can quickly find and recommend similar products by calculating the nearest vectors. This enables highly accurate and personalized recommendations.
In essence, a vector database helps in efficiently retrieving similar items, which is particularly useful in applications like recommendation systems & image recognition…
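A minimal sketch of that nearest-vector lookup, with tiny hand-made vectors standing in for real embeddings (the product names and three attribute dimensions are illustrative assumptions, not any vendor’s actual data model):

```python
import numpy as np

# Toy catalog: each product is a small hand-made vector whose dimensions
# stand in for attributes like color, size, and category. A real system
# would use learned embeddings with hundreds of dimensions.
products = {
    "red_tshirt":  np.array([0.9, 0.1, 0.3]),
    "blue_tshirt": np.array([0.1, 0.9, 0.3]),
    "red_hoodie":  np.array([0.8, 0.2, 0.7]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Proximity in vector space: 1.0 means identical direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(query_name: str, k: int = 2) -> list[str]:
    # Find the k nearest vectors to the viewed product's vector.
    query = products[query_name]
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in products.items() if name != query_name]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]

print(recommend("red_tshirt"))  # ['red_hoodie', 'blue_tshirt']
```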
…RAG combines the strengths of Vector Search and generative AI models to provide more accurate and contextually relevant responses. Here’s how it works: 1) A user submits a query 2) the system converts the query into a vector and retrieves relevant documents or data from the vector database based on similarity 3) the retrieved documents are fed into a generative AI model (LLM), which generates a coherent and contextually enriched response using the provided data.
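Here is a minimal sketch of those three steps. The `embed()` and `generate()` functions are toy stand-ins for a real embedding model and LLM, so every name below is an illustrative assumption rather than a real API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding: character-frequency vector, normalized. A real
    # system would call an embedding model here.
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def generate(prompt: str) -> str:
    # Stand-in for an LLM call; echoes the prompt so the example runs.
    return f"[LLM response conditioned on]\n{prompt}"

def rag_answer(query: str, documents: list[str], top_k: int = 2) -> str:
    q = embed(query)  # 1) convert the query into a vector
    # 2) retrieve the most similar documents from the "vector database"
    ranked = sorted(documents, key=lambda d: -float(embed(d) @ q))
    context = "\n".join(ranked[:top_k])
    # 3) feed the retrieved context plus the query to the generative model
    return generate(f"Context:\n{context}\n\nQuestion: {query}")

docs = ["Vector databases store embeddings.",
        "REITs pay dividends.",
        "RAG retrieves context before generating."]
print(rag_answer("How does RAG use a vector database?", docs))
```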
Multimodal models integrate multiple data types (text, images, audio) for comprehensive understanding and generation. It is crucial for vector databases to support multimodal data to enable more complex and nuanced AI applications. Postgres is a dominant open-source vendor in the database market (scored #1 as the most-used Vector DB in a recent Retool AI survey) but on its own it does NOT seem to include native support for multi-modality in its Vector Search. This limits the use cases it can be applied to without an extension or integration with other solutions…
…Simple AI Use Cases:
Similarity Search has been one of the first and most prominent use cases of GenAI. When a query is made, the database quickly retrieves items that are close in vector space to the query vector. This is especially useful in applications like recommendation engines & image recognition where finding similar items is crucial. These use cases have been in POC since last year, and are starting to move into production later this year.
Complex AI Use Cases:
Enter the Generative Feedback Loop! In a Generative Feedback Loop, the database is not only used for Retrieval of data (the main use case in Similarity Search) but also provides Storage of Generated Data. The database in this case stores new data generated by the AI model if deemed valuable for future queries. This in my view changes the relationship that the AI Application has with a database, as it then has to store data back in. A key example of a Generative Feedback Loop is an Autonomous Agent…
…An AI autonomous agent and a database work together to perform complex tasks efficiently. The relationship between a database and an AI Agent at first seems similar to other use cases, where the database holds all necessary data and the AI Agent queries the database to retrieve relevant information needed to perform its tasks.
The key difference here is the Learning and Improvement aspect of AI Agents. Instead of just containing historical data, the database has been updated with new data from user interactions and agent activities. The AI Agent then uses this new data to refine its algorithms, improving its performance over time…
…A real-life example could be an E-commerce Chatbot. The customer buys a product and leaves a review for that product. The database is then updated with the new purchase and feedback data, and the AI Agent learns from this feedback to improve future recommendations. In this scenario, the database is not just being queried for data; it is storing data back from the interaction, and the AI Agent is learning from it, creating what is referred to as a Generative Feedback Loop.
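A minimal sketch of that loop, using an in-memory list as the “database” and a trivial stand-in for the recommendation model (all names and logic here are illustrative assumptions):

```python
# In-memory "database" of user interactions; a real system would use a
# transactional or vector database.
database: list[dict] = [
    {"user": "alice", "event": "purchase", "item": "espresso machine"},
]

def recommend(user: str) -> str:
    # Retrieval: the agent queries the database for the user's history.
    history = [row for row in database if row["user"] == user]
    # Placeholder "model": suggest an accessory for the latest item.
    last = history[-1]["item"] if history else "nothing yet"
    return f"descaling kit (based on: {last})"

def record_feedback(user: str, item: str, review: str) -> None:
    # Storage: the interaction is written back, so the next retrieval
    # step sees it -- this write-back is what closes the feedback loop.
    database.append({"user": user, "event": "review",
                     "item": item, "review": review})

suggestion = recommend("alice")
record_feedback("alice", "descaling kit", "worked great")
print(recommend("alice"))  # now informed by the new interaction
```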
3. The Big Bad BREIT Post – Phil Bak
So here it is, our analysis of Blackstone’s Real Estate Income Trust. The data presented is as-of the original publication of June 2023. It should be noted that over the past year everything has played out as we warned, including the gating of Starwood’s SREIT. Last thing I’ll say: I’d have much preferred to be wrong…
…Given the vital role that “NAV” plays in fundraising and performance reporting, it’s surprising that sponsors do not provide greater transparency into their valuation methodology. Remind me again why they don’t provide a comprehensive explanation for each input in the DCF model? Contrary to popular assumption, NAV is not based on appraisals that utilize sales comparisons. Instead, it’s based on an opaque discounted cash flow (DCF) methodology that rests on assumptions left to the discretion of the sponsor, who realizes fee streams pegged to the asset values they assign.
BREIT’s self-reported performance is – by their own admission – “not reliable.” Why we didn’t take a closer look at it before is as much a mystery as how they compute it. Management can’t just pull numbers out of thin air, and they’ve done nothing illegal, but they have a lot of discretion on where they estimate share values to be.
According to their prospectus, Blackstone values the fund itself once a month; then once a year it brings in an outsider who prepares a valuation based on their direction. But in its March 28, 2023 prospectus amendment, BREIT removed the steps in bold:
(1) a third-party appraisal firm conducts appraisals and renders appraisal reports annually;
(2) an independent valuation advisor reviews the appraisal reports for reasonableness;
(3) the advisor (Blackstone) receives the appraisal reports and, based in part on the most recent appraisals, renders an internal valuation to calculate NAV monthly;
(4) the independent valuation advisor reviews and confirms the internal valuations prepared by the advisor;
(5) BREIT will promptly disclose any changes to the identity or role of the independent valuation advisor in its reports publicly filed with the SEC.
The verbiage in their disclosures doesn’t suggest that their calculation will be better than relying on market prices. The highlighted portions seem to be saying that Blackstone uses baseless returns in their SEC filings. They are not using a methodology prescribed by the SEC or any regulatory body. They do not adhere to any accounting rules or standards. Nor is their monthly NAV calculation audited by an independent public accounting firm. Blackstone uses it solely to determine the price at which the fund will redeem and sell shares. The NAV also happens to dictate the fees they can earn…
…One of BREIT’s big selling points was the ability to get a dividend of around 4% when interest rates were near zero, but the fund cannot – and has never been able to – cover the dividend payment. The current Class S distribution of 3.74% and Class I yield of 4.6% aren’t fully earned based on a key REIT cash-flow measure: Available Funds from Operations (AFFO). AFFO is used to approximate the recurring free cash flow from an income-producing real estate vehicle and calculate the dividend coverage.
Blackstone reports AFFO, but their reported number is janky. It omits the management fees they charge. Their rationale is that they have not taken their fees in cash but instead converted their $4.6 billion in fees into I-Shares, which is a class of BREIT shares that has no sales cost load. But their election to accept shares is optional, the shares they receive are fully earned and they can redeem their shares at stated NAV. What’s more, they have redemption priority over other BREIT investors; there is no monthly or quarterly redemption limitation. Blackstone has already redeemed $658 million in shares.
BREIT’s AFFO also omits recurring real estate maintenance capital expenditures and stockholder servicing fees which are part of the sales load. Computing an AFFO more consistent with public company peers would result in a payout ratio for the first half of 2023 of more than 250%.
BREIT, unlike most big public REITs, has only covered about 13% of their promised dividend distribution. There’s not a single year in which they could cover their payment if everybody elected to receive it. Since inception, the company has delivered $950 million in AFFO and declared $7.3 billion in distributions. That’s a stunning 768% dividend payout ratio…
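As a quick check of the arithmetic on the two figures quoted above:

\[
\text{payout ratio} = \frac{\text{distributions declared}}{\text{AFFO delivered}} = \frac{\$7.3\text{B}}{\$0.95\text{B}} \approx 7.7 \approx 768\%
\]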
…BREIT is levered approximately 49% against NAV and closer to 60% as measured against cost – the average cost of BREIT’s secured borrowings stands at approximately 5.5% before hedges, so the cost of their debt exceeds the yield. There are few ways you can turn these numbers into a double-digit return. Rents would have to go to the moon. The only way there can be positive leverage over a holding period (IRR) is if there is a shedload of positive income growth. And that’s exactly what BREIT has baked in the valuation cake. Interest rates went up so the NPV should be way down but – in a fabulous coincidence – future cash flow expectations went up by just enough to offset it. The numerator where revenue growth shows up made up for the rise in rates in the denominator…
…Here’s the BREIT Story in a nutshell: They’ve reported an annual return since inception for its Class S investors north of 10% with real estate investments that have a gross current rate of return of less than 5% on their cost. They’ve been buying assets at a 4% cap rate, paying a 4.5% dividend and reporting 10+% returns. And nobody has called bullshit…
…By taking BREIT’s current NOI and dividing it by the NAV, investors can compute the implied cap rate on BREIT’s portfolio as they are valuing it – and compare it with public REITs. Interest rates have moved 200-300 basis points in recent months, and in public markets elevated cap rates have driven a 25% decline in values. A recent analysis of two vehicles in the non-traded REIT space concluded that both funds are being valued at implied cap rates of approximately 4.0%, when publicly traded REITs with a similar property sector and geographic mix are trading at an implied cap rate closer to 5.75%. Applying that 5.75% cap rate to BREIT would result in a reduction in shareholder NAV of more than 50%. The current valuation of roughly $14.68/share should be closer to $7-8/share.
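The leverage math behind that halving can be sketched with illustrative numbers (our assumptions for scale, not figures from the post): value the property portfolio as NOI divided by the cap rate, then subtract debt to get equity NAV.

\[
V = \frac{\text{NOI}}{c}, \qquad \text{NAV} = V - D
\]

Take \(V = 100\) at \(c = 4.0\%\), with debt \(D = 33\) (about 49\% of the resulting NAV of 67, matching the leverage quoted above). Repricing at the public-market cap rate:

\[
V' = 100 \times \frac{4.0\%}{5.75\%} \approx 69.6, \qquad \text{NAV}' = 69.6 - 33 \approx 36.6
\]

So a roughly 30% drop in gross asset value becomes a drop of \(1 - 36.6/67 \approx 45\%\) in equity NAV under these assumptions: leverage is what turns the cap-rate move into the near-halving of NAV the post describes.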
4. Grant Mitchell — The Potential of AI Drug Repurposing – Jim O’Shaughnessy and Grant Mitchell
[Grant:] I was leading teams that were really pioneering the use of large medical record databases to identify subpopulations where a drug might perform better, might be higher in efficacy or better in safety. And we realized that that’s really, in a way, kind of drug repurposing. It’s taking a drug that already exists and finding a population where it works a little bit better.
And as David was working in the lab and I was working in the data, we kind of came together and we said, “Can we automate what we’ve done? Can we scale what we’ve done in just one disease?” And given the explosion and the amount of data that exists out there and the improvements in the way that we can harmonize and integrate the data into one place, and then the models that have been built to analyze that data, we thought that maybe it would be possible. And we would check in every few years. 2016, 2017, it wasn’t really possible. We had this dream for a long time. 2018, 2019 is probably when I was talking to you and I was thinking about can we do this?
And really, lately it’s become possible, especially with, like I said before, more data, structured better. You have models like these large language models that are able to digest all of medical literature, output it in a structured fashion, compile it into a biomedical knowledge graph, these really interesting ways to display and analyze this kind of data. And ultimately, that’s how Every Cure was formed, was the concept that the drugs that we have are not fully utilized to treat every disease that they possibly can, and we can utilize artificial intelligence to unlock their life-saving potential.
Jim: Just so incredibly impressive. And a million questions spring to mind. As you know, my oldest sister, Lail, died of lupus. And when you said the cytokine storm, she had a kind of similar thing where she would go into remission, and then there’d be a massive attack, and it wasn’t like clockwork like your colleague’s, but when she died in 1971, it was like nobody knew very much at all about the disease. And in this case, did you find that the cure that worked for your colleague, was that transferable to other people with this similar disease?
Grant: Yeah, so the cure that worked for him, we studied his blood, we sampled his lymph nodes, we did immunohistochemistry and flow cytometry and basically found that his cytokines were elevated, another molecule called VEGF was elevated, and there was T cell activation. This all pointed towards something called the mTOR pathway. We started looking at different drugs that would hit that pathway and settled on a drug called Sirolimus. Sirolimus has been around for decades. It’s actually isolated from a fungus found in the soil on Easter Island. It’s amazing, right? And it shuts down the overactivation of this pathway that leads to this cascade that causes this whole cytokine storm.
For David it works perfectly, and it also works for about a third of the other patients that have a disease like David’s. And so that’s resulted in benefit to countless thousands and thousands of patients. It’s a pretty thrilling and satisfying and motivating thing to be able to figure something like that out, and to have the opportunity to do it more and at scale – and to save potentially millions of lives – is a huge motivation for my team…
…[Grant:] So we couldn’t quite piece it together, and it was really an aha moment that this should be designed as a nonprofit, and it should be an AI company, because if you want to build the world’s best AI platform for drug repurposing, you’re going to need the world’s best dataset to train it, and you’re not going to get your hands on all the data that you want to get your hands on if you’re a competitor to all these people that are trying to use this data.
So we’re collaborative. We’re non-competitive. We are not profit-seeking. Our primary goal is to relieve patient suffering and save patient lives. So I’ll get to your question about how we’re utilizing that kind of resiliency data that I mentioned before. But first I’m going to help you understand how we use it. I’m going to describe the kind of data set that we’re constructing, and it’s something called a biomedical knowledge graph. It’s well known in the areas and the fields that we’re in, but maybe not a commonly known term to the layman, but it’s effectively a representation in 3D vector space of all of the biomedical knowledge we have as humanity, every drug, every target, every protein, every gene, every pathway, cell type, organ system, et cetera, and how they relate to different phenotypes, symptoms, and diseases.
And so every one of those biomedical concepts that I just described would be represented as a node, and then for every relationship that that concept has with another concept, like a drug treats a disease, there would be an edge. They call it a semantic triple. Drug, treats, disease. So you’ve got a node, an edge, and a node. And imagine a graph of every known signaling molecule and protein and every concept you can imagine, tens of millions of nodes, even more edges, representing all of human knowledge in biology. And that’s what multiple people have constructed. Actually, NIH funded a program called the NCATS Translator Program where a number of these knowledge graphs have been constructed. Other groups are doing it. A lot of private companies have their own. We are compiling them and integrating them with an integration layer that kind of takes the best from the top public ones, and then layers in additional proprietary data that we get from other organizations or data that we generate on our own.
And the example that you just mentioned, a company that is working on tracking genetic diseases and groups of people with the same genetic disease and looking at subpopulations within that group where there might be some resilience to the mutation, and then studying their genome to say, “Okay, what other proteins are being transcribed that might be protective against this mutation?”, and then going out and designing drugs that might mimic that protection. Well, how’s that data going to fit into my knowledge graph? Well, you can imagine that now if I have the data set that they’re working with, I know that there’s a mutation that results in a disease. So a gene associated with disease, that’s a node, an edge, and a node. And I also know that this other protein is protective of that disease.
So that’s just information that goes into the graph. And the more truth that I put into that graph, the more I can train that graph to identify patterns of successful examples of a drug working for a disease, and then it can try and find that pattern elsewhere, where it either identifies nodes and edges that should already be connected or are connected in our knowledge base but no one has actually acted on, or it can maybe even generate a hypothesis on a totally new edge that is novel and has never been considered by experts before. So to answer your question again: we’re not doing that work ourselves, but we integrate the knowledge from that work so it can train our models and so we can pursue drug repurposing ideas…
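To make the node-edge-node structure concrete, here is a minimal sketch in Python. The triples, relation names, and the naive two-hop “inhibits a pathway that drives the disease” rule are our illustrative assumptions, not Every Cure’s actual schema or models:

```python
# Semantic triples: (node, edge, node), as Grant describes.
triples: set[tuple[str, str, str]] = {
    ("sirolimus", "inhibits", "mTOR pathway"),
    ("mTOR pathway", "drives", "cytokine storm"),
    ("gene_X_mutation", "causes", "disease_Y"),      # hypothetical nodes
    ("protein_Z", "protects_against", "disease_Y"),  # hypothetical nodes
}

def neighbors(node: str) -> list[tuple[str, str, str]]:
    # Every edge touching a node: the raw material for inference.
    return [t for t in triples if node in (t[0], t[2])]

def candidate_repurposings(disease: str) -> list[str]:
    # Toy two-hop pattern: a drug that inhibits a pathway driving the
    # disease becomes a (very naive) repurposing hypothesis. Real
    # systems learn such patterns from millions of nodes and edges.
    pathways = [s for s, r, o in triples if r == "drives" and o == disease]
    return [s for s, r, o in triples if r == "inhibits" and o in pathways]

print(candidate_repurposings("cytokine storm"))  # -> ['sirolimus']
```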
…[Grant:] We’re not designing novel compounds. We think that there’s so much low-hanging fruit with the 3000 drugs that already exist that we are going to spend years and years unlocking the life-saving potential of those. And the reason why we’re focused there is because that is the fastest way to save human lives. If you develop a novel compound, you have to go all the way through the entire clinical development of an approval process. IND, phase one, phase two, phase three trials. This takes years and years and hundreds of millions of dollars, whereas in certain scenarios in drug repurposing, just like with my co-founder David, within weeks of us coming up with the hypothesis that this drug might work for him, as long as we could find a physician that would prescribe it to him, it went directly into his human body just weeks later.
So that brings me to this issue that I think we’re going to see, and that you as an investor might want to make yourself aware of: there are going to be lots and lots of failures in the world of AI-driven drug discovery. And that’s because not only are you an AI company that’s generating hypotheses, you’re also a biotech company that has to validate a novel compound and bring it all the way through the clinic through clinical trials and through regulatory approvals and into patients. So here you are an AI company, you’ve hired up your team of 50 data scientists and experts, and you come up with your hypothesis and you say, “Okay, great.”
You’re not Amazon that gets to A/B test where they’re going to put a button on the user interface and then they get feedback by the end of the day and okay, move the button here instead of here. When you come up with your hypothesis after your AI team says, “Okay, this is what the drug we’re going to move forward with,” you now have to go through potentially 10 years and hundreds of millions of dollars of additional development. So you don’t know if your AI team built anything of value. You don’t have that validation feedback loop that you do in other AI consumer-based organizations. So now you’re juggling sustaining an AI corporation that doesn’t have a feedback loop while you have to also pay for the clinical development of a drug. And so it’s a tension that’s hard, hard to manage.
And drug repurposing solves that tension. It allows us to go from hypothesis to validation in a much tighter feedback loop. So what we’re doing is something that both helps patients in the fastest and cheapest way possible, but also, the happy accident is that we push forward the field of data-driven drug discovery because we can inform our models in a faster feedback loop…
…[Grant:] One thing I learned when I was at Quantum Black and at McKinsey, where we would go up against other machine learning organizations: I remember one time they put us head to head with another group and they said, “Okay, whoever comes up with the best insights in the next three months, we’re going to pick to go with a longer contract going forward.” Two seemingly similar teams were working on the same dataset. We came up with totally different recommendations than the other team did, and what was the actual differentiator between the teams was that we had five medical degrees on our team, not just a bunch of data scientists, but data scientists plus medical experts. And at every step of the way that you’re building these knowledge graphs and designing these algorithms, you’re interfacing with medical expertise to make sure you imbue it with clinical understanding, with biological rationale of how this is actually going to work and how to interpret the typically really messy medical data.
And so if you think about the matrix that we’re producing, this heat map of 3000 drugs cross-referenced with 22,000 diseases creates 66 million possibilities, and we then score those possibilities from zero to one and normalize them across the whole landscape. That’s a tricky thing to do – drug A for disease X compared to drug B for disease Y, how do you compare the possibilities of each of those from zero to one? So we create that normalized score, and then we start looking at the highest scores and filter down from there to say, “Okay, of all the highest-probability-of-success opportunities here, which ones are going to impact patients the most, and which ones can we prove out quickly and efficiently in a low-cost trial with a few metapatients and high signal, so we can do this in three to six to 12 months instead of five-year trial times?”
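A minimal sketch of that scoring-and-triage step, with the matrix shrunk for the example and random numbers standing in for real model scores (everything here is illustrative, not Every Cure’s actual pipeline):

```python
import numpy as np

# A (drugs x diseases) matrix of raw model scores, min-max normalized
# to [0, 1] across the whole landscape so that drug A / disease X is
# comparable with drug B / disease Y. Shapes are shrunk for the demo;
# the real matrix is roughly 3,000 x 22,000 (66 million pairs).
rng = np.random.default_rng(0)
scores = rng.random((30, 220))  # stand-in for model outputs

normalized = (scores - scores.min()) / (scores.max() - scores.min())

# Take the top-k (drug, disease) pairs to hand to medical experts,
# who filter by patient impact and trial feasibility from here.
k = 5
flat_idx = np.argsort(normalized, axis=None)[-k:][::-1]
top_pairs = [np.unravel_index(i, normalized.shape) for i in flat_idx]
print(top_pairs)  # [(drug_index, disease_index), ...] by score
```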
And the thing to think about, back to the comment about we need medical expertise highly integrated with what we’re doing is that even if you take the top thousand scores there, you’re still in the 0.001% of the highest ranking of scores, and now you got to pick amongst your thousand to get down to the top five. To get down to the top one, what is my first shot on goal going to be? That better be successful for all the things that I’m working on here, and it better help patients and really better work. So the AI can’t do that. You need a really smart head of translational science to make that last sort of decision of what’s going to go into patients and how it’s all going to work…
… [Grant:] we’re a nonprofit because we want to build the world’s best AI platform, and we need the best data set to do it, to save as many lives as we possibly can with drugs that already exist. So since the drugs already exist, it’s kind of a funny thing. I say we’re the smallest and the biggest pharma company in the world. We’re the biggest because every single drug that already exists is in our pipeline. We’re the smallest because we don’t own any of them. And then we take those drugs and we go after diseases that are totally neglected by the pharmaceutical industry. So by design it has to be a nonprofit.
5. How Bull Markets Work – Ben Carlson
Halfway through the year, the S&P 500 was up 15.3%, including dividends.
Despite these impressive gains, the bull market has been relatively boring this year.
There have been just 14 trading days with gains of 1% or more. There has been just a single 2% up day in 2024. And there have been only 7 days down 1% or worse.
Small moves in both directions.
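Counts like these are easy to reproduce from a series of daily closes; here is a minimal sketch with synthetic prices (not actual S&P 500 data):

```python
import numpy as np

# Synthetic daily closes standing in for roughly half a year of data.
rng = np.random.default_rng(1)
closes = 100 * np.cumprod(1 + rng.normal(0.0005, 0.006, 125))

# Daily returns, then count the big up and down days.
daily_returns = closes[1:] / closes[:-1] - 1
print("up 1%+ days:  ", int((daily_returns >= 0.01).sum()))
print("up 2%+ days:  ", int((daily_returns >= 0.02).sum()))
print("down 1%+ days:", int((daily_returns <= -0.01).sum()))
```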
Bull markets are typically boring like this. Uptrends tend to be these slow, methodical moves higher. Bull markets don’t make for good headlines because they’re made up of gradual improvements.
Bear markets, on the other hand, are where the excitement happens. Downtrends are full of both big down days and big up days…
…The best and worst days happen at the same time because volatility clusters. Volatility clusters because investors overreact to the upside and the downside when emotions are high…
…It’s also interesting to note that even though the S&P 500 is having a boring year, it doesn’t mean every stock in the index is having a similar experience.
While the S&P is up more than 15%, there are 134 stocks down 5% or worse, and 85 stocks down 10% or more so far this year.
Stock market returns are concentrated in the big names this year, but it’s normal for many stocks to go down in a given year.
Disclaimer: None of the information or analysis presented is intended to form the basis for any offer or recommendation. We currently have a vested interest in Alphabet, Amazon, Meta Platforms, Microsoft, MongoDB, and Tesla. Holdings are subject to change at any time.