What We’re Reading (Week Ending 06 July 2025) - 06 Jul 2025
Reading helps us learn about the world and it is a really important aspect of investing. The late Charlie Munger even went so far as to say that “I don’t think you can get to be a really good investor over a broad range without doing a massive amount of reading.” We (the co-founders of Compounder Fund) read widely across a range of topics, including investing, business, technology, and the world in general. We want to regularly share the best articles we’ve come across recently. Here they are (for the week ending 06 July 2025):
1. Etched is Making the Biggest Bet in AI – Etched team
We’ve spent the past two years building Sohu, the world’s first specialized chip (ASIC) for transformers (the “T” in ChatGPT).
By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either.
But for transformers, Sohu is the fastest chip of all time. It’s not even close.
With over 500,000 tokens per second in Llama 70B throughput, Sohu lets you build products impossible on GPUs. Sohu is an order of magnitude faster and cheaper than even NVIDIA’s next-generation Blackwell (B200) GPUs…
…No one has ever built an algorithm-specific AI chip (ASIC). Chip projects cost $50-100M and take years to bring to production. When we started, there was no market.
Suddenly, that’s changed:
- Unprecedented Demand: Before ChatGPT, the market for transformer inference was ~$50M, and now it’s billions. All big tech companies use transformer models (OpenAI, Google, Amazon, Microsoft, Facebook, etc.).
- Convergence on Architecture: AI models used to change a lot. But since GPT-2, state-of-the-art model architectures have remained nearly identical! OpenAI’s GPT-family, Google’s PaLM, Facebook’s LLaMa, and even Tesla FSD are all transformers.
When models cost $1B+ to train and $10B+ for inference, specialized chips are inevitable. At this scale, a 1% improvement would justify a $50-100M custom chip project.
In reality, ASICs are orders of magnitude faster than GPUs. When bitcoin miners hit the market in 2014, it became cheaper to throw out GPUs than to use them to mine bitcoin…
…We believe in the hardware lottery: the models that win are the ones that can run the fastest and cheapest on hardware. Transformers are powerful, useful, and profitable enough to dominate every major AI compute market before alternatives are ready:
- Transformers power every large AI product: from agents to search to chat. AI labs have spent hundreds of millions of dollars in R&D to optimize GPUs for transformers. The current and next-generation state-of-the-art models are transformers.
- As models scale from $1B to $10B to $100B training runs in the next few years, the risk of testing new architectures skyrockets. Instead of re-testing scaling laws and performance, time is better spent building features on top of transformers, such as multi-token prediction.
- Today’s software stack is optimized for transformers. Every popular library (TensorRT-LLM, vLLM, Huggingface TGI, etc.) has special kernels for running transformer models on GPUs. Many features built on top of transformers aren’t easily supported in alternatives (ex. speculative decoding, tree search).
- Tomorrow’s hardware stack will be optimized for transformers. NVIDIA’s GB200s have special support for transformers (TransformerEngine). ASICs like Sohu entering the market mark the point of no return. Transformer killers will need to run on GPUs faster than transformers run on Sohu. If that happens, we’ll build an ASIC for that too!…
…Isn’t inference bottlenecked on memory bandwidth, not compute?
Actually, for modern models like Llama-3, no!
Let’s use NVIDIA and AMD’s standard benchmark: 2048 input tokens and 128 output tokens. Most AI products have much longer prompts than completions (even a new Claude chat has 1,000+ tokens in the system prompt).
On GPUs and on Sohu, inference is run in batches. Each batch loads all of the model weights once, and re-uses them across every token in the batch. Generally, LLM inputs are compute-bound, and LLM outputs are memory-bound. When we combine input and output tokens with continuous batching, the workload becomes very compute bound…
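The compute-bound vs memory-bound distinction above can be sketched with a toy arithmetic-intensity calculation. This is an illustrative simplification (FP16 weights, 2 FLOPs per parameter per token, KV cache and activation traffic ignored), not Etched's actual model:

```python
# Toy model of arithmetic intensity: FLOPs performed per byte of
# model weights read from memory. The more tokens that share one
# weight load (a bigger batch), the more compute-bound the workload.

BYTES_PER_PARAM = 2    # FP16/BF16 weights
FLOPS_PER_PARAM = 2    # one multiply + one add per parameter per token

def flops_per_byte(tokens_per_weight_load: int) -> float:
    """FLOPs done per byte of weights moved from memory."""
    return tokens_per_weight_load * FLOPS_PER_PARAM / BYTES_PER_PARAM

print(flops_per_byte(2048))  # prefill: 2048 input tokens share one load -> 2048.0 (compute-bound)
print(flops_per_byte(1))     # unbatched decode: one token per load -> 1.0 (memory-bound)
print(flops_per_byte(128))   # continuous batching of decodes -> 128.0 (compute-bound again)
```

A chip with far more FLOPS than bytes/s of bandwidth only pays off when this ratio is high, which is why batching inputs and outputs together matters.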
…We can scale up the same trick to run Llama-3-70B with 2048 input tokens and 128 output tokens. Have each batch consist of 2048 input tokens for one sequence, and 127 output tokens for 127 different sequences.
If we do this, each batch will require about (2048 + 127) × 70B params × 2 FLOPs per param = 304 TFLOPs, while only needing to load 70B params × 2 bytes per param = 140 GB of model weights and about 127 × 64 × 8 × 128 × (2048 + 127) × 2 × 2 = 72 GB of KV cache. That’s far more compute than memory bandwidth: an H200 would need 6.8 PFLOPS of compute in order to max out its memory bandwidth. And that’s at 100% utilization – if utilization were 30%, you’d need 3x more.
Since Sohu has so much compute with very high utilization, we can run enormous throughputs without bottlenecking on memory bandwidth…
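The arithmetic in the excerpt above can be checked with a short script. The H200 memory bandwidth figure (~4.8 TB/s) is a public spec-sheet number I'm assuming here, not something stated in the excerpt; everything else follows the quoted formula:

```python
# Sanity check of the quoted Llama-3-70B batch arithmetic:
# 2048 input tokens for one sequence + 127 decode tokens for 127 sequences.

PARAMS = 70e9                 # Llama-3-70B parameter count
BYTES_PER_PARAM = 2           # FP16/BF16 weights
TOKENS = 2048 + 127           # tokens processed per batch

# ~2 FLOPs per parameter per token (multiply + add):
flops_per_batch = TOKENS * PARAMS * 2            # ~304 TFLOPs, as quoted

# Memory traffic per batch: weights loaded once, plus KV cache.
weight_bytes = PARAMS * BYTES_PER_PARAM          # 140 GB
kv_bytes = 127 * 64 * 8 * 128 * TOKENS * 2 * 2   # ~72 GB, as quoted

intensity = flops_per_batch / (weight_bytes + kv_bytes)  # FLOPs per byte

H200_BANDWIDTH = 4.8e12       # bytes/s (spec-sheet assumption)
# Compute rate needed to keep pace with memory at 100% utilization:
pflops_needed = intensity * H200_BANDWIDTH / 1e15  # ~6.9 (excerpt says 6.8)

print(f"{flops_per_batch / 1e12:.1f} TFLOPs per batch")
print(f"{(weight_bytes + kv_bytes) / 1e9:.1f} GB moved per batch")
print(f"{pflops_needed:.1f} PFLOPS to saturate H200 bandwidth")
```

The point survives the rounding: the compute needed to saturate the H200's memory bus is several times its actual FP16 throughput, so at this batch shape the workload is compute-bound.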
…On GPUs and TPUs, software is a nightmare. Handling arbitrary CUDA and PyTorch code requires an incredibly complicated compiler. Third-party AI chips (AMD, Intel, AWS, etc.) have together spent billions on software to little avail.
But since Sohu only runs transformers, we only need to write software for transformers!
Most companies running open-source or internal models use a transformer-specific inference library like TensorRT-LLM, vLLM, or HuggingFace’s TGI. These frameworks are very rigid – while you can tweak model hyperparameters, changing the underlying model code is not really supported. But this is fine – since all transformer models are so similar (even text/image/video ones), tweaking the hyperparameters is all you really need.
2. Lots More on What’s Going On in Iran’s Markets (Transcript Here) – Tracy Alloway, Joe Weisenthal, and Maciej Wojtal
Maciej: If I can just comment on one thing, because the way you introduced Iran is the perfect way to show the country. It’s the size of Turkey in terms of population and actually geographical size as well. But if you compare the economies, Iran’s is around five times smaller. And if you look at the composition of the economy, Turkey has no natural resources, so they have to import all the energy commodities they consume. So Iran has a similar size of potential non-commodity GDP that it could grow to, from the current, let’s say, $250 billion to the $1.1 trillion GDP that Turkey has. But on top of this, it has resources that, if you combine gas and oil, are bigger than Saudi Arabia’s, and Saudi Arabia is another, I think, $1.3 trillion economy. This is a good way to frame Iran: it’s a big country that should really have a much bigger economy. Because of sanctions, various reasons and so on, it’s been underdeveloped. But the scale of this underdevelopment is like 10x.
Tracy: And because of the sanctions we can’t actually go and look up what’s happening in Tehran’s stock market. So why don’t you give us an overview of what it’s been like for the past week given geopolitical events?
Maciej: So for the past week it was difficult for everyone to check what was going on in Iran because the internet was basically shut down. I could communicate with my team on the ground in Tehran once a day when they had signal; sometimes it was WhatsApp that was working, sometimes Telegram. But it was maybe once or twice per day. So what was going on in the market was simply nothing. The stock market hasn’t opened. The exchange of fire between Iran and Israel happened on a Friday, which is the weekend in Iran, and then on the following Saturday there was an important religious holiday, so the market, and actually the whole economy, was supposed to be closed anyway. Economic activity and the market were supposed to resume on the Sunday, but they didn’t open. So the stock market, and pretty much most of the currency market, has been closed for the last two weeks…
…Maciej: For example right now, the stock market and the currency market were shut down, but you could track what was going on with the exchange rate of the Iranian rial versus the dollar, either on Telegram chats or on cryptocurrency exchanges. There is a liquid market in stablecoins versus the Iranian rial inside Iran, where liquidity was limited during the last period anyway, but we could see the changes. So we knew that $1 before the war was at around 830,000 rials, then it went up roughly 15% to 950,000, and now, after the ceasefire, it’s back down at 850,000. You can track the market, and you can actually make transactions depending on the vol, on the liquidity; it is possible. To be honest, when I saw the exchange rate move 15% during a war that a lot of commentators were saying could turn into a massive worldwide conflict, I would say that for a country like Iran, 15% is your usual volatility on the currency market…
…Maciej: In Tehran, a lot of residents were just relocating out of the city. Tehran is a big city, 12 million people, and they were moving mainly north to some smaller cities by the Caspian Sea. You had massive congestion. People were spending hours in traffic jams trying to get out of Tehran. There was not enough petrol at gas stations just because of this peak in demand. You had some petrol rationing.
Then I was asking them: is the economy working or not working? Everything non-essential was basically closed. So you couldn’t buy building materials or anything like this. But groceries, pharmaceuticals, gas stations, banks: this was all open and working properly, with some disruptions. For example, if you wanted to buy groceries in the north of Iran, where everyone had just relocated, you had some logistical bottlenecks. Distribution was not fast enough, so you had some shortages just for a little while. With banks, some branches were not operating at 100% capacity. Two banks got hacked; there were cyber attacks on two banks in Iran and one cryptocurrency exchange. The rest of the banking sector was working without any disruptions. You could get cash from any ATM. There were no problems like these…
…Maciej: It’s interesting because there is very up-to-date, detailed information on Iranian stocks available in Iran. But the majority of this information is not accessible if you’re trying to access it from a computer with an IP address outside of Iran. A lot of it is restricted to Iranian IPs only. You cannot find it anywhere on the whole internet. There is no website that shows the stock market index in dollars. When we send it out to our investors, or just people who want to read news about the stock market in Iran, we are the only source of this information. This is quite amazing. It’s a country of 90 million people with a stock market of 700 companies, and there is no single place on the internet that would show you the one important index…
…Maciej: In terms of oil, it is not really publicly traded. There is one Iranian monopoly, the National Iranian Oil Company, that is responsible for production. I think this is all centralized in one company, and it is held by the government, so it’s not publicly listed. You have some exposure to oil through the oil refineries that are listed, but refineries are not sensitive to the price of oil. They are sensitive to the crack spread, which defines their refining margin, so they are not really a proxy for oil prices.
The whole stock market is actually well diversified. You have large sectors such as chemicals, mainly petrochemicals: companies that take natural gas, which is in large supply as a cheap commodity, and produce fertilizers and products like this. This is probably 20% of the stock market.
Then you have steel companies. The largest steel company in the Middle East is in Iran. You have car makers that produce more than 1 million cars a year, and with the car manufacturers come all the related industries and suppliers to the car manufacturing businesses. You have banks – financials is an important sector – plus some consumer exposure and some building materials; cement companies are actually among the best performers over the last few years…
…Maciej: I’ll get back to the potential for GDP in a moment. But the catalyst is absolutely clear. It must be the opening up of Iran as a country, the opening up of the economy, and the US sanctions being lifted. There must be an agreement between the US and Iran. What needs to happen? Some sort of political change. The political attitude must change on both sides. But to be honest, many analysts were expecting some big dramatic event that needed to happen in Iran for the country to properly open up.
When you look at Iran right now and compare it to, let’s say, even a few years ago when you had negotiations with the US, the biggest problems were always about two things: (1) Iran enriching uranium too much, basically at the wrong level, and (2) Iranian regional policies, so financing proxies from Hezbollah to Hamas, Assad in Syria and so on. These two things were always the problems that they couldn’t negotiate over. When you look at it right now, to a large extent both obstacles are gone…
…Joe: Are there tech companies that trade on the Tehran stock market?
Maciej: There are tech companies. The ones that are listed are related to enterprise software, like Oracle or Germany’s SAP. But you have privately held companies that would like to IPO but are just waiting for approval from the regulator, and these are quite amazing companies. You have Snapp, which is like an Uber, but Snapp has more rides in Tehran than Uber has in any city in the world. It’s a really world-class company. You have Digikala, which is basically like Amazon, also a large company, one of the biggest success stories.
3. Stablecoins might revolutionise payments, but what if they don’t? – Bryce Elder
That leaves payments:
While in a theoretical tokenized/blockchain based world, stablecoin-based payments would be faster, more efficient and interoperable, in practice at the moment these stablecoin based payments mostly start and finish with fiat, thus requiring on/off-ramps. This on/off ramp requirement adds significant friction/cost to the use of stablecoins for payments, making it less attractive compared to traditional financial systems, in particular if one takes into account the emergence of faster payment rails in the traditional financial system via fintech advancements in recent years. As a result, we find rather unrealistic the expectation of a massive increase in the use of stablecoins in payments. Indeed, our colleagues in US short-term rates research also note that market participants at the front end are skeptical of significant growth in the near term, in part due to the fact that the infrastructure/ecosystem for stablecoins remains underdeveloped. But even if one adopts an optimistic view and assumes, for example, a tenfold increase in the use of stablecoins in payments over the next couple of years, the stablecoin universe would only expand by $15bn x 10 = $150bn.
Stablecoin optimists point to the rapid adoption of the e-CNY, China’s central bank digital yuan, which has grown to a more than Rmb300bn market cap from Rmb13.6bn at the end of 2022. There’s no comparison, JPMorgan says:
First, the digital yuan is a central bank liability and thus it effectively replaces banknotes in circulation. While there does not appear to be a published target share of M0, there have been suggestions that a 10-15% share of M0 is a plausible medium-term goal, which would imply around RMB 1.3-2tr using current M0 levels. By contrast, stablecoins are a form of a tokenized MMF with zero interest, effectively a private sector liability rather than a central bank liability.
Second, the digital yuan does not operate through a fully decentralized blockchain-based ledger. Instead, it operates via a centralized network supervised by the PBoC and competes with other mobile/ electronic payment options in China such as Alipay and WeChat Pay.
Then is it better to think of stablecoins as global equivalents of Alipay and WeChat Pay? JPMorgan says no. Fintech payment companies offering collateralised electronic private money on their own platforms haven’t proven the need for public blockchains; if anything, they prove the opposite:
Alipay/WeChat Pay digital money are private liabilities and are perhaps more similar to bank deposits in that regard which are also private liabilities. The difference between bank deposits and Alipay/WeChat balances is that the latter are backed by reserve funds that in turn hold public liabilities i.e. central bank reserves, while bank deposits are matched on the asset side by a mix of loans and debt securities, though they do have an additional guarantee via deposit protection arrangements.
In our mind, the strong expansion of Alipay and WeChat Pay should be viewed through the lens of a fintech payments revolution over the past decade in China that utilizes and increases the efficiency of traditional banking/financial system networks, rather than through the lens of a blockchain/crypto ecosystem revolution. In fact, it could be argued that the success and continued advancements in payments by fintechs, such as Alipay and WeChat Pay reduce the need for blockchain-based payment systems in the future.
4. Meet Project Rainier, Amazon’s one-of-a-kind machine ushering in the next generation of AI – Kirsteen Rodger
Project Rainier is designed as a massive “EC2 UltraCluster of Trainium2 UltraServers.” The first part refers to Amazon Elastic Compute Cloud (EC2), an AWS service that lets customers rent virtual computers in the cloud rather than buying and maintaining their own physical servers.
The more interesting bit is Trainium2, a custom-designed AWS computer chip built specifically for training AI systems. Unlike the general-purpose chips in your laptop or phone, Trainium2 is specialized for processing the enormous amounts of data required to teach AI models how to complete all manner of different and increasingly complex tasks—fast.
To put the power of Trainium2 in context: a single chip is capable of completing trillions of calculations a second. If, understandably, that’s a little hard to visualize, consider that it would take one person more than 31,700 years to count to one trillion. A task that would require millennia for a human to complete can be done in the blink of an eye with Trainium2…
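The "31,700 years" figure is simple arithmetic. A quick sketch, assuming one number counted per second with no breaks and 365-day years:

```python
# Counting to one trillion at one number per second, non-stop:
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000 seconds
years_to_count_a_trillion = 1e12 / SECONDS_PER_YEAR
print(f"{years_to_count_a_trillion:,.0f} years")  # prints "31,710 years"
```

So "more than 31,700 years" checks out under those assumptions; a chip doing trillions of operations per second compresses that into under a second.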
…Traditionally, servers in a data center operate independently. If and when they need to share information, that data has to travel through external network switches. This introduces latency (i.e., delay), which is not ideal at such a large scale.
AWS’s answer to this problem is the UltraServer. A new type of compute solution, an UltraServer combines four physical Trainium2 servers, each with 16 Trainium2 chips. They communicate via specialized high-speed connections called “NeuronLinks.” Identifiable by their distinctive blue cables, NeuronLinks are like dedicated express lanes, allowing data to move much faster within the system and significantly accelerating complex calculations across all 64 chips.
When you connect tens of thousands of these UltraServers and point them all at the same problem, you get Project Rainier—a mega “UltraCluster.”…
…Communication between components happens at two critical levels: the NeuronLinks provide high-bandwidth connections within UltraServers, while Elastic Fabric Adapter (EFA) networking technology (identified by its yellow cables) connects UltraServers inside and across data centers. This two-tier approach maximizes speed where it’s most needed while maintaining the flexibility to scale across multiple data center buildings.
5. OpenAI has started to form a “moat” – Rihard Jarc
I think anyone who follows the AI space knows about OpenAI and, more specifically, about ChatGPT. Even outside of investors and tech enthusiasts, “ChatGPT” has gone viral as a verb, similar to how “Google” did. What is even more surprising is that despite ChatGPT being out for more than two years already, just recently, at the end of March, it hit another acceleration point in terms of adoption when the Ghibli photo trend emerged on ChatGPT:
The number of MAUs doubled from 400 million to 800 million in a matter of a few weeks. Looking at the adoption curves of other highly adopted technology platforms, such as TikTok, Facebook, and Instagram, ChatGPT is on a slope of its own.
Another factor to consider is that it is not just an “I must try it” moment. Looking at the number of minutes a user spends on ChatGPT, the minutes are constantly growing and have now reached the 29-minute daily mark.
Remember that at the start of ChatGPT and LLMs, many critics said that people tried it, had fun, and then didn’t use it again. This trend shows that that is not the case and that with each enhanced model version and UX improvement, the stickiness factor becomes bigger…
…OpenAI also now has serious hardware ambitions. In late May of this year, it acquired the startup of Jony Ive, the famous former Apple designer, for nearly $6.5 billion; Ive will now lead OpenAI’s hardware efforts. What is now almost a consensus opinion among big tech leaders is that AI will unlock the next computing platform, one that is not tied to the smartphone.
And if you listen to those conversations, everyone is calling for a similar device: a device that will be more like a companion system and less dependent on a screen. A proactive assistant that will run even when you don’t ask it.
Disclaimer: The Good Investors is the personal investing blog of two simple guys who are passionate about educating Singaporeans about stock market investing. By using this Site, you specifically agree that none of the information provided constitutes financial, investment, or other professional advice. It is only intended to provide education. Speak with a professional before making important decisions about your money, your professional life, or even your personal life. We currently have a vested interest in Alphabet (the company behind AlphaFold), Amazon (the company behind AWS), Apple, Meta Platforms (the company behind Facebook and Instagram), and Tencent (the company behind WeChat Pay). Holdings are subject to change at any time.