I have a growing conviction that 2026 will be the year of multi-modal AI. A handful of trends are converging at once: multi-modal models are getting good enough, inference is getting cheaper and faster (the cost curve matters here), and the real world is starting to show up as first-class input. I really believe AI will stop living predominantly in text boxes and start showing up in the places humans actually are.

For the last few years, AI has been overwhelmingly text-first, and for good reason. Text was the fastest path to usefulness: it was easy to collect, easy to tokenize, relatively cheap to serve, and it generally didn't carry the same latency requirements. If you were building an AI product in 2023 or 2024, starting with text was the rational choice. But text always