GPT-4o, AI overviews and our multimodal future
 

Watch the episode
Episode 3: GPT-4o, AI overviews and our multimodal future

In Episode 3 of Mixture of Experts, host Tim Hwang is joined by Shobhit Varshney, Chris Hay and Bryan Casey for the OpenAI vs. Google showdown. Shobhit analyzes the showcase demos released by OpenAI and Google. Chris breaks down latency and cost in relation to GPT-4o and Gemini 1.5 Flash. Finally, after years of people proclaiming the death of search, Bryan answers the big question: Are large language models (LLMs) forcing the death of Google Search?

Key takeaways:

  • 0:00 Intro 
  • 3:13 The rise of multimodality
  • 16:54 Collapsing latency and cost
  • 30:12 LLMs eat Google Search

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.

📩 Sign up for a monthly newsletter for AI updates from IBM.

Episode transcript

Tim Hwang: Hello and welcome to Mixture of Experts. I’m your host, Tim Hwang. Each week, Mixture of Experts brings together a world-class team of researchers, product experts, engineers and more to debate and distill the biggest news of the week in AI. Today on the show, the OpenAI and Google showdown of the week: who’s up, who’s down, who’s cool, who’s cringe, what matters, and what was just hype. We’re going to talk about the huge wave of announcements coming out of both companies this week and what it means for the industry as a whole. I’m supported by an incredible panel today: two veterans who have joined the show before, and a new contestant entering the ring. First off, Shobhit Varshney, Senior Partner of Consulting for AI in the US, Canada and LatAm. Shobhit, welcome back to the show.

Shobhit Varshney: Thanks for having me back, Tim. Love this.

Tim Hwang: Yeah, definitely glad to have you here. Chris Hay, who is a Distinguished Engineer and the CTO of Customer Transformation. Chris, welcome back.

Chris Hay: Hey, nice to be back.

Tim Hwang: Yeah, glad to have you back. And joining us for the first time is Bryan Casey, who is the Director of Digital Marketing and who has promised a 90-minute monologue on AI and search summaries, which I don’t know if we’re going to get to, but we’re going to let him have his say. Bryan, welcome to the show. We’ll have to suffer through Shobhit and Chris for a little bit and then we’ll get to the monologue. But thank you for joining. Yeah, exactly, exactly. Well, great, let’s just go ahead and jump right into it. Obviously there were a huge number of announcements this week. OpenAI came out of the gate with its raft of announcements, Google I/O is going on and they did their set of announcements. Really, more things were debuted and promised than we’re going to have the chance to cover on this episode. But from my point of view, and I want to use this as a way of organizing the episode, there were three big themes coming out of Google and OpenAI this week. We’ll take each in turn and use them to make sense of everything. The first is multimodality, right? Both companies are obsessed with their models taking video input and being able to make sense of it, and going from image to audio, text to audio. I want to talk a little bit about that. The second thing is latency and cost. Everybody touted the fact that their models are going to be cheaper and way faster. From the outside, you might say that’s just a matter of degree, things get faster and cheaper, but I think what’s happening here really might have a huge impact on downstream uses of AI. So I want to talk about that dimension and what it means. And then finally, I’ve already previewed it a little bit: Google made this big announcement that is almost literally going to be many people’s very first experience with LLMs in full production. Google basically announced that going forward, in the US market and then globally, users of Google Search will start seeing AI summaries at the top of their search results. That’s a huge change. We’re going to talk about what that means, and whether it’s good, which I think is a really big question. So, looking forward to diving into it all. Let’s talk about multimodal first. There were two showcase demos from Google and OpenAI, and I think both of them roughly got at the same thing, which is that in the future you’re going to open up your phone, turn on your camera, wave it around, and your AI will basically respond in real time. Shobhit, I want to bring you in, because you were the one who flagged this, saying we should really talk about it. The big question I’m left with is, where do we think this is all going? It’s a really cool feature, but what kind of products do we think it’s really going to unlock? Maybe we’ll start there, but I’m sure this topic goes in all different directions, so I’ll give you the floor to start.

Shobhit Varshney: So Monday and Tuesday were just phenomenal inflection points for the industry altogether. We’re getting to a point where an AI can make sense of all these different modalities. It’s an insanely tough problem. We’ve been at this for a while and we hadn’t gotten it right. We spent all this time trying to create pipelines to do each of these steps, speech to text, then understanding, then text back, and it takes a while to get all of the processing done. The fact that in 2024 we are able to do this, what a time to be alive, man. I just feel that we are finally getting to a point where your phone becomes an extension of your eyes and your ears, and that has a profound impact on some of the workflows in our daily lives. Now, at IBM, I focus a lot more on enterprises, so I’ll give you more of an enterprise view of how these technologies are actually going to make a difference or not. In both cases, Gemini and OpenAI’s 4o, and by the way, for me 4o does not stand for “omni.” For me, 4o means “oh my God,” it was really, really that good. We’re getting to a point where there are certain workflows we do with enterprises, like transferring knowledge from one person to the other, where usually you’re looking at a screen and you have a bunch of “here is what I did, here is how I solved for it.” We used to spend a lot of time trying to capture all of that, and what happened on the desktop in classic BPO processes, these are billions of dollars of work that happens, right?

Tim Hwang: Let me pause you there, because I’m curious if you can explain, since this is not my world, and I’m sure for a lot of listeners it isn’t their world either. How did it used to be done? If you’re trying to automate a bunch of these workflows, is it just people writing scripts for every single task? I’m just curious what it looks like.

Shobhit Varshney: Yeah, so Tim, let’s pick a more concrete example. Say you have outsourced a particular piece of work: your finance documents come in, you’re comparing them against other things, you’re finding errors, you’re going to go back and send an email, things of that nature, right? We used to spend a lot of time documenting the current process, and then we’d look at that 729-step process and say, I’m going to call an API, I’m going to write some scripts, and all kinds of issues used to happen along the way, unhappy paths and so on and so forth. So the whole process used to be codified in some level of code, and then it’s deterministic. It does one thing in a particular flow really well and you cannot interrupt it; you can’t just barge in and say, no, no, no, this is not what I wanted, can you do something else? We’re now finally getting to a point where that knowledge work, the work that used to get done in a process, will start getting automated significantly with announcements from both Google and OpenAI. So far people would solve it as a step-by-step decision flowchart, but now we’re at a paradigm shift where I can interrupt in the middle of it and say, hey, see what’s on my desktop and figure it out. I’ve been playing around with OpenAI’s 4o, and its ability to look at a video of a screen and things of that nature is pretty outstanding. We’re coming to a point where the speed at which the inference happens is so quick that we can actually bring these models into your workflows. Earlier it would just take so long, it was very clunky, it was very expensive, so you couldn’t really justify adding AI into those workflows; you’d do labor arbitrage or things of that nature versus trying to automate it. So infusing AI into these kinds of workflows, doing this entire process, is a phenomenal unlock. One of my clients is a big CPG company, and as you walk down the aisles, they do things like planograms, where you’re looking at a picture of the shelf, and these consumer packaged goods companies give you a particular format in which they want you to keep different chips and drinks and so on, and some of those labels are turned around or in a different place. You have to audit and say, am I placing things on the shelf the right way, like the consumer packaged goods company wanted? That’s the planogram idea here. So earlier we used to take pictures, and a human would go in and note things and say, yes, I have enough of the bottles in the right order. Then we started taking pictures and analyzing them, and you run into real-world issues: you don’t have enough space to back up and take a picture, or you go to the next aisle and the lighting is very different, and stuff like that. So AI never quite scaled. This is the first time, with models like Gemini and others, where I can just walk past the shelf, create a video, and feed the whole five-minute video in. With a context length of 2 million plus tokens, it can actually ingest it all and figure out what’s missing, right? So those kinds of things that were very, very difficult for us to do earlier are becoming a piece of cake. The big question here is how do I make sure that the phenomenal stuff we’re seeing is grounded in the enterprise, so it’s my data, my planogram style, my processes, my documents, not knowledge from elsewhere.
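A minimal sketch of the planogram audit Shobhit describes, assuming a hypothetical long-context multimodal client; the `client.generate` call, model name and prompt format are placeholders for illustration, not a real vendor API:

```python
# Sketch: ask a (hypothetical) long-context multimodal model to compare a shelf
# walkthrough video against the expected planogram and report discrepancies.
import json

EXPECTED_PLANOGRAM = {
    "shelf_1": ["cola_12oz", "cola_12oz", "lime_soda_12oz"],
    "shelf_2": ["corn_chips_large", "tortilla_chips_large"],
}

def audit_shelf(client, video_path: str) -> dict:
    """Return the model's JSON report of missing or misplaced facings."""
    with open(video_path, "rb") as f:
        video_bytes = f.read()

    prompt = (
        "You are auditing a retail shelf against this planogram:\n"
        f"{json.dumps(EXPECTED_PLANOGRAM, indent=2)}\n"
        "List every slot that is missing, misplaced, or turned around, as JSON."
    )
    # Placeholder call: real multimodal APIs differ in how video is attached.
    response = client.generate(
        model="long-context-multimodal",
        video=video_bytes,
        prompt=prompt,
    )
    return json.loads(response.text)
```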
So in all the demos, one of the things I was missing was, how do I make it go down a particular path that I want? If the answer is not quite right, how do I control it? So a lot of the open questions are around how I bring this to my enterprise clients and deliver value for them.

Tim Hwang: I totally do want to get into that. I see Chris coming off mute though, so I don’t want to break his flow. Chris, do you have a view on this, or do you disagree and think, ah, it’s actually not that impressive?

Chris Hay: Google Glass is back, baby. Yeah, yeah. No, I think multimodality is a huge thing, and Shobhit covered it correctly, right? There are so many use cases in the enterprise but also in consumer-based scenarios. One of the things we really need to think about is that we’ve been working with LLMs for so long now, which has been great, but the 2D text space isn’t enough for generative AI. We want to be able to interact in real time, we want to be able to interact with audio. You can take that to things like contact centers, where you want to be able to transcribe that audio, you want AIs to be able to respond back in a human way, and you want to chat with assistants like you saw in the OpenAI demo. You don’t want to be sitting there going, well, my conversation is only going to be as fast as my fingers can type. You want to be able to say, hey, what do you think about this, what about that? And you want to imagine new scenarios. You want to say, what does this model look like, what does this image look like, tell me what this is, and you want to be able to interact with the world around you. To be able to do that, you need multimodal models. And therefore, like in the Google demo where, yeah, she picked up the glasses again, so I jokingly said Google Glass is back, but it really is. If you’re having a retail shopping experience and you want to look up what the price of a mobile phone is, for example, you’re not going to want to stop, get your phone out and type, type, type; you just want to interact with an assistant there and then, or see the price in your glasses. And I give the mobile phone example for a reason, which is that the price I pay for a mobile phone isn’t the same price you would pay, because it’s all contract rates. If I want to find out how much I’m paying for that phone, it takes an advisor like 20 minutes, because they have to go look up your contract details, et cetera, they have to look up what the phone is, and then they do a deal. In a world of multimodality where you’ve got something like glasses on, it can recognize the object, it knows who you are, and then it can look up what the price of the phone is for you and answer questions that are not generic but specific to you and your contract. That is where multimodality is going to start to come in.

Tim Hwang: Yeah, totally. I mean, Chris, if I have you right, this is one of the questions I want to pitch to both you and Shobhit. My mind goes directly back to Google Glass; the bar where the guy got beat up for wearing Google Glass years ago was around the corner from where I used to live in San Francisco. There’s just been this dream, and obviously all the OpenAI demos, and the Google demos for that matter, are very consumer: you’re walking around with your glasses, looking around the world, getting prices and that kind of thing. This has been a long-standing Silicon Valley dream and it’s been very hard to achieve. One thing I want to run by you, and the answer might just be both, or we don’t know, is whether you’re more bullish on the B2B side or on the B2C side. Because I hear what Shobhit’s saying and I’m like, oh, okay, I can see why enterprises really get a huge bonus from this sort of thing. And it’s really funny to me, because there’s one point of view which is that everybody’s talking about the consumer use case, but the actual near-term impact may be more on the enterprise side. I don’t know if you guys buy that, or if you really are like, this is the era of Google Glass, it’s back, baby?

Shobhit Varshney: So I can start first, Tim. We have been working with Apple Vision Pro quite a bit at IBM with our clients, and a lot of those are enterprise use cases in a very controlled environment. Things break in the consumer world because you don’t have a controlled environment; corner cases happen a lot, right? In an enterprise setting, if I’m wearing my Vision Pro for two hours at a stretch because I’m a mechanic and I’m fixing things, that’s a place where I need additional input and I can’t go look at other things, like pick up my cell phone and work on it; I’m underneath, fixing something in the middle of it. In those use cases, because the environment is very controlled, I can do AI with higher accuracy, it’s repeatable, and I can start trusting the answers because I have enough data coming out of it. You’re not trying to solve every problem, but I think we’ll see a higher uptake of these devices. By the way, I love the Ray-Ban glasses from Meta as well, great for doing something quick when you don’t want to switch. But I think we’re moving to a point where enterprises will deliver these at scale. The tech keeps getting better, and adoption will come on the B2C side, but on the consumer side we’ll have multiple attempts at this, like we had with Google Glass; it’ll take a few attempts to get it right. On the enterprise side, we will learn and make the models a lot better, and I think there’s an insane amount of value that we’re delivering to our clients with Apple Vision Pro today in enterprise settings. I think it’s going to follow that path.

Tim Hwang: Totally, yeah. And it’s actually interesting, I hadn’t really thought about this: basically, the phone is almost not as big a competitor in the enterprise setting, right? The example Chris gave was literally, is this multimodal device faster than using my phone in that interaction, which is a real competition. But if it’s someone like a mechanic, they can’t just pull out their phone. Chris, any final thoughts on this, and then I want to move us to our next topic?

Chris Hay: Yeah, I was just going to give another use case scenario. I often think of things like the oil rig example, a real enterprise space where you’re wandering around and you have to go do safety checks on various things. If you think of the days before the mobile phone or the tablet, what they would have to do is go look at the part, do the visual inspection, and then walk back to a PC to fill that in. These days you do that with a tablet on the rig, right? But then you actually need to find a component, you have to do the defect analysis, you want to take pictures of that, you need the geolocation of where that part is so the next person can find it, you want to see the notes they had before on it, and then you’ve got to fill in the safety form. So they have to fill in a ton of forms. There’s a whole set of information there, and if you just think about AI having even your phone or glasses, pick either, to look at that part, have the notes contextualized in that geospatial space, fill in that form and do an analysis, it has a huge impact on enterprise cases. Multimodality in that sense has probably got a bigger impact, I would say, in the enterprise cases than in the consumer spaces, even today. And I think that’s something we really need to think about. The other one, and again, I know you wanted this to be quick, Tim, is that the clue in generative AI is the generative part, right? I can create images, I can create audio, I can create music, things that don’t exist today. And with the text part of something like an LLM, I can create new creative stuff: I can create DevOps pipelines, Docker files, whatever. So there comes a point where I want to visualize the thing that I create. I don’t want to be copying and pasting from one system to another, and that’s not any different from the oil rig scenario. As I start to imagine new business processes, new pipelines, new tech processes, I then want a real-time visualization of that at the same time, or to be able to interact with it, and that’s why multimodality is really important, probably more so in the enterprise space.

Tim Hwang: Yeah, that’s right. I mean, some of the experiments you’re seeing with dynamic visualization generation are just very cool, right? You can basically say, here’s how I want to interact with the data, and the system just generates it on the fly, which I think is very, very exciting. All right, next up I want to talk about latency and cost. This is another big trend. I think it was very interesting that both companies went out of their way to say, we’ve got this offering and it’s way cheaper for everybody, which suggests to me that these big competitors in AI all recognize that your per-token cost is going to be a huge bar to getting the technology more widely distributed. Certainly one of the ways they sold 4o was that it was cheaper and as good as GPT-4, right? Everybody was kind of like, okay, well, why do I pay for the pro tier anymore if I’m just going to get this for free? And then Google’s bid, of course, was Gemini 1.5 Flash, which is, okay, it’s going to be cheaper and faster again. And I know Chris, you threw this topic out, so I’ll let you have the first say, but the main question I’m left with is, what are the downstream impacts of this? For someone who’s not paying attention to AI very closely, is this just a matter of it getting cheaper, or do you think these economics are actually changing how the technology is going to be rolled out?

Chris Hay: I think latency, smaller models and tokens are probably one of the most interesting challenges we have today. If you think about GPT-4, everybody was saying, oh, that’s a 1.8-trillion-parameter model or whatever it is. That’s great, but the problem with these large models is that every layer you have in the neural network adds time to get a response back, and not only time but cost. If you look at the demo OpenAI did, for example, what was really cool about it was that when you were speaking to the assistant, it was answering pretty much instantly, right? And that is the really important part. When we look at previous demos, what you would have to do if you were having a voice interaction is stitch together three different pipelines: you need to do speech to text, then you run that through the model, and then you do text to speech on the way back. So you’re getting latency, latency, latency before you get a response, and because that timing is not in the 300-millisecond range, it was too long for a human being to interact with; you got this massive pause. So latency and tokens per second become the most important thing if you want to interact with models quickly and have those conversations. And that’s also why multimodality is really important, because if I can do this in one model, I’m not jumping between pipelines all the time. The smaller you can make the model, the faster it’s going to be. Now if you look at the GPT-4o model, I don’t know if you’ve played with just the text mode, but it is lightning fast when it comes back.
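A back-of-the-envelope sketch of the latency stacking Chris describes; the stage timings are illustrative assumptions, not benchmarks, and the roughly 300-millisecond conversational target is the figure mentioned above:

```python
# Sketch: latency adds up across a stitched speech pipeline, while a single
# multimodal model has one round trip. Numbers are made up for illustration.
STITCHED_PIPELINE_MS = {
    "speech_to_text": 400,
    "llm_response": 900,
    "text_to_speech": 300,
}
SINGLE_MULTIMODAL_MS = 320          # one speech-in / speech-out round trip
HUMAN_CONVERSATION_TARGET_MS = 300  # roughly the pause people tolerate in dialogue

stitched_total = sum(STITCHED_PIPELINE_MS.values())
print(f"stitched pipeline: {stitched_total} ms "
      f"({stitched_total / HUMAN_CONVERSATION_TARGET_MS:.1f}x the target)")
print(f"single multimodal model: {SINGLE_MULTIMODAL_MS} ms")
```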

Tim Hwang: Very fast now, yeah.

Chris Hay: And noticeably so. It feels like every time I’m in there, there are these improvements, right? And this is what you’re doing: you’re trading off reasoning versus speed of the model. As we move into agentic platforms, as we move into multimodality, you need that latency to be super sharp, because you’re not going to be waiting all the time. There are going to be scenarios where you want to move back to a bigger model, and that is fine, but you’re going to be paying the cost, and that cost is the price of the tokens in the first place but also the speed of the response. I think this is the push and pull that model creators are going to be playing with all the time. Therefore, if you can get a similar result from a smaller, faster, cheaper model, you’re going to go for that. But in the cases where you can’t, you may need to go to the larger model to reason. So this is really important.

Tim Hwang: Totally, yeah. I think there are a bunch of things to say there. One thing you’ve pointed out clearly is that this makes conversation possible: you and I can have a conversation in part because we have low latency, is kind of the way to think about it. And now that we’re reaching human-like parity on latency, these models can finally converse in a certain way. The other one is that there’s almost a thinking-fast-and-slow thing here, where the models can be faster but they’re just not as good at reasoning, and then there’s this deep-thinking mode which is actually slower in some ways.

Shobhit Varshney: So Tim, in the way we’re helping enterprise clients, there’s a split. There are two ways of looking at applying gen AI in the industry right now. One is at the use case level: you’re looking at the whole workflow end to end, seven different steps. The other is looking at it at a subtask level. Let me pick an example and walk you through it. Say I have an invoice that comes in, I’m taking an application, I’m pulling something out of it, I’m making sure it matches the contract, and I’m going to send you an email saying your invoice is paid. Some sort of flow like that, very simplified into, say, seven steps. I’m going to pull things from the backend systems using APIs, step number three. I’m going to call a fraud detection model that has been working great for three years, step number four. I’m extracting things from a paper invoice that came in; that extraction I used to do with OCR at 85% accuracy, and humans would handle the overflow. At that point, we take a pause and say, we have reason to believe that LLMs today can look at an image and extract this with higher accuracy. Say we get up to 94%, so that’s nine points higher accuracy in pulling things out. So we pause at that step and say, let’s create a set of constraints for step number four to find the right model. One constraint could be latency, like we just discussed: how quickly do I need the result, or can this take 30 seconds and I’ll be okay with it? A second could be around cost: if I’m doing this a thousand times, I have a cost envelope to work with versus a human doing it; if I’m doing it a million times, I can invest a little bit more if I get accuracy out of it, so the ROI becomes important. Then you’re looking at security constraints: does this data have any PHI or PII that really can’t go to the cloud, so I have to bring things closer, or is this military-grade secret that has to stay on prem? So you come up with a list of five or six constraints, and that lets you decide what kind of LLM will actually check off all those constraints, and then you start comparing and bringing it in. The split we’re seeing in the market is that one path, with LLM agents and these multimodal models, tries to accomplish the entire workflow end to end, like you saw with Google returning the shoes: it takes an image of them, goes and looks at your Gmail to find the receipt, starts the return, gives you a QR code, the whole return process done; it figured out how to create the entire end-to-end workflow. But where enterprises are still focused is more at the subtask level. At that point, we’re saying this step, step number four, is worth switching; I have enough evals before and after, I have enough metrics to understand it, and I can control it and audit it much better. The thing is, from an enterprise perspective, with these end-to-end multimodal models it’ll be difficult for us to explain to the SEC, for example, why we rejected somebody’s benefits on a credit card, things of that nature.
So I think in the enterprise world, we’re going to go down the path of: let me define the process, I’m going to pick small models, to Chris’s point, to do each piece better, and then eventually start moving over to, hey, now let me make sure that framework of evals and all of that can be applied to end-to-end multimodal models.
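A minimal sketch of the subtask-level, constraint-driven model selection Shobhit outlines; the candidate models, thresholds and numbers are made-up assumptions for illustration:

```python
# Sketch: score candidate models for one workflow step against a handful of
# constraints (latency, cost envelope, data residency, accuracy on your evals)
# and keep only those that pass.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    latency_s: float        # measured latency for the subtask
    cost_per_call: float    # USD per call at expected volume
    on_prem_capable: bool   # can it run where the data must stay?
    eval_accuracy: float    # accuracy on your own before/after evals

CONSTRAINTS = {
    "max_latency_s": 30.0,      # "can this take 30 seconds and I'm okay with it?"
    "max_cost_per_call": 0.01,  # cost envelope vs. a human doing the step
    "require_on_prem": True,    # PII/PHI that cannot go to a public cloud
    "min_accuracy": 0.94,       # must clearly beat the 85% OCR baseline
}

def passes(c: Candidate) -> bool:
    return (c.latency_s <= CONSTRAINTS["max_latency_s"]
            and c.cost_per_call <= CONSTRAINTS["max_cost_per_call"]
            and (c.on_prem_capable or not CONSTRAINTS["require_on_prem"])
            and c.eval_accuracy >= CONSTRAINTS["min_accuracy"])

candidates = [
    Candidate("small-on-prem-vlm", 4.0, 0.002, True, 0.94),
    Candidate("frontier-hosted-model", 2.0, 0.030, False, 0.97),
]
print([c.name for c in candidates if passes(c)])  # -> ['small-on-prem-vlm']
```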

Tim Hwang: I guess I do want to bring in Bryan here, like, release the Bryan on this conversation. I’m curious about the marketer view on all this, because there’s one point of view which is, yes, yes, Chris, Shobhit, this is all nerd stuff: latency and cost and speed and whatever. The big thing is that you can actually talk to these AIs. One really big thing that came out of the OpenAI announcements was, we’re going to use this latency improvement largely to create a feature that feels a lot more human and lifelike than typing and chatting with an AI. I’m curious what you think about that move. Is that ultimately going to help the adoption of AI, or is it just kind of a weird sci-fi thing that OpenAI wants to do? And if you’ve got any thoughts on how it impacts the enterprise as well: do companies suddenly say, oh, I understand this now, because it’s like the AI from Her, I can buy this? It’s just interesting thinking about the surface part of this, because it will actually have a big impact on the market as well; the technical advances are driving the marketing of this.

Bryan Casey: I mean, I do think when you look at some of the initial reviews of, I want to say, the Pin and the Rabbit, I remember one of the scenarios being demoed: I think he was looking at a car and asking a question about it, and the whole interaction took like 20 seconds. Then the reviewer showed he could do the whole thing on his phone in the same amount of time. The thing I was thinking about when I was watching that was, he just did like 50 steps on his phone. That was awful, as opposed to just pushing a button and asking a question. It was very clear that the UX of just asking the question and looking at the thing was a way better experience than pushing 50 buttons on your phone, but the 50 buttons still won, just because it was faster than dealing with the latency of where we were before. And so it reminded me a lot of the way I remember hearing Spotify talk early on about how they thought about latency, and the things they did just to make the first 15 seconds of a song land, essentially so it felt like a file you had on your device. Because from their perspective, if it felt like every time you wanted to listen to a song it was buffering, as opposed to sitting on your device, you were never going to really adopt that thing, because it’s a horrible experience relative to just having the file locally. So they put in all this work so that it felt the same, and that wound up being a huge part of how the technology and the product ended up getting adopted. And, you know, I do think a lot of the stuff we’re doing is almost, I don’t want to say back office, but just enterprise processes around how people do things, operational things. But there are plenty of ways people are thinking about doing more with agents in terms of customer experience, whether it’s support interactions or bots on the site; you can clearly imagine that playing a bigger role in customer experience going forward. And if it feels like every time you ask a question you’re waiting 20 seconds to get a response from this thing, the person on the other end of that interaction is just getting madder and madder and madder the entire time. Whereas the more it feels like you’re talking to a person, and that they’re responding as fast as you’re talking, the more likely it is that people will accept that as an interaction model. So I do think latency, and, to your point, making it feel like a human being with zero latency, is a necessary condition for a lot of these interaction models, and it’s going to be super important going forward. And to me, when I think about the Spotify thing, people are going to do interesting things to solve for the first 15 seconds of an interaction, as opposed to the entire interaction. There was a lot of talk about the OpenAI model responding with “sure” or some space-filling entry point so it could catch up with the rest of the dialogue.
So I think people will prioritize that a lot, because it’ll matter a lot.

Tim Hwang: I love the idea that, to save cost, basically for the first few turns of the conversation OpenAI delivers the really fast model, so it feels like you’re having a nice flowing conversation, and then once you’ve built confidence they fall back to the slower model that has better results, where you’re like, oh, this person is a good conversationalist, but they’re also smart too. That’s kind of what they’re trying to do by playing with model delivery. So, we’ve got to talk about search, but Chris, I saw you come off mute, so do you want to do a final quick hit on the question of latency before we move on?

Chris Hay: I was just picking up on what Bryan was saying there, and what you were saying, Tim; I totally agree. It was always doing this “hey” and then repeating the question. So I wonder if, underneath the hood, as you say, there’s a much smaller classifier model just doing that “hey” piece, and then a slightly larger model actually analyzing the real thing. I do wonder if there are two small models, or a small model and a slightly larger model, in there for that interaction. It’s super interesting. But maybe the thing I wanted to add is that we don’t have that voice model in our hands today; we only have the text model. So I wonder, once we get out of the demo environment, and maybe in three weeks’ time or whatever we have that model, whether it’s going to be super annoying that every time we ask a question it goes “hey” and then repeats the question back. It’s cool for a demo, but I wonder if it will actually be super annoying in two weeks’ time.

Tim Hwang: All right, so the last topic we’ve got a few minutes for. This is Bryan’s big moment, so Bryan, get yourself ready. Chris, you can get yourself ready too, because apparently Bryan’s going to take our eyebrows off here with his rant; everyone else can leave the meeting. The setup for this is that Google announced that AI-generated overviews will be rolling out to US users, and then to everybody in the near future. And there are two things to set you up, Bryan. The first is, this is what we’ve been talking about, right? Is AI going to replace search? Here it is, consuming the preeminent search engine. So we’re here, this is happening. The second is, I’m a little nostalgic; as someone who grew up with Google, the ten blue links are a big part of how I experienced and grew up with the web. This seems to me like a pretty big shift in how we interact with the web as a whole. So I do want you to first talk a little bit about what you think it means for the market, and how you think it’s going to change the economy of the web.

Bryan Casey: Yeah, so I follow two communities pretty closely online. I follow the tech community, and then, as somebody who works in marketing, I follow the SEO community. And they have very different reactions to what’s going on. To your first question, though, of whether this is the equivalent of swallowing the web: what’s funny is that from the minute ChatGPT arrived on the scene, people were proclaiming the death of search. Now, for what it’s worth, if you’ve worked in marketing or on the internet for a while, people have proclaimed the death of search as an annual event for the last 25 years, so this is just par for the course on some level. But what’s interesting to me is that you had this product, ChatGPT, which is the fastest-growing consumer product ever, 100 million users faster than anybody else. It basically speedran the growth cycle that usually takes years, or decades; well, maybe not decades, but it takes a long time for most consumer companies to do what they did. The interesting thing about that is, if it was going to totally disrupt search, you would have expected it to happen sooner than with other products that had a slower growth trajectory. But that didn’t happen. As somebody who watches their search traffic super closely, there’s been no chaotic drop from this. People have continued to use search engines, and one of the reasons, I think, is that people actually misunderstood ChatGPT and Google as competitors with one another. Google and OpenAI probably are on some level, but I don’t know that those two products are. And the reason I was thinking about that is, if ChatGPT didn’t disrupt Google within basically the time frame we’ve had so far, the question is, why didn’t that happen? I think you could have a couple of different hypotheses. One, you could say the form factor wasn’t right: it wasn’t text that was going to do it, we needed Scarlett Johansson on your phone, and that’s the thing that will do it; maybe they’re leaning into that thought process a little bit. You could say it was hallucinations, that the content is just not accurate; that’s a possibility. You could say it’s learned consumer behavior: people have been using this stuff for 20 years, and it’s going to take a while to get them to do something different. You could say it’s Google’s advantages in distribution: we’re on the phone, we’ve got browsers, it’s really hard to get the level of penetration that they have. I think all of those probably play some role, but my biggest belief is that it’s actually impossible to separate Google from the internet itself. Google is kind of like the operating system for the web. So to disrupt Google, you’re not just disrupting search, you have to disrupt the internet. And it turns out that’s an incredibly high bar, because you’re not only dealing with search, you’re dealing with the capabilities, whether it’s banks or airlines or retail or whatever, of every single website that sits on the opposite end of the internet. That’s an enormous amount of capability built up there.
And so I look at that and say, for as much as I think this technology has brought to the table, it hasn’t done that thing yet. And because it hasn’t, there hasn’t been some dramatic shift there. The thing Google Search is not good at, though, and I think you see it a little bit in how they described what they think the utility of AI overviews will be, is complex, multi-part questions. Whether you’re doing anything from a buying decision for a large enterprise product to planning your kid’s birthday party, you’re going to have to do like 25 queries along the way, and you’ve just accepted and internalized that you have to do 25 queries.

Tim Hwang: Right, search is basically one-shot: you just say it and the responses come back, so there’s no... yeah, sorry, go ahead.

Bryan Casey: Yeah, yeah. And so the way I was thinking about LLMs is that they’re kind of like SQL for the internet, in a way, where you can ask a much more complicated question and actually describe how you want the output to look. It’s like, I want to compare these three products on these three dimensions, go get me all this data. That would have been 40 queries at one point, but now you can do it in one. Search is terrible at that right now; you have to go cherry-pick each one of those data points. But the interesting thing is that that’s also maybe the most valuable query to a user, because you save 30 minutes. So I think Google looks at that and says, if we cede that particular space of complex queries to some other platform, that’s a long-term risk for us. And if it’s a long-term risk for them, it ends up being a long-term risk for the web. So I actually think it was incredibly important that Google bring this type of capability into the web, even if it ends up being a little bit disruptive from a publisher’s perspective, because what it does is at least preserve some of the dynamic we have now, of the web still being an important thing. And, to your point, I have present and past nostalgia for it, I would say. So I think it’s important that it continues to evolve if we all want the web to continue to persist as a healthy, dynamic place.

Tim Hwang: Yeah, for sure. No, I think that’s a great take on it. And, you know, Google always used to say, look, we measure our success by how fast we get you off our website, right? And I think what you’re pointing out, Bryan, which I think is very true, is that what they never said was that there’s this whole set of queries they never surface, that you really have to keep searching for. That ends up being the search volume of the future that everybody wants to capture. Well, Bryan, I think we also had a little intervention from AI, the thumbs-up thing we were joking about before the show; that’s my ranking for worst AI feature of all time. But it’ll make the thumbnail for the video, that’s right. Yeah, exactly. Well, great, we’ve got just a few minutes left. Shobhit and Chris, any final parting shots on this topic?

Shobhit Varshney: Sure. So I’m very bullish; I think AI overviews have a lot of future as long as there’s a good mechanism for incorporating feedback and making them hyper-personalized. Take a simple query like, I want to go have dinner tonight and I’m looking for a Thai restaurant. If I go on OpenTable or Yelp or Google and try to find that, there’s a particular way in which I think through it; the filters I apply are very different from how Chris would do it, right? So if somebody makes that decision for me the way I would make it, great. The reason TikTok works so much better than Netflix on average: I was listening to a video by Scott, and he mentioned that we spend about 155 minutes a week browsing Netflix on average in the US, something of that nature, a pretty excessive amount of time, whereas TikTok has completely taken that fallacy of choice away from you. When you go on TikTok, the video they’ve picked, there are just so many data points: a 17-second video, an average of 16 minutes of viewing time across your TikTok engagement, and so many data points coming out of it, 71 of them every few seconds, right? So they have hyper-personalized it based on how you interact with things. They’re not asking you to go pick a channel or make a choice of that nature; they’re just showing you the next thing in the sequence, hence the stickiness. They’ve understood the brains of teenagers and that demographic really, really well. I think that’s the direction Google will go in. It’ll start hyper-personalizing based on all the content: if they’re reading my email and finding out where the receipt for my shoes is, they know what I actually ended up ordering at the restaurant I went to, right? With that full feedback loop coming into the Google ecosystem, I think it’s going to be brilliant if they get to a point where they just make a prediction on which restaurant is going to work for me, based on everything they know about me.

Tim Hwang: Yeah, I mean, the future is they just book it for you, a car shows up, you get in, and it takes you someplace, right? They’ll send a confirmation to your email, exactly. Chris, 30 seconds, you’ve got the last word.

Chris Hay: Search is going to be a commodity, and I think as we enter the AI assistant era, it will be a commodity because we are going to interact with search via these assistants. It’s going to be the Siri on my phone, which will be enhanced by AI technology; it’s going to be Android and Gemini’s version on there. We are not going to be interacting with Google Search the way we do today with browsers. That is going to be commoditized, and we’re going to be dealing with our assistants, who will go and fetch those queries for us. So I think that’s going to be upended, and at the heart of it is going to be latency and multimodality, as we said. I think they’ve got to pivot or they’re going to be disrupted.

Tim Hwang: Yeah, I was going to say, if that happens, what’s interesting is that all of the advantage Google has actually vanishes, and then it’s an even playing field against every other LLM, which is a very interesting market situation at that point. Let’s pick that up next week; that’s a very, very good topic we should get more into. Great, well, we’re at time. Shobhit and Chris, thanks for joining us on the show again. Bryan, we hope to see you again sometime. And to all of you out there in radio land, if you enjoyed what you heard, you can get us on Apple Podcasts, Spotify and podcast platforms everywhere, and we’ll see you next week on Mixture of Experts.

Explore more episodes
Rabbit AI hiccups, GPT-2 chatbot and OpenAI with the Financial Times
In the inaugural episode of Mixture of Experts, our AI experts debate the pros and cons of Rabbit’s R1 device. They also unpack GPT-2’s potential evolution and OpenAI’s licensing deal with the Financial Times. Finally, what do Sam Altman and Taylor Swift have in common?
The state of open source, InspectorRAGet and Kolmogorov-Arnold Networks
Let's kick it back to the 90s with Inspector RAGet. In episode 2, the experts weigh in on the explosion of open source, KANs and RAG.
Scarlett Johansson, FMTI and Think 2024
What’s going on between Scarlett Johansson and OpenAI? In episode 4, the experts address OpenAI vs. ScarJo, explain the future of FMTI, and review innovations in open source.
Stay on top of AI news with our experts

Follow us on Apple Podcasts and Spotify.

Subscribe to our playlist on YouTube