This week, Meta strikes back with the launch of Llama 3.1! In episode 13 of Mixture of Experts, join host Tim Hwang along with Chris Hay, Shobhit Varshney and Maryam Ashoori as they talk about the week’s biggest AI news and trends. Listen to them analyze the business of AI in relation to the launch of Llama 3.1, including Llama 405B. Then, hear the conversation around Mistral Large 2 and the open-source wave. Finally, the experts talk about GPT-4o mini and the model price war. Are little models having their moment? Tune in to find out.
The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
📩 Sign up for a monthly newsletter for AI updates from IBM.
Tim Hwang: Hello, and happy Friday. This is Mixture of Experts, and I am your host, Tim Hwang. Each week, Mixture of Experts brings together a world-class panel of technologists, engineers, and more to help make sense of the tidal wave of news each week in AI. This week on the show, we cover two big stories. First, Meta strikes back with the launch of Llama 3.1, and Zuck is out with a brand new look. We talk about the state of the art in language models and open source, and we talk about the implications for the business of AI and for AI safety. "This is going to be a game changer for the market, because it's enabling the open source community to start building, using a very powerful model that is available to them to create smaller models and put them back into the market." Second, OpenAI continues its string of launches with GPT-4o mini, a relatively tiny, wildly cheap model. We talk about the ongoing frontier model price war and how sustainable it is over the long run. "Chris, did I hear you say embedded models, OpenAI on device?"

Chris Hay: "That's the use case that I can't solve. OpenAI is an API. I think they'll get there."
Tim Hwang: As always, I'm joined by an incredible group of panelists who will help us navigate what has been another action-packed week in AI. So today we've got three panelists. We've got Shobhit Varshney, senior partner, Consulting AI for US, Canada and LatAm. We've got Chris Hay, distinguished engineer and CTO of customer transformation. And finally, joining us for the first time, is Maryam Ashoori, director of product management at watsonx. So our first story today, of course, is the launch of Llama 3.1. This is obviously an enormous technical milestone. It's the first time, arguably, that we've had frontier AI models available in the open source, and we're going to talk all about the technical aspects of this and why you should care as a listener. To just get us started, though, one of the things that I also loved about the announcement is that Mark Zuckerberg personally took to Facebook to announce this, and he not only debuted Llama 3.1, he also debuted a new look. If you remember classic-version Mark Zuckerberg: very pale, very serious, very nerdy looking, you know, with the hoodie. But he's been looking real kind of surf-beach-guy, you know, that's his new look. So just to start, I want to ask each of the panelists: do you like the old Zuck look, or do you like the new Zuck look better? Chris, to you first.
Chris Hay: Old Zuck, new Zuck? New Zuck.
Tim Hwang: Shobhit?

Shobhit Varshney: New is the new cool.

Tim Hwang: And last but not least, Maryam, what do you think: old Zuck, new Zuck?
Maryam Ashoori: I’d go for new.
Tim Hwang: Okay, so we have bold consensus on the new Zuck. I think we're going to come to miss the old Zuck; I really liked old nerdy Zuck. But, again, you know, seasons change and we have to go with it. So let me just introduce the story of today. If you haven't been watching the news, or if you're even remotely connected to AI, I think you'll know that Meta has come out and launched Llama 3.1. It is the latest edition of its Llama class of models. And uniquely, Meta, in comparison to, say, OpenAI or Anthropic, really is pursuing open-source AI in a very big way. And I think one really big thing that we've seen is the launch of the gigantic Llama 405B model, which is a highly capable, state-of-the-art model, now just available for free in the open source. And the first person I want to turn to is Maryam. You were working on this on day one of the launch for watsonx, and I'd love for you to talk a little bit about that, because it was really easy, right? You just kind of threw it up and people downloaded it. I'm kidding, of course. I'm curious to get your war story of what it felt like to launch a model like that, and whether there's anything you learned or walked away with from that experience.
Maryam Ashoori: It was actually nothing that I'd call easy, especially because it's a giant model, so we had to run it across nodes, multi-node inferencing. This was the first time that we had a 400-billion-parameter model on our platform, but it was exciting, and I'm super excited not just for our customers but also for the community. I think the amazing thing that Meta did yesterday with Llama 3... Llama 3.1 405 billion (the name is getting longer and longer; they really need to work on the branding for all these, I don't even know what's going on) is changing the license to let the market use the model for distillation and teaching smaller models. This is going to be a game changer for the market, because it's enabling the open source community to start building, using a very powerful model that is available to them to create smaller models and put them back into the market. So for that reason, I'm super excited for the opportunity it unlocks.
Tim Hwang: Yeah, I'm thrilled by it, because for a long time open source, you know, has been very exciting, but arguably it's been lagging a little bit in performance, and now suddenly there's what looks to be the possibility that open source is going to be just as powerful, just as exciting, just as state-of-the-art in a lot of ways. Can I just ask a question on behalf of the listeners who may be less familiar with the space, which is: why is Meta doing this? It's incredibly expensive to build one of these models, and if I have it right, they're literally just giving it away for free, which, you know, I'm not a big city business guy, but I don't even know why that works. Shobhit, do you want to comment on what is going on here? Why is Meta doing this, and do we think they can make money on this? They're losing money, aren't they?
Shobhit Varshney: Yeah, absolutely. So there are certain vendors, like Meta or Nvidia, that have other sources of revenue, right? What they make by selling the model is going to be a rounding error compared to what Nvidia is going to make with hardware, so they're giving away Nemotron; with Meta, they have all these other social properties that they make money on. So just to give you a quick data point on Meta itself: Yann LeCun, the chief AI scientist at Meta, an absolutely incredible personality, was sharing a data point. Two years back, when somebody was posting something on Facebook and you were trying to figure out if it was misinformation or hate speech, abuse, and things of that nature, best-in-class NLP would give you about a 24, 25% hit rate on identifying that as bad content. Now, with the Llama models, they're getting close to 92, 94%. So they're able to do a lot more good and filtering with their models, and they're able to use that to enhance their own products, right? When you're experiencing Instagram or WhatsApp or Facebook, they're embedding AI that is now being helped by the entire crowdsourcing of everybody else. Right, so they have different avenues of revenue, so for them this is not a loss-leader product, so to say. They're trying to build the right community around it that can contribute, and they are benefiting in their products by using this AI. Now, in the very, very small subset of vendors who could potentially do this: Google, AWS, Azure, IBM, we're all in the business of selling AI to other businesses, right? So now you come to a point where you can just open up your models, and IBM comes in and says, you know what, mic drop, I'm going to open-source my Granite models as well; we want the community to come and help. So this market is about split between companies like Meta and IBM, who are opening up the models completely, versus the closed-source models.
Tim Hwang: Yeah, definitely. And part of the question that's really in my mind is the kind of pressure this creates for the closed-source models, right? Because on some level, if you're OpenAI, you're saying, we've got this crazy machine intelligence and we're going to rent you access to it, and this in some ways flips the whole thing on its head. It basically says: look, access to that is going to be free. And I guess, Maryam, I'm curious, as someone who's in the space: do you think this is ultimately going to force an OpenAI or an Anthropic to also have to go open source in the end? Because it feels like, once it's free, why would you pay for Claude or something like that?
Maryam Ashoori: Well, look what Mistral did yesterday with Mistral Large 2: they put out the weights for research only. This is their flagship model. We've been having a lot of conversations in the past about protecting the weights to make sure they don't get out, and here it is, out for research only. Cohere did something similar a few months ago. So I'd say the trend, where the trend is going, is to nurture that openness but reserve the rights for commercial purposes.
Tim Hwang: I think the final item we might want to touch on before we jump off: Maryam, you pointed out Mistral was moving in a very similar direction here in terms of their licensing and research. Presumably part of that research licensing is also to say, hey, you can red-team this model, you can make it better, you can surface all the safety issues. I'm curious how you see platforms like Mistral falling into this ecosystem, because they are not one of the really big corporate players, but they still seem to be able to ride this open-source wave. Curious if you have thoughts on how they'll fit into the competition here as it continues to evolve.
Maryam Ashoori: I think it's important to think about what problem each of these model providers is solving. If you look into Mistral, they are European-born, the favorite of Europe, so that's the market they are supporting: a wide range of European languages, not a specific dialect but a wide range over there. So if you are in Europe, if you're speaking one of those languages, there is a way higher chance that Mistral is the better-positioned model for you. So I think it's important to understand what the use case is, who is going to be using it, and then what the right model is for that target use case, versus generalizing: hey, is Mistral Large better than Anthropic?
Tim Hwang: So I'm going to move us on to the second story that we're going to cover today. We could obviously be talking a lot more about this, but we're going to flip to the other big dimension that we see evolving in the AI market. So one of them is from closed to open, which I think is definitely a big shift, and 3.1 really puts a big Meta flag on that change. The other big change that we've been tracking, and we've talked about it in the last few episodes, but I want to hit on it really hard, is the movement from gigantic models to very little models: very fast, very cheap, smaller models. And the peg for this is just last week: OpenAI announced their latest salvo in this battle, which is a model called GPT... uh, GPT-4o mini. To Maryam's earlier point, they really have got to improve the names on these, but that's what they released, and what's so striking about this announcement is that the pricing is legitimately crazy. It's 15 cents per 1 million input tokens, 60 cents per 1 million output tokens, and they point out in their blog post that since 2022 the cost per token has dropped 99%. And the first question I want to launch with, and Chris, maybe you're well positioned to answer this: are we just in a price war here? Is this even sustainable? I'm curious how much of this is OpenAI really being able to cut the cost of serving low enough that they can offer these models still at a profit, versus them just being in this big push-the-price-to-zero battle against their rivals. Because it kind of feels like this is also part of the open-source competition, right? How can we offer free to keep up with all the other free options out in the world? And so, Chris, to you: do you think we're in a price war?
Chris Hay: I think we are a little bit in a price war. I think OpenAI has some other considerations as well, because although we've spent all of our time playing with GPT-4 and GPT-4o, I think the reality is the vast majority of people are running GPT-3.5, which is a bigger model, and realistically OpenAI had to remove their free model and run a cheaper one. And actually, that's kind of what they've done with GPT-4o mini. So, yeah: wonderful, here we go, we're so great, here's a cheaper model, et cetera. But actually, they managed to get the very large GPT-3.5 off their books, and now they're running a smaller model to serve the majority of the requests hitting ChatGPT. So I think they needed to do that just for commercial reasons. I think we are pushing towards smaller models all the time; the only time you really need the larger models is for reasoning and planning, right? For the smaller models, most of the time, with good fine-tuning, you can get the model to do what you want. And I think OpenAI realized that as well; the world realizes that. OpenAI has already seen that a lot of people start with GPT-4 or GPT-4o as their starter model in the enterprise, but very quickly, when they go to production, they bring it down to a smaller model, and that's been eating OpenAI's lunch, and they wanted to stop that as well. And then, as we move to devices, embedded devices, they need to be able to play in that space, so it's critically important for OpenAI to have that smaller model in the market. So I think that's great. But is it a price war? Partially, but it's also a cred war as well.
Tim Hwang: Chris, did I hear you say embedded models, OpenAI on device?

Chris Hay: That's the use case that I can't solve. OpenAI is an API; I think they'll get there. If you really think about this for a second: if we take a guess at what size the 4o mini model is, it's probably around the 11-billion-parameter mark, maybe a little bit more, maybe a little bit less. There is going to come a point, when you start dealing with Apple, when you start dealing with Google, et cetera, where you are going to have to provide a model to run on a device. And if you don't, you're going to be locked out of a market. We've already seen that with the iPhone and their recent announcements. So they are going to have to do something in that space, and I think this is a move towards that, no doubt. They're not offering embedded just now, but they will in the future.
Tim Hwang: Yeah, I think that definitely is going to happen. I think another thing in what you're saying, Chris, is: do we have too much intelligence now? That's kind of where you're pointing, right? And Maryam, I don't know if you agree, as someone who's working on watsonx. There's always this sense of, why wouldn't you want the bigger and better model, it can do so many more things. But Chris, you're making the argument that maybe we don't really need those things; our models are so capable now that they're actually past the point of what we need on an everyday basis.
Maryam Ashoori: Well, think about it: the larger the model, the more powerful it is, but also the larger the compute resources it needs. That translates to latency, that's response time. If you're an enterprise and you want to use it in production, that translates to carbon footprint and energy consumption, which is the topic of conversation these days, and that translates to cost. So cost is actually just one of the factors; in some highly regulated environments, the other two might be even bigger blockers to moving forward. But on the comment that you made about the price, I feel like there are two sides to this. If you are a model provider, you want to drop the price as much as you can to increase adoption. If you are a consumer: we see half of the market has already moved from exploration to pilots, and 10% to production, and when you get to scale, the cost adds up. For normal prediction use cases, you might be running something like 500,000 predictions a day if you're a DoorDash or the like; I used to work in that space. So in those environments, if you want to use gen AI, just do the calculation, price per API call: it adds up. So it's really not sustainable. In order to get to production and scale, the model provider has to find a way to set the price low, and the way to do that is usually through smaller models. So the demand is driving where the market is going, but this is also the right thing for the whole market to do.
Chris Hay: And Maryam, just to add to that: this week as well, OpenAI added the ability to fine-tune their mini model, and I have to think these two things are related, because when you go to production, as you say, Maryam, one of the things you're going to want to do to bring that inference cost down and improve reliability is take that model and fine-tune it with your data, rather than using large prompts.
Maryam Ashoori: Yeah. The pattern I see emerging in the market for enterprises is grabbing a much smaller, trusted model, I would say, and fine-tuning it on their proprietary data: the data about their users and the data about their domain. Because at the end of the day, they want to have something differentiated in the market. Everyone has access to these large models, and the power of differentiation is really the proprietary data. In order to capture that, you should be able to fine-tune with the data that no one else has access to.
Shobhit Varshney: Yeah, Maryam and I were on a call with a client yesterday and we got into this nice argument. He's the head of AI for a big Fortune company, and he started off by saying, hey, I expect these models to be intelligent, so I don't like the small ones; I really want big, large ones so they can actually do something meaningful for me. And we had a nice chat with him, explaining how this whole pricing works. Let's take an example for just the pricing part, Maryam, and this is something that you and I do quite a bit in Excel, trying to showcase the range. Your favorite example is: let's take a 30-minute recording of a call and summarize it into one page. Look at the tokens, make some assumptions around that, and do this a thousand times. A thousand times summarizing a call transcript into a page: with the 4o mini model, that is a dollar. If you look at the best-in-class models from Claude and 4o, that is about $30 to $40. So that gives you quite a bit of a range. And then you bring in the Llama model, the biggest one, the 405-billion-parameter model: it's going to be 80 bucks. So now you have an open-source model that's costing $80 for those thousand summarizations, you have the best-in-class frontier models at half of that, at 40 bucks, and then you have a dollar if you're using the 4o mini. Even hosting your own model: if you're hosting a Llama 8B model, which is much smaller than the 4o mini, on the AWS, Azure, IBM, Google of the world, that's going to be like $34 for you.

So just look at the price points. A dollar: OpenAI's mini, you don't have any headache about what's happening, they're giving you all kinds of indemnification, they'll give you ways of fine-tuning it and making it your own. $34 is the free, really small Llama 3 8-billion-parameter model, self-hosted. Then there's $40 if you're doing the best in class, and there's $80 if you're doing the biggest open-source Llama model. It becomes very real when you start doing a million of these a day.
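Shobhit's back-of-the-envelope math can be sketched in a few lines of Python. Only the GPT-4o mini rates ($0.15 per million input tokens, $0.60 per million output tokens) come from the episode; the token counts for a 30-minute transcript and a one-page summary are illustrative assumptions, not figures from the show.

```python
# Back-of-the-envelope cost of summarizing call transcripts with a
# pay-per-token model. Assumptions (illustrative): a 30-minute call
# transcript is ~6,000 input tokens and a one-page summary is ~500
# output tokens. Prices are (input $, output $) per 1 million tokens;
# only the GPT-4o mini row reflects rates quoted in the episode.

PRICES_PER_1M = {
    "gpt-4o-mini": (0.15, 0.60),
}

def batch_cost(model: str, n_calls: int,
               in_tokens: int = 6_000, out_tokens: int = 500) -> float:
    """Total dollar cost of n_calls summarizations at the given token sizes."""
    in_price, out_price = PRICES_PER_1M[model]
    per_call = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return n_calls * per_call

print(f"${batch_cost('gpt-4o-mini', 1_000):.2f}")  # prints $1.20
```

At those assumed sizes, a thousand summarizations on GPT-4o mini land at roughly $1.20, in line with the "about a dollar" figure Shobhit cites; swapping in a pricier model's per-million rates reproduces the rest of the range.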
Tim Hwang: So we have to wrap up; we're almost at time, and as always, we could spend a lot more time talking about this. I think one of the most interesting things coming out of this conversation is that maybe it becomes worth less and less to train bigger and bigger models. So here's my spicy take. I want to end with a yes-or-no question: at some point in the future, will OpenAI stop training larger and larger models and just focus on the models they have? Chris?
Chris Hay: OpenAI is going to build a model that's powered by the sun in the future.

Tim Hwang: Got it. Shobhit?
Shobhit Varshney: OpenAI will keep going at it; there's a lot more to be done to get to human intelligence.
Tim Hwang: And Maryam?
Maryam Ashoori: The regulations are going to stop that at some point.
Tim Hwang: Wow, okay, so at some point OpenAI will stop. Well, there's a lot more to get into there, Maryam; we'll just have to have you back on the show at some point. But I hope you had a good time. Chris, Shobhit, again, thanks for joining us, and for all you listeners out there, thanks for joining us as well. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere, and we'll see you next week.