Meta releases Llama 3.2. In Episode 22 of Mixture of Experts, host Tim Hwang is joined by Maryam Ashoori, Skyler Speakman and Shobhit Varshney to debrief an exciting week of AI news.
First, Meta is back with the release of Llama 3.2 and its lightweight 1B and 3B models. Next, for Climate Week NYC, we chat about the use of generative artificial intelligence (gen AI) in achieving sustainable development goals, specifically IBM and NASA’s AI model for weather and climate. Finally, the book version of “AI Snake Oil” officially dropped, and the authors are so confident that they’re willing to bet on their claims over the next two to five years. What do our experts think? Tune in today to find out.
The opinions that are expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Tim Hwang: What comes next in open source? If you just combine this recipe and map it to other models, I’m expecting a lot of very powerful models. Because AI’s prediction is just pretty limited, right? I guess I might take a bit of issue with the idea that AI is fundamentally about prediction. Why exactly are people so excited about the use of AI in sustainable development? So you can see how people are trying to wrangle: how do I balance the compute that’s needed versus the energy consumption? All that and more on today’s episode of Mixture of Experts.
I’m Tim Hwang, and I’m exhausted. It’s been another crazy week of news in artificial intelligence, but we are joined today, as we are every Friday, by a world-class panel of people to help us all sort it out. Maryam Ashoori is Director of Product Management at watsonx.ai; Shobhit Varshney is Senior Partner Consulting on AI for the US, Canada, and Latin America; and Skyler Speakman is a Senior Research Scientist.
So the way we’re going to begin, as we’ve done for the last few episodes, is with a simple round-the-horn question. The guests have not been prepped, so you’ll hear their unvarnished, instinctual response. Here’s the question: In 2025, a few months from now, will there be an open-source model that is absolutely better than any proprietary model on the market? Shobhit, yes or no?
Shobhit Varshney: It’ll get close.
Tim Hwang: Okay. Skyler, yes or no?
Skyler Speakman: No.
Tim Hwang: And Maryam, what do you think?
Maryam Ashoori: Big yes.
Tim Hwang: Okay, whoa! Alright, nice. Very exciting. Well, that’s actually the lead-in for our first segment today. One of the big announcements is the release of Llama 3.2. If you’ve been following the news, Llama is the best-in-class open-source model family that Meta has been advancing.
Their release earlier this week featured a wide range of models, small and large. Maryam, I understand you were involved in the release. Do you want to tell us a little about your experiences and how that was?
Maryam Ashoori: Yes, it’s so exciting to be part of that market moment on the first day when the models are released. The excitement on the platform is just amazing.
Tim Hwang: From the outside, what’s different with the 3.2 release? Is it just more open source? What should we be paying attention to?
Maryam Ashoori: Well, there are three key things they released with 3.2. The first is lightweight models, unlocking IoT and edge use cases with the release of Llama 3B and 1B. The second is multi-modal vision support. It’s image-in, text-out, unlocking use cases like image captioning, chart interpretation, and visual Q&A.
The beauty is how they did it: they separated the image encoder from the language model and trained an adapter between them. This means the core language model is unchanged compared to Llama 3.1, so the new 11B and 90B vision variants can serve as drop-in replacements for the corresponding text models, while the added image encoder enables them to process images. The third thing is the release of Llama Guard for vision, addressing the safety of these multi-modal models, which is also available on our platform for customers.
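The adapter approach Maryam describes, freezing the pretrained language model and training only a small bridge from the image encoder, can be sketched in a few lines of illustrative Python. The class and layer names here are hypothetical stand-ins, not Meta's actual implementation; the point is simply which parts stay frozen and which part is trained:

```python
# Illustrative sketch of an adapter-based vision-language model.
# The pretrained LLM and image encoder stay frozen; only a small
# adapter that maps image features into the LLM's space is trained.
# All names are hypothetical, for illustration only.

class Layer:
    def __init__(self, name, trainable):
        self.name = name
        self.trainable = trainable

def build_vision_llm():
    """Compose a multimodal model from a frozen LLM, a frozen
    image encoder, and a freshly initialized trainable adapter."""
    llm = [Layer(f"llm_block_{i}", trainable=False) for i in range(4)]
    image_encoder = [Layer(f"vit_block_{i}", trainable=False) for i in range(2)]
    adapter = [Layer("cross_attention_adapter", trainable=True)]
    return llm + image_encoder + adapter

model = build_vision_llm()
trainable = [layer.name for layer in model if layer.trainable]
print(trainable)  # only the adapter is updated during training
```

Because the language model's weights never change, the text-only behavior is preserved, which is what makes the "drop-in replacement" property possible.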
Tim Hwang: That’s awesome. A lot to go through. Shobhit, you always beat the drum that models are going to get smaller, and it’s a good thing. How does this matter for people implementing this in the enterprise?
Shobhit Varshney: Yes, a lot of my clients are deploying small language models on-device, often because they don’t have good internet access on a factory floor or in the field, especially federal or manufacturing clients. For the last few months, I’ve been super impressed by the momentum toward smaller, more efficient models in the 1B to 3B parameter space. We’ve seen an influx of them: Google’s Gemma, Apple’s OpenELM, and Microsoft’s Phi-3.5.
I downloaded Meta’s 1B parameter model before a flight and experimented for three hours. By the way, I was looking at the Meta Connect event using our Oculus glasses—a completely immersive experience! There are certain things we do for clients where we add a layer of fine-tuning. The fact that these models are small and open means I can fine-tune them and deliver much higher accuracy with a much smaller footprint. That’s where you get the gold—the return on investment from small models you can fine-tune and run on-device opens up a whole lot of use cases you can’t do if you’re calling an API back and forth.
Tim Hwang: Definitely. Skyler, this puts your “no” answer into context. You’re not excited about a 500B parameter model beating the best; you’re excited about the focus on smaller models, right?
Skyler Speakman: If they had come out with a 500B parameter model, that would have been a “yes” for me. But emphasizing the 3B and 1B parameter space gets me excited because it moves away from the “bigger is better” idea. That idea has crowded out other cool research problems. To see a major player like Meta make noise about 1B and 3B models is outstanding work.
It also shifts power dynamics; decision-makers aren’t gated behind access to running a 400B parameter model. If open source keeps delivering at these smaller scales, it’s a really good direction. Kudos to Llama for showing skill in this space. Being able to download it before a flight—that type of accessibility is a great direction. There are a couple of other things, like the 128k context window, which was pretty surprising for such a small model.
Tim Hwang: Why is that surprising? Some folks might not be familiar with that subtlety.
Shobhit Varshney: Yeah, the fact you can put more context into the prompt—128,000 tokens—means I can pass in a whole email thread chain on-device. Eventually, we’ll see more small models that can handle images too. Currently, models like Pix2Struct (12B) or Meta’s 11B do images, but I’m hopeful image capabilities will come down to 2-3B parameter models. Doing that on-device, like taking a picture of equipment and asking what’s wrong or what the meter reading is—I’m super excited.
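Shobhit's point about fitting a whole email thread can be made concrete with a quick back-of-the-envelope estimate. This assumes the common rule of thumb of roughly 0.75 English words per token, which is an approximation rather than an exact tokenizer property:

```python
# Rough estimate of how much text fits in a 128k-token context window.
# The 0.75 words-per-token ratio is a common English rule of thumb,
# not an exact property of any particular tokenizer.
WORDS_PER_TOKEN = 0.75

def context_capacity_words(context_tokens):
    """Approximate number of English words that fit in the context."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(context_capacity_words(128_000))  # ~96,000 words
# At roughly 100 words per email, that is on the order of a
# thousand messages in a single on-device prompt.
```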
There are a few things I’d like to see in future iterations, like function calling, creating plans, and more agentic flows between these smaller models. I’m very excited about the future.
Maryam, we’ve been working on Granite models for a while, focused on small models. What’s your perspective on a good size threshold for performance? 7B to 2B? Where do you see it?
Maryam Ashoori: Well, it depends on the use case. For IoT or edge, the smaller the better. It impacts latency, speed, energy consumption, carbon footprint, and cost. If we can get the needed performance from a smaller model, that’s well-suited.
But Skyler, to your point, what excites me about this release is how they achieved these lightweight models. If you look at the paper, they took the Llama 3.1 8B and structurally pruned it—cutting parts of the network to make it smaller. Then they used a very large general-purpose model (the 405B) as a teacher for distillation to bridge the performance gap. If you just combine this recipe of pruning and distillation and map it to other models, I’m expecting a lot of very powerful models coming to the market.
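The distillation half of the recipe Maryam describes can be illustrated numerically. A standard distillation loss pushes the pruned student's output distribution toward the teacher's, typically via KL divergence; the distributions below are toy values, not real model outputs:

```python
import math

def kl_divergence(teacher, student):
    """KL(teacher || student): the distillation loss term that pushes
    the pruned student's token distribution toward the teacher's."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

# Toy next-token distributions over a 4-token vocabulary.
teacher = [0.70, 0.20, 0.05, 0.05]   # large "teacher" model (e.g., the 405B)
student = [0.40, 0.30, 0.20, 0.10]   # pruned "student" before distillation

loss = kl_divergence(teacher, student)
print(round(loss, 4))  # positive: the distributions still disagree

# A perfectly distilled student matches the teacher, driving the loss to zero:
print(kl_divergence(teacher, list(teacher)))  # 0.0
```

Training minimizes this loss across many tokens, which is how the smaller pruned model recovers much of the larger model's behavior.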
Tim Hwang: Yeah, for sure. As it gets cheaper and more available, we’ll see more use cases. We’ve been gated by the investment and cost to run models, but as it becomes accessible, why not just plug a model in? It’ll be applied to applications we would have thought ridiculous a few years ago because it was too expensive.
Shobhit Varshney: Hey Maryam, just on the latency part—I was stunned. On the flight, I had the 1B parameter model running, and it was giving me 2,000 tokens per second. That’s like 1,500 words generated per second! That’s the speed I want when a model on my phone responds. I became a believer when I saw that latency.
Tim Hwang: The vision of you on the plane with goggles using a model... your seat neighbor must have been wondering who this guy is! I’m waiting for airline documentation that says, “Please do not run LLMs on devices while the plane is in flight.”
Maryam, before we move on, what comes next? Are we going to see more releases like this? Is this the big release for a while?
Maryam Ashoori: I’m expecting a lot of movement in open source. The future of AI is open. This openness drives innovation and gives you three things:
1. It makes the technology accessible to a wider audience.
2. It allows you to stress-test your technology, advancing safety together with the power of the community.
3. It accelerates innovation and contributions back to building better models for different use cases.
A combination of accessibility, safety enhancement, and acceleration in innovation is what I expect. Because of that, we are going to see a lot more powerful smaller models emerging in the next six months.
Tim Hwang: Two researchers, Arvind Narayanan and his collaborator Sayash Kapoor, came out with a book called AI Snake Oil. It’s an adaptation of their successful Substack, where they point out places where AI is being oversold, overhyped, or deployed in ways that aren’t the best use of the technology.
Arvind took to the internet to say they’re so confident in their arguments that they’ve put a bounty out: if you think they’re wrong on anything in the book, tell them, and they can put a bet on it in 2-5 years. Their argument is that the critiques they point out about AI systems aren’t about technological capabilities but about what we can actually predict in the world.
They say AI really can’t predict individual life outcomes, the success of cultural products like books and movies, or things like pandemics. They argue prediction can only go so far, and since AI is ultimately a prediction machine, that caps what the technology can be used for.
I’m curious if the group buys that argument. Do we think this “prediction” thing is limited, capping what AI can be used for? Skyler, maybe I’ll throw it to you.
Skyler Speakman: I guess I might take a bit of issue with the idea that AI is fundamentally about prediction. The gains we’ve seen recently use the Transformer for next-token prediction, yes. But because it can do that, there are so many other use cases that are not prediction-focused. You have to understand the context of the data, and the underlying model relies on prediction, but it’s so much bigger than just prediction. The downstream tasks you can do after that prediction task are what has moved the space forward. So don’t get too hung up on the prediction capabilities of a model.
Maryam Ashoori: I’m with Skyler on that. In traditional ML, prediction was key, and the majority of enterprise use cases were for prediction. But with generative AI, the prominent use cases are productivity unlocks—content generation, code generation. It can be prediction in a sense, like the next token, but I don’t think that’s the primary use case the technology is designed to deliver. For that reason, I don’t 100% agree that prediction is the primary use case for AI.
Tim Hwang: That’s very interesting. This is a debate I find fascinating. It’s almost like machine learning diverged from computer science because programming a computer is different from testing and fine-tuning a model. You’re saying there’s another distinction: traditional machine learning diverges from the concerns of generative AI. This current generation is so different that there’s a different set of problems. Is that what you’re both chasing after?
Skyler Speakman: I do think there is a divergence away from classical machine learning—your decision trees, regressions, all those PhDs—and generative AI. Those have diverged. I’m trying to keep up; my previous background was in classical machine learning, and now we’re in for a wild ride with generative AI.
Shobhit Varshney: Tim, since this is a podcast, let me quickly recap the book. I had the pleasure of listening to the audiobook on the flight while I was hacking—very meta! The two authors are brilliant; Time Magazine named them among the 100 most influential people in AI.
They make five points in the book:
1. AI predicts but doesn’t truly understand context.
2. AI will reinforce our biases in areas like policy and hiring.
3. Be skeptical about anything that’s a black-box AI solution (related to Maryam’s point about openness).
4. There should be stricter regulation and accountability, especially when an AI decision has an adverse impact.
5. AI ethics needs attention beyond technical capabilities alone.
None of these are groundbreaking statements we haven’t heard before. But the first one is where Skyler started: AI is making predictions. In many cases, we expect an intern or junior person to make a prediction, look at a pattern, and raise their hand when they see something not working.
My wife is a physician; she spent 14 years in medicine. She has medical assistants or nurse practitioners who help patients. She expects them to raise their hand when they see a pattern break. A patient comes with stats from tests; if something looks different, they should call her as an expert.
I think that’s where we should be with AI. AI is augmenting us. We should be precise: pattern recognition is a good thing; I want AI to do patterns. There’s a gap between pattern recognition and root cause analysis. Causal modeling requires years of experience. That’s the relationship I want with AI: be able to find patterns and raise your hand, then come to me for expert advice. I think we’re heading in a good direction. The book’s name is catchy, but the points are pretty grounded in reality today.
Tim Hwang: Yeah, for sure. I agree, Shobhit; that’s the dream of how the technology should be deployed. Their worry is that the market won’t provide that—there will be a tendency to just implement the AI and have it do everything.
A question for the group: how do we do a good job fighting that? I want to live in the world you’re describing, but people new to the technology tend to apply it for causal stuff, where we want to preserve the human role. In conversations with friends and family, are there things you do to help set the level properly?
Skyler Speakman: An example that came up recently: my parents were both public school teachers. We were talking about whether AI will replace teaching. Similar to the healthcare ideas, I would like to see AI be very measured in education. There’s a human connection that has to come through. To back off a bit from that face-to-face interaction... similar to Shobhit’s medical analogy, we need to see specific roles. An AI instructor would be terrible; I wouldn’t want that world. But having AI assist students and assist the interaction between a human teacher and students would be a cool example where we pull back and don’t go full automation—probably in health as well.
Shobhit Varshney: I will push back a bit, Skyler, on the education piece. If you follow Salman Khan doing Khan Academy and Khanmigo, the impact he’s having surgically with AI... he’s figured out a good blend between teachers, students, and where AI becomes a co-pilot. To your point about creating the human connection, 100%. My mom was a teacher and principal of my school, which did not go well for me! But a teacher today addresses 60 kids in a room and has to talk at the same level for each one. You can’t adapt the training for people from different language backgrounds or who take longer to understand certain sections.
AI can adapt the teaching curriculum to the student. You can take great PhD professors from MIT and translate that coursework for someone in a village in India. AI can play a very positive role. Back to what Tim was saying, we need your parents, Skyler, to tell us where AI should be augmenting—like taking the same lesson and creating multiple flashcards, adapting that lesson. There’s a lot you can do with AI in teaching.
Tim Hwang: Next week, my parents will be on the podcast! We should do a parents episode with everybody’s parents but none of the usual guests. That would be fun. From this, I’ve learned I need to check back in with Khan Academy. The last time I was there, it was YouTube videos. I think that space has really expanded.
Shobhit Varshney: They’re doing a lot of interesting experiments.
Tim Hwang: I want to make sure we get time for the last topic, which is broad and connects stories from the last few weeks that we haven’t covered much. The topic is the relationship between AI and sustainability.
This week was the UN General Assembly, and it was interesting that the US State Department brought together CEOs to talk about how AI will be used for Sustainable Development Goals. Similarly, IBM just released a paper on collaborations with NASA around predicting climate and building available climate models.
Shobhit, I understand you gave a talk on this topic recently. Can you give our listeners a sense of how this connection is evolving? Using this technology for big problems... as someone not deep in the space, I think, “How does ChatGPT help save the world?” I know that’s not the case, but can you give more color on how people are using this tech in the space?
Shobhit Varshney: Absolutely, Tim. IBM does a lot of work here; we have our own commitment to be carbon neutral by 2030, and we’re doing a great job. This week, I spent time in New York with global leaders and celebrities in the space and was humbled by the problems everyone is dealing with.
The conversation focused on how AI can help solve sustainability goals, and we need that compute power to solve these gnarly problems—making predictions on climate at a granular level, forecasting events, optimizing the cost envelope of businesses.
On the flip side, there’s a climate and environmental cost to running these models. A few data points: asking ChatGPT a question consumes a 500ml bottle of water just to cool the data centers. Bloomberg did a study: all data centers together would be the 17th largest country in energy consumption—more than countries like Italy. In Ireland, data centers use 12% of the national energy consumption, more than all households combined.
You see graphs of energy consumption, and it’s staggering. Companies like Microsoft are now partnering with nuclear reactors, like trying to resurrect Three Mile Island, to power them. So you see people trying to wrangle: how do I balance the needed compute versus energy consumption?
My talk was about being “computationally responsible.” We have to figure out the right balance from the chip level up to how we use the models. I suggested that, like cars have an MPG sticker or flights show carbon emissions, we need to be conscious. If I’m using ChatGPT as a calculator to add two numbers versus a physical calculator, there’s a huge delta in energy use. And we might get the answer wrong!
There are good use cases where AI helps. We do a lot of work with forestation, looking at land use increase, predicting catastrophic events with governments worldwide, helping with wildfires. I’m overall impressed with how IBM has taken a position on sustainability, using AI for good. We’re focused on smaller, energy-efficient models, optimizing compute. This is part of our AI Alliance with other companies, collectively trying to reduce the threshold to implement AI worldwide, especially in Africa, Europe, and Asia.
Skyler Speakman: Shobhit, I like that bottle of water analogy. A paper came out from Signal and Hugging Face last week on sustainability and energy use. One unit of analysis was how many cell phone charges an AI query uses. The highest was image generation; a query is getting close to a cell phone’s overnight charge. I really liked that unit of analysis because it brings it home. You put in a query for an image, and that’s the power of a cell phone for a day or two. We need more creative metrics like that to present to the world how power-hungry or water-thirsty these models are. Otherwise, I see megawatt-hours; I’m not an electrical engineer, and I don’t appreciate it. But bottles of water or cell phone charges—that clicks.
Tim Hwang: Would you want it to be metered? Like, as you’re using Claude, it says, “Here’s how much power you’ve used.”
Shobhit Varshney: Maryam, we’ve done a lot of work with Granite models, and we open-sourced them. Do you want to share what we’re doing?
Maryam Ashoori: With Granite, we focus on smaller models for the exact reasons Shobhit mentioned. Let me share some data points: hosting a 500B parameter model roughly requires 16 A100s, while hosting a 20B parameter model requires just one A100. So an API call to a 20B model is about 16x more energy efficient than one to a 500B model, simply because it uses 16x fewer GPUs—and that’s setting aside the cost and latency benefits, looking purely at sustainability.
Because of this, we see the market looking for the smallest model that makes sense and customizing it on proprietary data—user data, domain-specific data—to create something differentiated that delivers the needed performance for a fraction of the cost (energy, carbon footprint, etc.). That’s the guiding principle for Granite: smaller, enterprise-ready models rooted in value and trust, allowing companies to use their own data to make custom models.
Our open-source Granite models are released under the Apache 2.0 license, giving enterprises the freedom and flexibility to customize them for commercial purposes with no restrictions. That’s the power of Granite.
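Maryam's back-of-the-envelope GPU math can be reproduced directly. This sketch assumes 16-bit weights (2 bytes per parameter) and the 80 GB A100 variant, and counts only the memory for the weights themselves; weights alone come out at 13 GPUs for a 500B model, and real deployments add KV cache and serving overhead, which is roughly where her figure of 16 comes from:

```python
import math

BYTES_PER_PARAM = 2   # fp16/bf16 weights (assumption)
A100_MEMORY_GB = 80   # 80 GB A100 variant (assumption)

def a100s_needed(params_billions):
    """Minimum A100s to hold the model weights alone (no KV cache,
    activations, or serving overhead—deliberately rough)."""
    weight_gb = params_billions * BYTES_PER_PARAM  # 1B params * 2 bytes = 2 GB
    return math.ceil(weight_gb / A100_MEMORY_GB)

print(a100s_needed(500))  # 13 for weights alone; ~16 in practice with overhead
print(a100s_needed(20))   # 1
```

The same arithmetic is why the energy, cost, and latency gaps scale roughly with GPU count: a model that fits on one card avoids multi-GPU communication entirely.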
Shobhit Varshney: I love that. This week, we also released our next-generation Prithvi models. Just to share with the audience: IBM has been partnering with NASA. Traditionally, we have ML models that make predictions about weather patterns. This is the first time anyone has created a foundation model where a pixel—a small patch of the Earth’s surface—is used as a token to predict what will happen next, similar to how text models predict the next token.
We built this foundation model that combines weather and climate data. This model can then be adapted for various use cases. Currently, forecasting rainfall in Florida requires a completely different model from deforestation analysis elsewhere. For the first time, we have a combined model that can be easily adapted, like the foundation models we’ve built. And as a mic drop—it’s completely open source to the community. You can take these Prithvi models from Hugging Face and deploy them for multiple things.
The next iteration, I hope, is what multi-modal models did: you used to have one model for text, one for image, and now they’re combined. I’m hoping we’ll get to that point with foundation models for weather and climate, so the same model can connect what’s happening in different places—changing climate patterns and deforestation—and think through combining them. We’ve made the first step towards a future where foundation models can combine all this data, and the same model can answer all these questions.
Maryam Ashoori: Exactly! I got super excited about this. Think about it: 40 years of NASA satellite imagery is at our fingertips with these models, to use for weather forecasting, climate prediction, seasonal prediction, and to inform planning and mitigation decisions for climate events. That’s super exciting.
Tim Hwang: It’s a great note to end on. It’s a model that’s open source; listeners can go download and play with it. It’s a great application beyond “how does a chatbot save sustainability?” There are so many other aspects people don’t think about when this topic comes up.
Well, great, everybody. That’s all the time we have for today. Thanks for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. Shobhit, Skyler, Maryam, thanks for joining us, and we hope to have you on again sometime in the future.