Date: 2024-10-21

February

Google launches Gemini 1.5, an advanced language model capable of handling context lengths of up to 1 million tokens, in limited beta.[42] The model can process large amounts of information in a single prompt, which improves its ability to maintain context across complex conversations and tasks over extended text. Gemini 1.5 represents a notable advance in contextual understanding over long inputs.

OpenAI publicly announces Sora, a text-to-video model capable of generating videos up to one minute long from textual descriptions.[43] The model extends AI-generated content beyond static images, enabling users to create dynamic, detailed video clips from prompts. Sora is expected to open new possibilities in video content creation.

Stability AI announces Stable Diffusion 3, its latest text-to-image model. Stable Diffusion 3 uses an architecture similar to Sora's to generate detailed and creative content from text prompts.[44]

May

Google DeepMind unveils a new extension of AlphaFold that helps identify cancer and genetic diseases, offering a powerful tool for genetic diagnostics and personalized medicine.[45]

IBM introduces the Granite™ family of generative AI models as part of its watsonx™ platform. Ranging from 3 billion to 34 billion parameters, Granite models are designed for tasks such as code generation, time-series forecasting and document processing. Open-sourced and available under the Apache 2.0 license, these models are lightweight, cost-effective and customizable, making them well suited to a wide range of business applications.

June

Apple announces Apple Intelligence, a set of AI features for new iPhones that includes an integration of ChatGPT into Siri.[46] This integration allows Siri to perform more complex tasks, hold more natural conversations and better understand and execute nuanced commands.

September

NotebookLM introduces DeepDive, a new multimodal AI capable of transforming source materials into engaging audio presentations structured as a podcast.[47] DeepDive can analyze and summarize information from different formats, including webpages, text, audio and video, which opens new opportunities for creating personalized and automated content across various platforms. This capability makes it a versatile tool for media production and education.



Current AI trends point to new evolutions of generative AI operating on smaller, more efficient foundation models and the rise of agentic AI, where specific AI models work together to complete user requests faster. Further into the future, autonomous vehicles will be cruising the highways, multimodal AI will create audio, video, text and images in a single platform and AI assistants will help users navigate their personal lives and careers.