Improving multimodal RAG with inference scaling

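Inference scaling in artificial intelligence (AI) refers to techniques that improve model performance by allocating computational resources during the inference phase (when a model generates its output), rather than relying on larger training datasets or model architectures. As large language models (LLMs) keep growing in both parameter count and dataset size, optimizing inference time and managing inference compute scaling (particularly on GPU hardware) have become central challenges in deploying high-performance multimodal retrieval-augmented generation (RAG) systems.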

Introduction to inference scaling

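Recent advances in inference strategies, which add computational resources at test time and apply sophisticated algorithms, are redefining how LLMs tackle complex reasoning tasks and deliver higher-quality outputs across diverse input modalities. Inference scaling optimizes chain of thought (CoT) by expanding reasoning depth: through iterative prompting or multistep generation, a model can produce longer and more detailed chains of thought. Multimodal RAG can benefit from inference scaling, with the key considerations being the interplay between model size and compute budget and the optimization of inference time in real-world applications.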

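Scaling laws and benchmark results also highlight the trade-offs among pretraining, fine-tuning, inference-time strategies, and outputs produced with advanced algorithms. Inference scaling benefits both large and small models, letting resource-constrained systems approach the performance of state-of-the-art LLMs. This tutorial shows how optimization techniques affect model performance and offers actionable guidance for balancing accuracy, latency, and cost in multimodal RAG deployments.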

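This tutorial is designed for AI developers, researchers, and enthusiasts who want to deepen their knowledge of document management and advanced natural language processing (NLP) techniques. You will learn how to use inference scaling to enhance the multimodal RAG pipeline built in a previous example. Although the tutorial focuses on scalability strategies for multimodal RAG with IBM® Granite large language models, similar principles apply to most popular models, including those from OpenAI (for example, GPT-4, GPT-4o, ChatGPT) and DeepMind.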

This tutorial guides you through the following steps:

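Document preprocessing: Learn how to use Docling to process documents from various sources, convert them into a parsable, usable format, and store them in a vector database. Docling is an IBM open source toolkit that efficiently parses a wide range of document formats, including PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc, and Markdown, and exports the document content to machine-readable formats such as Markdown or JSON. A Granite machine learning (ML) model generates descriptions of the images within the documents. In this tutorial, Docling downloads and processes a PDF document and extracts the text and images it contains.
Retrieval-augmented generation (RAG): Learn how to connect an LLM such as Granite to an external knowledge base to improve query responses and generate meaningful insights. RAG is an LLM technique for connecting a model to a knowledge base of information outside the data it was trained on, and it is applied without fine-tuning the LLM. Traditional RAG is limited to text-based use cases such as text summarization and chatbots.
Multimodal RAG: Learn how multimodal RAG uses multimodal large language models (MLLMs) to process information from multiple types of data, which can then be included in the external knowledge base that RAG draws on. Multimodal data can include text, images, audio, video, or other formats. This tutorial uses Granite 3.2 Vision, IBM's latest multimodal vision model.
Implementing demonstration-based RAG (DRAG) and iterative demonstration-based RAG (IterDRAG): Apply the inference scaling techniques from the research paper to substantially improve RAG performance on long contexts. DRAG uses in-context learning to improve RAG performance: by including multiple RAG examples as demonstrations, it helps the model learn how to locate relevant information in a long context. Unlike standard RAG, whose performance can plateau as more documents are added, DRAG improves almost linearly as the context length grows. IterDRAG extends DRAG by decomposing complex multi-hop queries into simpler sub-queries. Multi-hop refers to breaking a complex query into simpler sub-questions and generating an answer for each one; each sub-question may require information retrieved or synthesized from different sources. IterDRAG alternates between retrieval and generation steps to build reasoning chains that bridge logical gaps, which makes it particularly effective for complex queries over long contexts.
LangChain workflow integration: Learn how to use LangChain to streamline and orchestrate the document processing and retrieval workflows so that the system's components interact smoothly.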

This tutorial also uses three cutting-edge technologies:

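Docling: An open source toolkit for parsing and converting documents.
Granite: A family of state-of-the-art LLMs with strong natural language capabilities, plus a vision language model that generates text from images.
LangChain: A powerful framework for building applications powered by language models, designed to simplify complex workflows and integrate external tools seamlessly.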

By the end of this tutorial, you will be able to:

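Preprocess documents, split them into chunks, and generate descriptions of their images.
Integrate a vector database to strengthen retrieval.
Implement DRAG and IterDRAG to perform efficient, accurate retrieval based on inference scaling.
Observe firsthand how scaling inference compute improves RAG performance almost linearly.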

Understanding the long-context challenge

Traditional language models struggle with long contexts for several reasons:

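Traditional attention mechanisms, such as those in transformers, scale quadratically with sequence length, which can demand enormous computational resources.
Relevant information is hard to locate in very long sequences.
Maintaining coherence across distant parts of the input is difficult.
Computational demands, and therefore latency and cost, grow as sequences get longer.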
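As a rough illustration of the quadratic growth mentioned in the first item above, the sketch below counts the query-key scores that a single full self-attention head computes for a sequence. It deliberately ignores the number of heads, the hidden size, and optimized attention implementations, so treat it as an order-of-magnitude illustration rather than a cost model; `attention_score_count` is a hypothetical helper.

```python
def attention_score_count(seq_len: int) -> int:
    """Number of query-key scores in one full self-attention head."""
    # Every token attends to every token, so the score matrix is seq_len x seq_len.
    return seq_len * seq_len


for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens -> {attention_score_count(n):>12,} scores")
```

Doubling the sequence length quadruples the number of scores, which is one reason why simply stuffing more retrieved documents into a prompt becomes expensive quickly.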

The techniques in this tutorial address these challenges by allocating inference compute strategically.

Inference scaling techniques: DRAG and IterDRAG

DRAG vs IterDRAG

Details of the two advanced inference scaling techniques, DRAG and IterDRAG, can be found in the research paper 'Inference Scaling for Long-Context Retrieval Augmented Generation'.

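These methods show that when inference compute is allocated optimally, RAG performance improves almost linearly as that compute is scaled up. This lets RAG systems make more effective use of the long-context capabilities of modern LLMs. This implementation uses IBM Granite models, which can handle multiple modalities. By applying the paper's principles, you will build an AI system that answers user queries in real time from unstructured data.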
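To make "allocating inference compute" concrete, the sketch below estimates the total input tokens that a DRAG or IterDRAG configuration sends to the model, in terms of the number of retrieved documents per query (k), in-context demonstrations (m), and model calls (n). It assumes, purely for illustration, a fixed token count per document, per demonstration question-answer pair, and per query, and it ignores the sub-answers that accumulate across IterDRAG iterations. `estimate_input_tokens` and its parameters are hypothetical, not taken from the paper or any library.

```python
def estimate_input_tokens(
    num_docs: int,        # k: documents retrieved per query
    num_shots: int,       # m: in-context demonstrations
    num_iterations: int,  # n: model calls (1 for DRAG)
    doc_tokens: int,      # assumed tokens per document
    qa_tokens: int,       # assumed tokens per demonstration question + answer
    query_tokens: int,    # assumed tokens in the user query
) -> int:
    """Rough total of input tokens across all model calls (hypothetical helper)."""
    # Each demonstration carries its own k documents plus a question-answer pair.
    demo_block = num_shots * (num_docs * doc_tokens + qa_tokens)
    # The live query adds k freshly retrieved documents and the query itself.
    query_block = num_docs * doc_tokens + query_tokens
    return num_iterations * (demo_block + query_block)


drag = estimate_input_tokens(10, 3, 1, doc_tokens=200, qa_tokens=100, query_tokens=50)
iterdrag = estimate_input_tokens(10, 3, 3, doc_tokens=200, qa_tokens=100, query_tokens=50)
print(drag, iterdrag)
```

With these made-up sizes, three IterDRAG calls cost three times the input tokens of a single DRAG call; k, m, and n are the knobs through which extra inference compute is spent.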

Prerequisites

Familiarity with Python programming.
Basic knowledge of LLMs, NLP concepts, and computer vision.

Steps

Make sure you are using Python 3.10, 3.11, or 3.12 in a freshly created virtual environment. This tutorial is also available on GitHub.

Step 1: Set up the environment

import sys
assert sys.version_info >= (3, 10) and sys.version_info < (3, 13), "Use Python 3.10, 3.11, or 3.12 to run this notebook."

Step 2: Install dependencies

! pip install "git+https://github.com/ibm-granite-community/utils.git" \
    transformers \
    pillow \
    langchain_community \
    langchain_huggingface \
    langchain_milvus \
    docling \
    replicate

Logging

To see logging information, set the log level to INFO.

Note: It is fine to skip running this cell.

import logging

logging.basicConfig(level=logging.INFO)

Step 3: Select the AI models

Load the Granite models

Specify the embedding model used to generate text embedding vectors. Here we use one of the Granite Embedding models.

To use a different embedding model, replace this code cell with one from the Embeddings Model recipe.

from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoTokenizer

embeddings_model_path = "ibm-granite/granite-embedding-30m-english"
embeddings_model = HuggingFaceEmbeddings(
    model_name=embeddings_model_path,
)
embeddings_tokenizer = AutoTokenizer.from_pretrained(embeddings_model_path)

Specify the MLLM to use for image understanding. Here we use the Granite Vision model.

from ibm_granite_community.notebook_utils import get_env_var
from langchain_community.llms import Replicate
from transformers import AutoProcessor

vision_model_path = "ibm-granite/granite-vision-3.2-2b"
vision_model = Replicate(
    model=vision_model_path,
    replicate_api_token=get_env_var("REPLICATE_API_TOKEN"),
    model_kwargs={
        "max_tokens": embeddings_tokenizer.max_len_single_sentence, # Set the maximum number of tokens to generate as output.
        "min_tokens": 100, # Set the minimum number of tokens to generate as output.
        "temperature": 0.01,
    },
)
vision_processor = AutoProcessor.from_pretrained(vision_model_path)

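Specify the language model to use for the RAG generation step. Here we use the Replicate LangChain client to connect to a Granite model from the ibm-granite organization on Replicate.

To set up Replicate, see Getting Started with Replicate.

To connect to a model from a provider other than Replicate, replace this code cell with one from the LLM component recipe.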

model_path = "ibm-granite/granite-3.3-8b-instruct"
model = Replicate(
    model=model_path,
    replicate_api_token=get_env_var("REPLICATE_API_TOKEN"),
    model_kwargs={
        "max_tokens": 1000, # Set the maximum number of tokens to generate as output.
        "min_tokens": 100, # Set the minimum number of tokens to generate as output.
        "temperature": 0.01
    },
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

Step 4: Prepare documents for the vector database with Docling

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions

pdf_pipeline_options = PdfPipelineOptions(
    do_ocr=False,
    generate_picture_images=True,
)
format_options = {
    InputFormat.PDF: PdfFormatOption(pipeline_options=pdf_pipeline_options),
}
converter = DocumentConverter(format_options=format_options)

sources = [
    "https://midwestfoodbank.org/images/AR_2020_WEB2.pdf",
]
conversions = { source: converter.convert(source=source).document for source in sources }

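Once the documents are converted, we further process their text elements and chunk them to a size suited to the embedding model in use. A list of LangChain documents is then created from the text chunks. Chunks that consist of a single table are skipped here and handled separately below.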

from docling_core.transforms.chunker.hybrid_chunker import HybridChunker
from docling_core.types.doc import DocItem, TableItem
from langchain_core.documents import Document

doc_id = 0
texts: list[Document] = []
for source, docling_document in conversions.items():
    for chunk in HybridChunker(tokenizer=embeddings_tokenizer).chunk(docling_document):
        items: list[DocItem] = chunk.meta.doc_items # type: ignore
        if len(items) == 1 and isinstance(items[0], TableItem):
            continue # we will process tables later
        refs = " ".join(map(lambda item: item.get_ref().cref, items))
        print(refs)
        text = chunk.text
        document = Document(
            page_content=text,
            metadata={
                "doc_id": (doc_id:=doc_id+1),
                "source": source,
                "ref": refs,
            },
        )
        texts.append(document)

print(f"{len(texts)} text document chunks created")
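`HybridChunker` is structure-aware and counts tokens with the embedding tokenizer, so the following is only a simplified sketch of the underlying idea: pack consecutive paragraphs into chunks that stay within a token budget. A whitespace split stands in for the real tokenizer, and `chunk_by_tokens` is a hypothetical helper, not part of Docling.

```python
def chunk_by_tokens(paragraphs: list[str], max_tokens: int) -> list[str]:
    """Greedily merge paragraphs into chunks of at most max_tokens whitespace tokens."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for paragraph in paragraphs:
        n = len(paragraph.split())  # stand-in for a real tokenizer
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(paragraph)
        current_len += n
    if current:
        chunks.append("\n".join(current))
    return chunks


paras = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_tokens(paras, max_tokens=5)
print(chunks)
```

In this sketch a single paragraph longer than the budget still becomes its own oversized chunk; a production chunker would split such text further so that it fits the embedding model.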

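Next, we process all the tables in the documents. We convert the table data to Markdown so that the language model can process it, and we create a list of LangChain documents from the Markdown rendering of each table.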

from docling_core.types.doc import DocItemLabel

doc_id = len(texts)
tables: list[Document] = []
for source, docling_document in conversions.items():
    for table in docling_document.tables:
        if table.label in [DocItemLabel.TABLE]:
            ref = table.get_ref().cref
            print(ref)
            text = table.export_to_markdown(docling_document)
            document = Document(
                page_content=text,
                metadata={
                    "doc_id": (doc_id:=doc_id+1),
                    "source": source,
                    "ref": ref
                },
            )
            tables.append(document)


print(f"{len(tables)} table documents created")
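Docling's `export_to_markdown` handles headers, merged cells, and other table structure; the sketch below only shows the general shape of the Markdown text that ends up in each table document, for a simple grid of strings. `rows_to_markdown` is a hypothetical helper, and the cell values are placeholders.

```python
def rows_to_markdown(header: list[str], rows: list[list[str]]) -> str:
    """Render a simple table as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in rows:
        # Escape pipes so cell text cannot break the column layout.
        cells = [cell.replace("|", "\\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


md = rows_to_markdown(["Item", "Value"], [["a", "1"], ["b|c", "2"]])
print(md)
```

A text-only language model can read this row-and-column layout directly, which is why tables are stored as Markdown rather than as images.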

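Finally, we process the images in the documents. In this step, a vision language model is used to understand the image content. This example focuses on extracting the textual information contained in the images.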

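Choosing an appropriate image prompt is critical, because the prompt determines which aspects of the image the model focuses on. For example:

A prompt such as "Give a detailed description of what is depicted in the image" (used below) provides general information about all the visual elements of the image.
A prompt such as "What text is contained in this image?" focuses specifically on extracting text content.
A prompt such as "Describe the graphical data visualization in this image" is better suited to charts and graphs.
Experiment with different prompts depending on the types of images in your documents and the information you need to extract from them.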

Note: Image processing can take considerable time, depending on the number of images and the service running the vision language model.

import base64
import io
import PIL.Image
import PIL.ImageOps

def encode_image(image: PIL.Image.Image, format: str = "png") -> str:
    image = PIL.ImageOps.exif_transpose(image) or image
    image = image.convert("RGB")

    buffer = io.BytesIO()
    image.save(buffer, format)
    encoding = base64.b64encode(buffer.getvalue()).decode("utf-8")
    uri = f"data:image/{format};base64,{encoding}"
    return uri

# Feel free to experiment with this prompt
image_prompt = "Give a detailed description of what is depicted in the image"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": image_prompt},
        ],
    },
]
vision_prompt = vision_processor.apply_chat_template(
    conversation=conversation,
    add_generation_prompt=True,
)
pictures: list[Document] = []
doc_id = len(texts) + len(tables)
for source, docling_document in conversions.items():
    for picture in docling_document.pictures:
        ref = picture.get_ref().cref
        print(ref)
        image = picture.get_image(docling_document)
        if image:
            text = vision_model.invoke(vision_prompt, image=encode_image(image))
            document = Document(
                page_content=text,
                metadata={
                    "doc_id": (doc_id:=doc_id+1),
                    "source": source,
                    "ref": ref,
                },
            )
            pictures.append(document)

print(f"{len(pictures)} image descriptions created")
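`encode_image` returns a base64 data URI, which is the form in which the image bytes are passed to the Replicate-hosted vision model in this notebook. The stdlib-only sketch below builds and parses such a URI from raw bytes so that the format is easy to inspect. It skips the PIL steps (EXIF rotation, RGB conversion, PNG encoding), and both helper names are hypothetical.

```python
import base64


def to_data_uri(data: bytes, mime: str = "image/png") -> str:
    """Wrap raw bytes in a base64 data URI, like the one encode_image builds."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def from_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI back into its MIME type and raw bytes."""
    header, _, payload = uri.partition(",")
    mime = header.removeprefix("data:").removesuffix(";base64")
    return mime, base64.b64decode(payload)


uri = to_data_uri(b"\x89PNG fake bytes")
print(uri)
print(from_data_uri(uri))
```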

We can then display the LangChain documents created from the input documents.

import itertools
from docling_core.types.doc import RefItem
from IPython.display import display

# Print all created documents
for document in itertools.chain(texts, tables):
    print(f"Document ID: {document.metadata['doc_id']}")
    print(f"Source: {document.metadata['source']}")
    print(f"Content:\n{document.page_content}")
    print("=" * 80)  # Separator for clarity

for document in pictures:
    print(f"Document ID: {document.metadata['doc_id']}")
    source = document.metadata['source']
    print(f"Source: {source}")
    print(f"Content:\n{document.page_content}")
    docling_document = conversions[source]
    ref = document.metadata['ref']
    picture = RefItem(cref=ref).resolve(docling_document)
    image = picture.get_image(docling_document)
    print("Image:")
    display(image)
    print("=" * 80)  # Separator for clarity

Populate the vector database

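Using the embedding model, load the documents created from the text chunks, tables, and generated image descriptions into a vector database. This vector database makes it easy to run semantic similarity searches across the documents.

Note: Populating the vector database can take significant time, depending on the embedding model and service.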
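Semantic similarity search ranks stored embedding vectors by how close they are to the query's embedding. Milvus uses an index (AUTOINDEX in this tutorial) and its own configured distance metric, so the brute-force cosine-similarity sketch below, with tiny made-up vectors and ids, only illustrates the idea; `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query: list[float], store: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]


store = {"text-1": [1.0, 0.0], "table-1": [0.7, 0.7], "picture-1": [0.0, 1.0]}
hits = top_k([0.9, 0.1], store, k=2)
print(hits)
```

Because text chunks, tables, and image descriptions are all embedded as text, a single search can return any mix of the three.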

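Choose a vector database

Specify the database used to store and retrieve embedding vectors. This tutorial uses Milvus through LangChain. As a vector database, Milvus stores, indexes, and manages numerical embeddings generated by neural networks and various ML algorithms.

To connect to a vector database other than Milvus, replace this code cell with one from the Vector Store recipe.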

import tempfile
from langchain_core.vectorstores import VectorStore, VectorStoreRetriever
from langchain_milvus import Milvus

db_file = tempfile.NamedTemporaryFile(prefix="vectorstore_", suffix=".db", delete=False).name
print(f"The vector database will be saved to {db_file}")

vector_db: VectorStore = Milvus(
    embedding_function=embeddings_model,
    connection_args={"uri": db_file},
    auto_id=True,
    enable_dynamic_field=True,
    index_params={"index_type": "AUTOINDEX"},
)

Now add all the LangChain documents for text, tables, and image descriptions to the vector database.

import itertools

documents = list(itertools.chain(texts, tables, pictures))
ids = vector_db.add_documents(documents)
print(f"{len(ids)} documents added to the vector database")
retriever: VectorStoreRetriever = vector_db.as_retriever(search_kwargs={"k": 10})

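Step 5: RAG with Granite

Now that the documents are converted and vectorized, we can set up the RAG pipeline.

Validate retrieval quality

Here we test the vector database by searching vector space for chunks that contain information relevant to the query, and we display the retrieved documents, which can include image descriptions.

This validation step matters because it confirms that the retrieval system works correctly before you build the full RAG pipeline. Check whether the returned documents are relevant to the query.

Feel free to try different queries.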

query = "Analyze how Midwest Food Bank's financial efficiency changed during the pandemic by comparing their 2019 and 2020 performance metrics. What specific pandemic adaptations had the greatest impact on their operational capacity, and how did their volunteer management strategy evolve to maintain service levels despite COVID-19 restrictions? Provide specific statistics from the report to support your analysis."
for doc in vector_db.as_retriever().invoke(query):
    print(doc)
    print("=" * 80)  # Separator for clarity

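The returned documents should be relevant to the query. Now let's build the RAG pipeline.

Create the RAG pipeline for Granite

First, we create the prompt that Granite uses to perform the RAG query. We use the Granite chat template and supply placeholder values that the LangChain RAG pipeline will substitute.

{context} holds the retrieved chunks, as in the earlier search, and passes them to the model as document context for answering the question.

Then we build the RAG pipeline using the Granite prompt templates we created.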

from ibm_granite_community.notebook_utils import escape_f_string
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# Create a Granite prompt for question-answering with the retrieved context
prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": "{input}",
    }],
    documents=[{
        "doc_id": "0",
        "text": "{context}",
    }],
    add_generation_prompt=True,
    tokenize=False,
)
prompt_template = PromptTemplate.from_template(template=escape_f_string(prompt, "input", "context"))

# Create a Granite document prompt template to wrap each retrieved document
document_prompt_template = PromptTemplate.from_template(template="""\
<|end_of_text|>
<|start_of_role|>document {{"document_id": "{doc_id}"}}<|end_of_role|>
{page_content}""")
document_separator=""

# Assemble the retrieval-augmented generation chain
combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=prompt_template,
    document_prompt=document_prompt_template,
    document_separator=document_separator,
)
rag_chain = create_retrieval_chain(
    retriever=retriever,
    combine_docs_chain=combine_docs_chain,
)
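Conceptually, `create_stuff_documents_chain` formats every retrieved document with `document_prompt`, joins the results with `document_separator`, and substitutes that string for `{context}` in the prompt. The plain-Python sketch below mimics that step with `str.format` on the same per-document template; it is a simplification (the real chain uses LangChain prompt templates and also fills `{input}`), and `stuff_documents` is a hypothetical name.

```python
DOCUMENT_TEMPLATE = (
    "<|end_of_text|>\n"
    '<|start_of_role|>document {{"document_id": "{doc_id}"}}<|end_of_role|>\n'
    "{page_content}"
)


def stuff_documents(docs: list[dict[str, str]], separator: str = "") -> str:
    """Format each document with the per-document template, then join them."""
    return separator.join(
        DOCUMENT_TEMPLATE.format(doc_id=d["doc_id"], page_content=d["page_content"])
        for d in docs
    )


context = stuff_documents([
    {"doc_id": "1", "page_content": "First chunk."},
    {"doc_id": "2", "page_content": "Second chunk."},
])
print(context)
```

Each formatted block begins with `<|end_of_text|>`, which appears intended to close the preceding document turn in the Granite template, so an empty separator is enough to produce one document turn per retrieved chunk.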

Generate a retrieval-augmented response to a question

The pipeline uses the query to find documents in the vector database and supplies them as context for the query.

outputs = rag_chain.invoke({"input": query})
print(outputs['answer'])

Limitations of standard RAG and why inference scaling is needed

The standard RAG approach works reasonably well, but it has several key limitations when dealing with long or complex content:

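Context management: When many documents are involved, standard RAG struggles to make effective use of all the available context.
Retrieval quality: Without guidance on how to use the retrieved information, the model often focuses on the wrong parts of the documents.
Compositional reasoning: Understanding complex queries that require multistep reasoning is challenging for standard RAG.
Performance plateaus: Beyond a certain threshold, adding more documents to standard RAG often yields diminishing returns.

Inference scaling techniques address these limitations by strategically allocating compute resources at inference time.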

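Enhancing RAG with DRAG (demonstration-based RAG)

Now we improve the RAG system by implementing the DRAG technique introduced in the research paper 'Inference Scaling for Long-Context Retrieval Augmented Generation'.

DRAG uses in-context examples to show the model how to extract and use information from documents, which improves performance in long-context scenarios.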
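The overall prompt layout DRAG aims for can be sketched with plain strings: the demonstrations come first (each with its own retrieved documents, question, and answer), followed by the documents retrieved for the live query and finally the query itself. `build_drag_prompt` and its dictionary keys are hypothetical; the pipeline in this tutorial assembles the prompt through the Granite chat template instead.

```python
def build_drag_prompt(demonstrations: list[dict], documents: list[str], question: str) -> str:
    """Assemble a DRAG-style prompt: demonstrations, then retrieved docs, then the query."""
    parts: list[str] = []
    for i, demo in enumerate(demonstrations, start=1):
        demo_docs = "\n".join(
            f"Document {j}:\n{text}" for j, text in enumerate(demo["documents"], start=1)
        )
        parts.append(
            f"Example {i}:\n{demo_docs}\nQuestion: {demo['query']}\nAnswer: {demo['answer']}"
        )
    # The live query follows the same pattern but leaves the answer open.
    query_docs = "\n".join(f"Document {j}:\n{text}" for j, text in enumerate(documents, start=1))
    parts.append(f"{query_docs}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


prompt = build_drag_prompt(
    [{"documents": ["doc a"], "query": "q1?", "answer": "a1"}],
    ["doc b", "doc c"],
    "q2?",
)
print(prompt)
```

Ending the prompt with the same "Question/Answer" pattern the demonstrations use nudges the model to answer in the demonstrated style.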

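Step 1: Create in-context demonstration samples

These examples usually come from a curated dataset of high-quality QA pairs. For this tutorial, we create synthetic examples that match the expected domain.

Here we define a data class that represents an individual demonstration and then create a few demonstrations.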

from dataclasses import dataclass, field, InitVar
from langchain_core.documents import Document

@dataclass
class DRAG_Demonstration:
    query: str
    answer: str
    retriever: InitVar[VectorStoreRetriever] = field(kw_only=True)
    documents: list[Document] = field(default_factory=list, kw_only=True)

    def __post_init__(self, retriever: VectorStoreRetriever):
        if not self.documents:
            self.documents = retriever.invoke(self.query)

    def __format__(self, format_spec: str) -> str:
        formatted_documents = "\n".join(
            f"Document {i+1}:\n{document.page_content}"
            for i, document in enumerate(self.documents)
        )
        return f"""\
{formatted_documents}
Question: {self.query}
Answer: {self.answer}
"""

def create_enhanced_drag_demonstrations(vector_db: VectorStore) -> list[DRAG_Demonstration]:
    """Create high-quality demonstrations for DRAG technique that showcase effective document analysis"""
    demonstration_retriever: VectorStoreRetriever = vector_db.as_retriever(search_kwargs={"k": 5})
    demonstrations = [
        DRAG_Demonstration(
            query="How did the COVID-19 pandemic impact Midwest Food Bank's operations in 2020?",
            answer="The COVID-19 pandemic significantly impacted Midwest Food Bank's operations in 2020. Despite challenges, MFB remained open and responsive to increased needs. They implemented safety protocols, reduced volunteer numbers for social distancing, and altered their distribution model to allow partner agencies to receive food safely. The pandemic created unprecedented food insecurity, with many people seeking assistance for the first time. MFB distributed 37% more food than in 2019, with a record 179 semi-loads of Disaster Relief family food boxes sent nationwide. The organization also faced supply chain disruptions and food procurement challenges in the early months but continued to find and distribute food. Community, business, and donor support helped fund operations and food purchases. Additionally, MFB began participating in the USDA Farmers to Families Food Box program in May 2020, distributing over $52 million worth of nutritious produce, protein, and dairy products.",
            retriever=demonstration_retriever
        ),
        DRAG_Demonstration(
            query="What role did volunteers play at Midwest Food Bank during 2020, and how were they affected by the pandemic?",
            answer="Volunteers were described as 'the life-blood of the organization' in the 2020 annual report. Despite the pandemic creating safety challenges, volunteers demonstrated courage and dedication by increasing their hours to meet growing needs. MFB implemented safety protocols at each location and limited volunteer group sizes to allow for social distancing. This created a challenge as food needs increased while fewer volunteers were available to help. To address this gap, multiple MFB locations received assistance from the National Guard, who filled vital volunteer positions driving trucks, operating forklifts, and helping with food distributions. In 2020, 17,930 individuals volunteered 300,898 hours of service, equivalent to 150 full-time employees. The volunteer-to-staff ratio was remarkable with 450 volunteers for every 1 paid MFB staff member, highlighting the volunteer-driven nature of the organization during the crisis.",
            retriever=demonstration_retriever
        ),
        DRAG_Demonstration(
            query="How did Midwest Food Bank's international programs perform during 2020, particularly in Haiti and East Africa?",
            answer="In 2020, Midwest Food Bank's international operations in East Africa and Haiti faced unique challenges but continued to serve communities. In East Africa (operated as Kapu Africa), strict lockdowns led to mass hunger, especially in slum areas. Kapu Africa distributed 7.2 million Tender Mercies meals, working with partner ministries to share food in food-insecure slums. A notable outcome was a spiritual awakening among recipients, with many asking why they were receiving help. In Haiti, the pandemic added to existing challenges, closing airports, seaports, factories, and schools. MFB Haiti more than doubled its food shipments to Haiti, delivering over 160 tons of food relief, nearly three-quarters being Tender Mercies meals. As Haitian children primarily receive nourishment from school lunches, MFB Haiti distributed Tender Mercies through faith-based schools and also partnered with over 20 feeding centers serving approximately 1,100 children daily. Nearly 1 million Tender Mercies meals were distributed in Haiti during 2020.",
            retriever=demonstration_retriever
        ),
    ]

    return demonstrations

Step 2: Format the demonstrations for inclusion in the prompt

Next, we format all the demonstrations together for the prompt.

# Format all demonstrations together
demonstrations = create_enhanced_drag_demonstrations(vector_db)

formatted_demonstrations = "\n\n".join(
    f"Example {i+1}:\n{demo}"
    for i, demo in enumerate(demonstrations)
)
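The f-string `f"Example {i+1}:\n{demo}"` above works because Python calls `format(demo, "")` for each replacement field, which dispatches to the `__format__` method defined on `DRAG_Demonstration`. A minimal standalone illustration with a hypothetical `Demo` class:

```python
from dataclasses import dataclass


@dataclass
class Demo:
    query: str
    answer: str

    def __format__(self, format_spec: str) -> str:
        # f-strings and str.format() call this for "{demo}" replacement fields.
        return f"Question: {self.query}\nAnswer: {self.answer}"


demos = [Demo("q1?", "a1"), Demo("q2?", "a2")]
text = "\n\n".join(f"Example {i+1}:\n{demo}" for i, demo in enumerate(demos))
print(text)
```

Because the class overrides `__format__` but not `__str__`, `str(demo)` and `print(demo)` still show the dataclass-generated repr; only formatting contexts use the custom rendering.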

Step 3: Create the DRAG prompt template

Now we create the DRAG prompt for the model, including the formatted demonstration examples.

drag_prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": f"""\
Here are examples of effectively extracting information from documents to answer questions.

{formatted_demonstrations}

Follow these examples when answering the user's question:

{{input}}""",
    }],
    documents=[{
        "doc_id": "0",
        "text": "Placeholder{context}",
    }],
    add_generation_prompt=True,
    tokenize=False,
)

# Convert to prompt template
drag_prompt_template = PromptTemplate.from_template(template=escape_f_string(drag_prompt, "input", "context"))

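Step 4: Create a custom retriever that reorders documents

A retriever normally returns documents in order of similarity, with the most similar document first. We define a reordering retriever that reverses this order, so the most similar documents appear last, closer to the end of the prompt and therefore closer to the question.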

import typing
from langchain_core.retrievers import BaseRetriever, RetrieverInput, RetrieverOutput
from langchain_core.callbacks.manager import CallbackManagerForRetrieverRun

class ReorderingRetriever(BaseRetriever):
    base_retriever: BaseRetriever

    def _get_relevant_documents(
        self, query: RetrieverInput, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: typing.Any
    ) -> RetrieverOutput:
        docs = self.base_retriever._get_relevant_documents(query, run_manager=run_manager, **kwargs)
        return list(reversed(docs))  # Reverse the order so higher-ranked docs are closer to query in prompt

reordering_retriever = ReorderingRetriever(base_retriever=retriever)

Step 5: Create the DRAG pipeline

Create the pipeline for DRAG queries using the DRAG prompt template and the reordering retriever.

drag_combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=drag_prompt_template,
    document_prompt=document_prompt_template,
    document_separator=document_separator,
)

drag_chain = create_retrieval_chain(
    retriever=reordering_retriever,
    combine_docs_chain=drag_combine_docs_chain,
)

Step 6: Generate a DRAG-enhanced response to a question

drag_outputs = drag_chain.invoke({"input": query})
print("\n=== DRAG-Enhanced Answer ===")
print(drag_outputs['answer'])

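Adding the demonstrations improved the answer. Next, let's try an even more thorough RAG technique.

Implementing IterDRAG (iterative demonstration-based RAG)

IterDRAG extends DRAG by decomposing complex queries into simpler sub-queries and performing interleaved retrieval. This approach is particularly effective for complex multi-hop questions that require integrating information from multiple sources or several steps of reasoning.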

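Key benefits of the iterative approach:

Breaks complex questions into manageable pieces.
Retrieves more relevant information for each sub-question.
Produces an explicit reasoning chain.
Makes it possible to handle questions that are hard to answer in a single step.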
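The control flow of IterDRAG can be sketched independently of LangChain and Granite: decompose the question, answer each sub-question against its own retrieval, then synthesize a final answer from the intermediate answers. In the sketch below, `decompose`, `retrieve`, and `answer` are hypothetical callables supplied by the caller (stubbed with toy lambdas in the usage example). Like this tutorial, it generates all sub-questions up front; it is a simplified outline, not the paper's reference implementation.

```python
from collections.abc import Callable


def iterdrag_outline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
) -> dict:
    """Decompose, answer each sub-question with its own retrieval, then synthesize."""
    sub_questions = decompose(question)
    if not sub_questions:
        # Simple question: fall back to a single retrieval + answer (DRAG-style).
        return {"answer": answer(question, retrieve(question)), "intermediate": []}
    intermediate = []
    for sub in sub_questions:
        docs = retrieve(sub)  # each sub-question gets its own retrieval pass
        intermediate.append((sub, answer(sub, docs)))
    # The final call sees the sub-question/answer pairs instead of raw documents.
    context = [f"Sub-question: {q}\nIntermediate answer: {a}" for q, a in intermediate]
    return {"answer": answer(question, context), "intermediate": intermediate}


result = iterdrag_outline(
    "How did X and Y change?",
    decompose=lambda q: ["How did X change?", "How did Y change?"],
    retrieve=lambda q: [f"doc about {q}"],
    answer=lambda q, ctx: f"answer to {q!r} from {len(ctx)} item(s)",
)
print(result["answer"])
```

Each extra sub-question costs one more retrieval and one more model call, which is exactly where IterDRAG spends its additional inference compute.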

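Step 1: Create the query decomposition chain

The decomposition step is important because it takes a complex query and breaks it into simpler, focused sub-queries that can each be answered independently.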

decompose_prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": """\
You are a helpful assistant that breaks down complex questions into simpler sub-questions.
For multi-part or complex questions, generate 1-3 sub-questions that would help answer the main question.

Here are examples of how to decompose complex questions:
{demonstrations}

Follow the above examples when breaking down the user's question.
If the following question is already simple enough, just respond with "No follow-up needed."

Otherwise, break down the following question into simpler sub-questions. Format your response as:
Follow up: [sub-question]

Question: {input}"""
    }],
    add_generation_prompt=True,
    tokenize=False,
)

decompose_prompt_template = PromptTemplate.from_template(template=escape_f_string(decompose_prompt, "input", "demonstrations"))
decompose_chain = decompose_prompt_template | model
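The decomposition prompt asks the model to emit one `Follow up: ...` line per sub-question, and the `iterative_drag` function later parses those lines with a regular expression. The sketch below runs the same pattern on a hand-written sample response (not real model output), which is a cheap way to check the parsing before calling the model; `parse_sub_questions` is a hypothetical helper.

```python
import re

FOLLOW_UP_PATTERN = r"Follow up: (.*?)(?=Follow up:|\n|$)"


def parse_sub_questions(response: str) -> list[str]:
    """Extract sub-questions from 'Follow up: ...' lines; empty list if none."""
    matches = re.findall(FOLLOW_UP_PATTERN, response, re.DOTALL)
    return [m.strip() for m in matches if m.strip()]


sample = "Follow up: How did volunteer hours change?\nFollow up: What safety protocols were added?"
print(parse_sub_questions(sample))
print(parse_sub_questions("No follow-up needed."))
```

The "No follow-up needed." reply yields an empty list, which is the signal `iterative_drag` uses to fall back to plain DRAG.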

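Step 2: Create the sub-query answering chain

The sub-query answering component handles each individual sub-question by retrieving relevant documents and generating a focused intermediate answer.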

intermediate_prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": """\
You are a helpful assistant that answers specific questions based on the provided documents.

Focus only on the sub-question and provide a concise intermediate answer.
Please answer the following sub-question based on the provided documents.
Format your response as:
Intermediate answer: [your concise answer to the sub-question]

Sub-question: {input}
"""
    }],
    documents=[{
        "doc_id": "0",
        "text": "Placeholder{context}",
    }],
    add_generation_prompt=True,
    tokenize=False,
)

intermediate_prompt_template = PromptTemplate.from_template(template=escape_f_string(intermediate_prompt, "input", "context"))
intermediate_combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=intermediate_prompt_template,
    document_prompt=document_prompt_template,
    document_separator=document_separator,
)
intermediate_chain = create_retrieval_chain(
    retriever=reordering_retriever,
    combine_docs_chain=intermediate_combine_docs_chain,
)

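Step 3: Create the final answer generation chain

The final answer generation component combines all the intermediate answers to produce a comprehensive answer to the original question.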

final_prompt = tokenizer.apply_chat_template(
    conversation=[{
        "role": "user",
        "content": """\
You are a helpful assistant that provides comprehensive answers to questions.
Use the intermediate answers to sub-questions to formulate a complete final answer.
Please provide a final answer to the main question based on the intermediate answers to sub-questions.
Format your response as:
So the final answer is: [your comprehensive answer to the main question]

Main question: {input}

Sub-questions and intermediate answers:
{context}"""
    }],
    add_generation_prompt=True,
    tokenize=False,
)

final_prompt_template = PromptTemplate.from_template(template=escape_f_string(final_prompt, "input", "context"))
final_chain = final_prompt_template | model
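Both the intermediate and final prompts ask the model to prefix its answer with a fixed marker, and `iterative_drag` strips that marker with a regular expression, keeping the raw text when the model ignores the requested format. A standalone sketch of that extraction, with a hypothetical helper name and made-up responses:

```python
import re


def extract_after_marker(text: str, marker: str) -> str:
    """Return the text after `marker`, or the whole text if the marker is missing."""
    match = re.search(re.escape(marker) + r"\s*(.*?)$", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()


print(extract_after_marker("Reasoning...\nSo the final answer is: the answer text.", "So the final answer is:"))
print(extract_after_marker("Intermediate answer: a short answer.", "Intermediate answer:"))
print(extract_after_marker("The model ignored the format.", "So the final answer is:"))
```

`re.escape` keeps the marker literal, so punctuation such as the trailing colon cannot be misread as regex syntax.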

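Step 4: Create example demonstrations for IterDRAG

Effective demonstrations are crucial to IterDRAG's performance. These examples show the model how to:

Break complex questions into simpler sub-questions.
Generate relevant intermediate answers.
Combine those answers into a coherent final answer.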

@dataclass
class IterDRAG_Demonstration_Base:
    query: str
    answer: str

@dataclass
class IterDRAG_Demonstration(IterDRAG_Demonstration_Base):
    intermediate: list[IterDRAG_Demonstration_Base]

    def __format__(self, format_spec: str) -> str:
        sub_questions="\n".join(
            f"Follow up: {sub.query}"
            for sub in self.intermediate
        )

        return f"Question: {self.query}\n{sub_questions}"

def create_iterdrag_demonstrations() -> list[IterDRAG_Demonstration]:
    """Create examples showing how to decompose and answer complex questions"""

    demonstrations = [
        IterDRAG_Demonstration(
            query="What impact did the pandemic have on the food bank's operations and distribution?",
            answer="The pandemic had a profound impact on food bank operations and distribution. Distribution volume increased by 60% to over 100 million pounds of food in 2020. Operationally, the food bank faced supply chain disruptions, volunteer shortages, and safety protocol challenges. In response, they implemented contactless distribution, expanded mobile pantries, created emergency food boxes for vulnerable populations, and developed virtual nutrition education. Despite these challenges, they successfully scaled operations to meet the unprecedented community need during the crisis.",
            intermediate=[
                IterDRAG_Demonstration_Base(
                    query="How did food distribution volume change during the pandemic?",
                    answer="Food distribution volume increased by 60% during the pandemic, rising from approximately 62 million pounds in 2019 to over 100 million pounds in 2020.",
                ),
                IterDRAG_Demonstration_Base(
                    query="What operational challenges did the food bank face during the pandemic?",
                    answer="The food bank faced challenges including supply chain disruptions, volunteer shortages due to social distancing requirements, and the need to implement new safety protocols for food handling and distribution.",
                ),
                IterDRAG_Demonstration_Base(
                    query="What new programs were implemented in response to the pandemic?",
                    answer="New programs included contactless distribution methods, expanded mobile pantry operations, emergency food boxes for vulnerable populations, and virtual nutrition education classes.",
                ),
            ],
        ),
        IterDRAG_Demonstration(
            query="How does the food bank's financial management compare to industry standards for non-profits?",
            answer="The food bank demonstrates excellent financial management compared to industry standards. With 94% of its budget allocated to program services and only 6% to administrative and fundraising costs, it exceeds the industry benchmark of 85-90% for program spending. This financial efficiency places the food bank among the top-performing non-profits in terms of maximizing donor impact and minimizing overhead expenses.",
            intermediate=[
                IterDRAG_Demonstration_Base(
                    query="What percentage of the food bank's budget goes to program services versus administrative costs?",
                    answer="94% of the food bank's budget goes directly to program services, with only 6% allocated to administrative and fundraising costs.",
                ),
                IterDRAG_Demonstration_Base(
                    query="What are the industry standards for program spending versus overhead for food banks?",
                    answer="Industry standards suggest that well-run food banks typically allocate 85-90% of their budget to program services, with 10-15% for administrative and fundraising expenses.",
                ),
            ],
        ),
    ]
    return demonstrations

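Step 5: Implement the IterDRAG function

This function orchestrates the entire iterative process:

Decomposes the main question into sub-questions.
Retrieves relevant documents for each sub-question and generates an intermediate answer.
Combines all the intermediate answers to generate the final answer.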

import re

def iterative_drag(main_question: str) -> dict[str, typing.Any]:
    """
    Implements IterDRAG: decomposing queries, retrieving documents for sub-queries,
    and generating a final answer based on intermediate answers.
    """
    print(f"\n=== Processing query with IterDRAG: '{main_question}' ===")

    # Step 1: Decompose the main question into sub-questions
    print("Step 1: Decomposing the query into sub-questions...")
    iterdrag_demonstrations = create_iterdrag_demonstrations()
    formatted_demonstrations = "\n\n".join(
        f"Example {i+1}:\n{demo}"
        for i, demo in enumerate(iterdrag_demonstrations)
    )
    decompose_result = decompose_chain.invoke({
        "input": main_question,
        "demonstrations": formatted_demonstrations,
    })
    decompose_answer = decompose_result

    # Extract sub-questions using regex
    sub_questions = re.findall(r"Follow up: (.*?)(?=Follow up:|\n|$)", decompose_answer, re.DOTALL)
    sub_questions = [sq.strip() for sq in sub_questions if sq.strip()]
    if not sub_questions:
        print("No decomposition needed or found. Using standard DRAG approach.")
        return drag_chain.invoke({"input": main_question})
    print(f"Decomposed into {len(sub_questions)} sub-questions")

    # Step 2: Answer each sub-question
    intermediate_pairs: list[dict[str, str]] = []
    for i, sub_question in enumerate(sub_questions):
        print(f"\nStep 2.{i+1}: Processing sub-question: '{sub_question}'")

        # Generate answer for this sub-question
        intermediate_result = intermediate_chain.invoke({"input": sub_question})
        intermediate_answer = intermediate_result["answer"]

        # Extract intermediate answer using regex
        intermediate_answer_match = re.search(r"Intermediate answer: (.*?)$", intermediate_answer, re.DOTALL)
        if intermediate_answer_match:
            intermediate_answer = intermediate_answer_match.group(1).strip()

        print(f"Generated intermediate answer: {intermediate_answer[:100]}...")

        # Store the sub-question and its answer
        intermediate_pairs.append({"input": sub_question, "answer": intermediate_answer})

    # Step 3: Generate the final answer based on sub-question answers
    print("\nStep 3: Generating final answer based on intermediate answers...")
    final_result = final_chain.invoke({
        "input": main_question,
        "context": "\n\n".join(
            f"Sub-question: {pair['input']}\nIntermediate answer: {pair['answer']}"
            for pair in intermediate_pairs
        ),
    })
    final_answer = final_result

    # Extract final answer
    final_answer_match = re.search(r"So the final answer is: (.*?)$", final_answer, re.DOTALL)
    if final_answer_match:
        final_answer = final_answer_match.group(1).strip()

    return {"input": main_question, "answer": final_answer, "intermediate": intermediate_pairs}
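To see what the regular expressions in iterative_drag actually extract, the snippet below runs the same patterns on hand-written sample output. Both sample strings are hypothetical stand-ins for model output; real Granite responses may be formatted differently, in which case the patterns need adjusting.

```python
import re

# Hypothetical decomposition output in the "Follow up:" format the regex expects
decompose_answer = (
    "Follow up: What tasks did the National Guard perform?\n"
    "Follow up: How did their help affect volunteer operations?\n"
    "So the final answer is: (not yet known)"
)

# Same pattern as iterative_drag: text after each "Follow up:" up to the next newline
sub_questions = re.findall(r"Follow up: (.*?)(?=Follow up:|\n|$)", decompose_answer, re.DOTALL)
sub_questions = [sq.strip() for sq in sub_questions if sq.strip()]
print(sub_questions)

# Hypothetical final-chain output: keep only what follows the marker phrase
final_output = "Reasoning about the sub-answers...\nSo the final answer is: Distribution capacity grew."
final_match = re.search(r"So the final answer is: (.*?)$", final_output, re.DOTALL)
final_answer = final_match.group(1).strip() if final_match else final_output
print(final_answer)
```

Because the lookahead stops at the first newline, a sub-question that the model wraps onto a second line would be cut off after its first line. If that happens, tighten the decomposition prompt or adjust the pattern.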

Comparing the RAG approaches

Now that all three RAG approaches are built, let's run the same, more complex query through each of them and compare the responses.

The comparison helps clarify the benefits of each approach and when each one is the best fit.

# Run all approaches on the same complex query
comparison_query = "What was the full impact chain of the National Guard's assistance during the pandemic? Specifically, how did their involvement affect volunteer operations, what specific tasks did they perform, and how did this ultimately translate to community impact in terms of food distribution capabilities and reach?"

print("\n=== Standard RAG ===")
standard_result = rag_chain.invoke({"input": comparison_query})
print(standard_result["answer"])

print("\n=== DRAG ===")
drag_result = drag_chain.invoke({"input": comparison_query})
print(drag_result["answer"])

print("\n=== IterDRAG ===")
iterdrag_result = iterative_drag(comparison_query)
print(iterdrag_result["answer"])
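Because the three approaches differ mainly in how much inference compute they spend, it can help to record latency next to each answer. The following is a minimal sketch, assuming each approach is wrapped as a callable that takes the query string and returns a dict with an "answer" key, as rag_chain.invoke, drag_chain.invoke, and iterative_drag do above. The helper name compare_approaches is hypothetical.

```python
import time
from typing import Any, Callable


def compare_approaches(
    query: str,
    approaches: dict[str, Callable[[str], dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Run every approach on the same query and record its answer and wall-clock latency."""
    results = []
    for name, run in approaches.items():
        start = time.perf_counter()
        output = run(query)
        elapsed = time.perf_counter() - start
        results.append({"approach": name, "seconds": elapsed, "answer": output["answer"]})
    return results


# Usage with the chains defined earlier in this tutorial:
# results = compare_approaches(comparison_query, {
#     "Standard RAG": lambda q: rag_chain.invoke({"input": q}),
#     "DRAG": lambda q: drag_chain.invoke({"input": q}),
#     "IterDRAG": iterative_drag,
# })
# for r in results:
#     print(f"{r['approach']}: {r['seconds']:.1f}s")
```

A single run is noisy, so averaging several runs per approach gives a fairer picture of the latency cost that each extra retrieval or generation step adds.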

Comparing and analyzing the results

This section summarizes the performance differences among the three RAG approaches implemented above.

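Approach	Strengths	Limitations	Best use cases
Standard RAG	Simple implementation; works well for simple queries; low compute requirements	Limited context utilization; performance plateaus even as more documents are supplied; weak at complex reasoning	Simple factual queries; limited compute resources; small contexts
DRAG	Better context utilization; performance improves as more documents are supplied; suited to moderately complex queries	Still limited to one-step generation; ineffective for multi-hop queries	Moderately complex queries; when more documents can be supplied; when in-context examples can be provided
IterDRAG	Best suited to complex queries; explicit reasoning chain; most effective context utilization	Highest compute requirements; more complex implementation	Multi-hop questions; complex analysis that requires compound reasoning; when maximum performance is needed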

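As the implementations illustrate, inference scaling techniques such as DRAG and IterDRAG can substantially improve RAG performance. The gains are most pronounced for complex queries that require in-depth analysis across multiple documents.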

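Conclusion

This tutorial explored how inference scaling can dramatically improve RAG performance. By strategically allocating additional compute at inference time through techniques such as DRAG and IterDRAG, you can substantially improve response quality for complex queries.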

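Challenges with traditional RAG and transformer-based models

Costly inference: Transformer-based models that use self-attention incur inference costs that grow quadratically with input length. Processing long contexts therefore demands substantial compute, and practical RAG deployments are often limited to short documents or must aggressively truncate the context.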
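To make the quadratic growth concrete, the short calculation below counts the entries of a single self-attention score matrix, which has one entry for every pair of tokens. It is a simplified sketch that ignores the number of layers and heads, constant factors, and the parts of the model whose cost grows only linearly, so it shows the trend rather than real FLOP counts.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Entries in one n x n self-attention score matrix (one head, one layer)."""
    return num_tokens * num_tokens


baseline = attention_score_entries(4_000)
for n in (4_000, 8_000, 16_000, 128_000):
    entries = attention_score_entries(n)
    ratio = entries / baseline
    print(f"{n:>7} tokens -> {entries:>17,} entries ({ratio:,.0f}x the 4k cost)")
```

Doubling the context from 4,000 to 8,000 tokens quadruples the number of entries, and growing it from 4,000 to 128,000 tokens multiplies it by (128,000 / 4,000)² = 1,024.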

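Limited context utilization: Standard RAG systems retrieve and process only a fixed number of documents, which is often not enough for complex, multi-step queries. As the context grows longer (especially beyond 128,000 tokens), performance plateaus because the model struggles to integrate information from many retrieved passages effectively.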
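The fixed retrieval budget of standard RAG can be sketched as a selection step: keep the k highest-ranked passages and stop once the next one would overflow the context budget. This is a minimal sketch under two assumptions: passages arrive already sorted by retrieval score, and whitespace word counts stand in for a real tokenizer. The function name pack_top_k is hypothetical.

```python
def pack_top_k(passages: list[str], k: int, token_budget: int) -> list[str]:
    """Keep at most k passages (assumed sorted best-first) whose total size fits the budget.

    Word counts approximate tokens here; a real system would use the model's tokenizer.
    """
    selected: list[str] = []
    used = 0
    for passage in passages[:k]:
        cost = len(passage.split())
        if used + cost > token_budget:
            break  # stop at the first passage that would overflow the context budget
        selected.append(passage)
        used += cost
    return selected
```

Anything past the first k passages, or past the budget, never reaches the model in a single pass, which is one reason a multi-hop question can lose the evidence it needs.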

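Inefficient compute allocation: Without careful allocation, adding more retrieved documents or context raises compute cost without a proportional gain in accuracy, eventually leading to diminishing returns or even degraded performance from information overload.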

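How DRAG and IterDRAG address these challenges

Demonstration-based RAG (DRAG):

DRAG places multiple retrieved examples, each with a question and answer, in the prompt as demonstrations, so the model learns in context how to find and apply relevant information.

This approach is most effective at shorter effective context lengths. It lets the model draw on rich context without overloading the attention mechanism, which improves both retrieval and generation quality.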
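The DRAG prompt layout can be sketched as plain string assembly: each demonstration contributes its retrieved documents, a question, and an answer, followed by the documents retrieved for the test query and the query itself. This is a minimal sketch that assumes documents are plain strings; the Demonstration dataclass and the build_drag_prompt and format_documents functions are hypothetical names, not the chains used earlier in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class Demonstration:
    """One in-context example: retrieved documents plus a worked question and answer."""
    documents: list[str]
    question: str
    answer: str


def format_documents(documents: list[str]) -> str:
    return "\n".join(f"Document [{i + 1}]: {doc}" for i, doc in enumerate(documents))


def build_drag_prompt(demos: list[Demonstration], documents: list[str], query: str) -> str:
    """Assemble a DRAG-style prompt: demonstrations first, then the test documents and query."""
    parts = [
        f"{format_documents(demo.documents)}\nQuestion: {demo.question}\nAnswer: {demo.answer}"
        for demo in demos
    ]
    # The test input follows the same layout but leaves the answer for the model to generate
    parts.append(f"{format_documents(documents)}\nQuestion: {query}\nAnswer:")
    return "\n\n".join(parts)
```

Adding demonstrations, or more documents per demonstration, lengthens the prompt; that is the knob DRAG turns to spend more inference-time compute within a single generation call.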

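Iterative demonstration-based RAG (IterDRAG):

IterDRAG decomposes a complex query into simpler sub-queries and iteratively retrieves documents and generates an answer for each sub-step.

By interleaving retrieval and generation, IterDRAG builds the reasoning chain that multi-step queries require, and it is especially effective at very long context lengths.

This process lets the model distribute compute more efficiently, focus on the most relevant information at each step, and reduce the risk of attention overload from long contexts. Applying these inference scaling techniques to RAG applications can substantially improve performance on knowledge-intensive tasks without changing the underlying model.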
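The interleaved retrieve-and-generate loop described above can be sketched with two stand-in callables. Unlike iterative_drag, which decomposes the question once up front and then answers every sub-question, this sketch asks the model for one follow-up question at a time, so later sub-queries can depend on earlier intermediate answers. The generate and retrieve callables, the "Follow up:" and "So the final answer is:" markers, and the max_iterations cap are assumptions made for illustration.

```python
from typing import Callable


def iterdrag_loop(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    max_iterations: int = 5,
) -> str:
    """Alternate retrieval and generation until a final answer appears or the budget runs out."""
    transcript = "\n".join(retrieve(question)) + f"\nQuestion: {question}\n"
    for _ in range(max_iterations):
        step = generate(transcript).strip()
        if step.startswith("So the final answer is:"):
            return step.removeprefix("So the final answer is:").strip()
        if step.startswith("Follow up:"):
            sub_query = step.removeprefix("Follow up:").strip()
            # Retrieve fresh documents for the sub-query, then ask for an intermediate answer
            new_docs = "\n".join(retrieve(sub_query))
            transcript += f"Follow up: {sub_query}\n{new_docs}\n"
            intermediate = generate(transcript + "Intermediate answer:").strip()
            transcript += f"Intermediate answer: {intermediate}\n"
        else:
            transcript += step + "\n"
    # Iteration budget exhausted: force a final answer from everything gathered so far
    return generate(transcript + "So the final answer is:").strip()
```

Each follow-up costs one extra retrieval and two extra generation calls, so the compute spent on a query grows with the number of hops, and max_iterations bounds it.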

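Next steps:

Experiment with different retrieval models and document preprocessing approaches.
Try different prompt formulations for image understanding.
Explore model parameter optimization to find the settings best suited to your use case.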

Footnotes

1. “A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems,” Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, et al., arXiv.org, 2025.

2. “Reasoning in Granite 3.2 Using Inference Scaling,” Luis Lastras, IBM Research, 26 February 2025.

3. “Inference Scaling for Long-Context Retrieval Augmented Generation,” Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky, arXiv.org, 2024.