Improving Stance Detection in Informal Political Discourse with LLMs: Leveraging User Context

Exploiting contextual information to improve stance detection in informal political discourse with LLMs

Feb 4, 2026 • Arman Engin Sucu, Yixiang Zhou, Mario A. Nascimento +1

TL;DR Highlight

Adding a profile built from a user's past posts to the prompt raised political-leaning classification accuracy by up to 38.5 percentage points.

Who Should Read

Backend and ML engineers building social media analysis, content moderation, or public-opinion analysis pipelines, especially developers working out how to fold user context into LLM prompts.

Core Mechanics

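Prepending a 'political profile' that summarizes a user's past posts to the prompt raised accuracy substantially for all seven models (+17.5 to +38.5 percentage points).
Grok-2-1212 showed the largest gain, from a 35.5% baseline to 74.0% with context.
Selecting 10 to 20 posts by political-keyword weight was more effective than adding many randomly chosen posts; the accuracy difference compared with 50 posts was marginal.
The profile-generation model and the classification model need not be the same. Mixed combinations (for example, Llama 3.1 for generation plus Grok for classification, 69.2%) often outperformed using a single model for both steps.
Smaller models with low baselines (GPT-4o Mini, Mistral Small-24B) tended to gain more from added context, so context narrows the performance gap between models.
User profile structure: political leaning, confidence, 3 to 5 key supporting statements, recurring topics, and a summary of language style, generated as JSON.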
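The profile structure in the last point can be pinned down as a typed record. The sketch below is one possible shape, not the paper's code: the `UserProfile` class and the `parse_profile` helper are hypothetical names, and the field names are assumed to match the JSON keys used in the profile prompt further down.

```python
import json
from dataclasses import dataclass, field

ALLOWED_LEANINGS = {"left", "right", "unknown"}


@dataclass
class UserProfile:
    """Hypothetical container for the profile fields listed above."""
    political_leaning: str = "unknown"   # "left", "right", or "unknown"
    confidence: str = "low"              # "high", "medium", or "low"
    key_indicators: list[str] = field(default_factory=list)
    recurring_topics: list[str] = field(default_factory=list)
    language_style: str = ""


def parse_profile(raw_json: str) -> UserProfile:
    """Parse a model's JSON reply, falling back to safe defaults."""
    data = json.loads(raw_json)
    leaning = str(data.get("political_leaning", "unknown")).lower()
    if leaning not in ALLOWED_LEANINGS:
        leaning = "unknown"  # assumption: unexpected labels are treated as unknown
    return UserProfile(
        political_leaning=leaning,
        confidence=str(data.get("confidence", "low")).lower(),
        key_indicators=list(data.get("key_indicators", []))[:5],  # cap at 5 items
        recurring_topics=list(data.get("recurring_topics", [])),
        language_style=str(data.get("language_style", "")),
    )
```

Validating the reply before it is stored keeps a malformed or off-schema profile from being silently prepended to later classification prompts.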

Evidence

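Grok-2-1212: 35.5% without context → 74.0% with context (an absolute gain of +38.5 percentage points).
Claude 3.7 Sonnet: 42.5% → 67.0% (+24.5 percentage points), a meaningful gain even from the highest baseline.
The PoliticalSignalSelection strategy reached 66.2% with only 10 posts; using 50 posts raised accuracy only slightly, to 70.2%.
The best combination, Llama 3.1 (profile generation) + Grok (classification), reached 69.2%, exceeding the previous best of 68.48% from a 2008 social-graph-based approach.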
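The gains above are absolute differences in accuracy (percentage points), not relative percentages. A small sketch of the arithmetic, using only the figures quoted in this section; the helper name `gain_summary` is made up for illustration.

```python
def gain_summary(baseline: float, enriched: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in %) for two accuracies in %."""
    pp_gain = enriched - baseline
    relative_gain = pp_gain / baseline * 100
    return round(pp_gain, 1), round(relative_gain, 1)


# (baseline accuracy, context-enriched accuracy), as quoted above
results = {
    "Grok": (35.5, 74.0),
    "Claude 3.7 Sonnet": (42.5, 67.0),
}

for model, (base, ctx) in results.items():
    pp, rel = gain_summary(base, ctx)
    print(f"{model}: +{pp} pp ({rel}% relative to baseline)")

# Marginal benefit of growing the profile from 10 to 50 selected posts.
pp_per_extra_post = (70.2 - 66.2) / (50 - 10)
print(f"{pp_per_extra_post:.2f} pp per additional post")
```

Seen this way, Grok's accuracy more than doubles relative to its baseline, while each post added beyond the first 10 contributes only a fraction of a percentage point.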

How to Apply

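Add a user-profile generation step to the content classification pipeline: select 10 to 20 of the user's past posts by domain-keyword weight → generate a JSON profile with an LLM → prepend the profile to the prompt at classification time.
If model cost is a concern, a hybrid pipeline can cut costs: generate profiles with a locally deployed Llama 3.1 (open-source) and send only the final classification to an API model.
For RAG pipelines that need personalization based on user history, use the paper's profile-generation prompt (Appendix A.1) as a reference to build structured user context and insert it into the system prompt.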
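The hybrid setup described above can be sketched as a two-step function that takes the two models as plain callables. Everything here is an assumption for illustration: the `LLM` type alias (prompt in, raw text out), the shortened prompt templates, and the function name `classify_with_context` are not from the paper, and no specific vendor SDK is implied.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, raw reply out

PROFILE_TEMPLATE = (
    "Analyze the following forum posts by the user and create a concise "
    "profile summary as JSON.\n\nPosts:\n{user_posts}"
)
CLASSIFY_TEMPLATE = (
    "Analyze this post and classify the author's stance.\n\n"
    "IMPORTANT CONTEXT ABOUT THIS USER:\n{profile_summary}\n\n"
    "Post to classify:\n{post_text}\n\n"
    'Respond in JSON with keys "orientation" and "explanation".'
)


def classify_with_context(
    post_text: str,
    selected_posts: list[str],
    profile_llm: LLM,      # e.g. a locally hosted model
    classifier_llm: LLM,   # e.g. a hosted API model
) -> str:
    # Step 1: summarize the pre-selected history into a profile.
    profile_summary = profile_llm(
        PROFILE_TEMPLATE.format(user_posts="\n---\n".join(selected_posts))
    )
    # Step 2: prepend the profile and classify the target post.
    raw = classifier_llm(
        CLASSIFY_TEMPLATE.format(profile_summary=profile_summary, post_text=post_text)
    )
    try:
        orientation = str(json.loads(raw).get("orientation", "UNKNOWN")).upper()
    except (json.JSONDecodeError, AttributeError):
        return "UNKNOWN"  # unparseable or non-object reply
    return orientation if orientation in {"LEFT", "RIGHT", "UNKNOWN"} else "UNKNOWN"
```

Because the two models are injected separately, swapping the profile generator or the classifier is a one-argument change, and profiles can be cached per user so the generation cost is paid once rather than per post.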

Code Example

# User-profile generation prompt (based on the paper's Appendix A.1).
# Literal JSON braces are doubled so the template can be filled with
# profile_prompt.format(user_posts=...).
profile_prompt = """
Analyze the following forum posts by the user and create a concise profile summary.

1. Identify consistent indicators in their posts
2. Note recurring topics
3. Observe distinctive language patterns
4. Identify who/what they consistently criticize or support
5. Determine if there's sufficient evidence to classify them

Posts:
{user_posts}

Respond in JSON:
{{
  "political_leaning": "left/right/unknown",
  "confidence": "high/medium/low",
  "key_indicators": ["3-5 specific examples"],
  "recurring_topics": ["list frequent topics"],
  "language_style": "brief description",
  "sentiment_patterns": "who/what they criticize or support",
  "context_notes": "additional relevant information"
}}
"""

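# Classification prompt (based on Appendix A.2).
# Literal JSON braces are doubled so the template can be filled with
# classification_prompt.format(profile_summary=..., post_text=...).
classification_prompt = """
Analyze this post and classify the author's stance.

IMPORTANT CONTEXT ABOUT THIS USER:
{profile_summary}

Post to classify:
{post_text}

Respond in JSON:
{{
  "orientation": "LEFT|RIGHT|UNKNOWN",
  "explanation": "detailed explanation"
}}
"""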

# PoliticalSignalSelection: pick posts by political-signal strength.
def score_post(post_text):
    """Return a weighted count of political keywords in a post."""
    weighted_terms = {
        1: ['politics', 'government', 'policy', 'vote', 'election'],        # general terms
        2: ['democrat', 'republican', 'liberal', 'conservative', 'trump'],  # party terms
        3: ['abortion', 'gun', 'immigration', 'climate', 'healthcare'],     # hot-button issues
    }
    text = post_text.lower()  # lowercase once instead of once per term
    # str.count matches substrings, so 'gun' also counts 'guns' (and 'begun').
    return sum(
        text.count(term) * weight
        for weight, terms in weighted_terms.items()
        for term in terms
    )

def select_posts(posts, n=20):
    """Return up to n posts, highest political-signal score first."""
    scored = sorted(posts, key=score_post, reverse=True)
    n_top = int(n * 0.6)
    top_60 = scored[:n_top]  # 60%: strongest political signal
    # Simplified stand-in for the diversity selection: it takes the
    # next-highest-scoring posts, so the result equals scored[:n].
    diverse_40 = scored[n_top:n]
    return top_60 + diverse_40
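The `select_posts` function above fills its "diverse" 40% with the next-highest-scoring posts. One way to get actual spread is sketched below under a clearly labeled assumption: the paper's exact diversity criterion is not reproduced in this summary, so the even-spacing heuristic and the name `select_posts_diverse` are illustrative only. The scoring function is passed in as a parameter so the block runs on its own.

```python
from typing import Callable, Sequence


def select_posts_diverse(
    posts: Sequence[str],
    score: Callable[[str], float],
    n: int = 20,
) -> list[str]:
    """Pick n posts: 60% top-scoring, 40% spread across the rest of the ranking."""
    if n <= 0:
        return []
    ranked = sorted(posts, key=score, reverse=True)
    if len(ranked) <= n:
        return list(ranked)  # not enough posts to be selective

    n_top = int(n * 0.6)
    top = ranked[:n_top]
    rest = ranked[n_top:]
    n_rest = n - n_top

    # Assumed diversity heuristic: take evenly spaced posts from the remainder
    # so that lower-signal parts of the user's history are also represented.
    step = len(rest) / n_rest
    spread = [rest[int(i * step)] for i in range(n_rest)]
    return top + spread
```

For a user with 100 posts and `n=10`, this keeps the 6 strongest posts and samples the other 4 at regular intervals down the ranking, instead of taking ranks 7 through 10.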

Terminology

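Stance Detection: An NLP task that automatically determines an author's position (for, against, or neutral) from text. For example, the model predicts whether the author of a tweet supports or opposes a particular policy.

Zero-shot: Having the model classify directly without showing it any examples. It is like solving an unfamiliar exam question without being told the question type beforehand.

Contextual Enrichment: A technique that improves performance by attaching additional background information to the model input. In this paper, a summary of the user's past posts is prepended to the prompt.

Cross-model Evaluation: An evaluation approach that cross-tests multiple model combinations, with model B consuming the output produced by model A.

Weighted Lexicon: A word list in which each word carries its own importance weight. For example, 'vote' scores 1, 'trump' scores 2, and 'abortion' scores 3.

Diminishing Returns: Adding more helps at first, but past a certain point further additions have little effect. In this paper, accuracy barely improves beyond 20 posts.

Profile Prepending: A prompting technique that gives the LLM context by placing a user profile summary before the text to be classified.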
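Cross-model evaluation, as defined above, amounts to filling in a grid of profile-generator and classifier pairs. The following is a minimal sketch of such a grid, assuming each model is wrapped as a plain Python callable; the function name `cross_model_accuracy` and the callable signatures are hypothetical and not taken from the paper.

```python
from typing import Callable

Profiler = Callable[[list[str]], str]   # post history -> profile text
Classifier = Callable[[str, str], str]  # (profile, post) -> predicted label
Example = tuple[list[str], str, str]    # (post history, target post, gold label)


def cross_model_accuracy(
    examples: list[Example],
    profilers: dict[str, Profiler],
    classifiers: dict[str, Classifier],
) -> dict[tuple[str, str], float]:
    """Return accuracy for every (profiler name, classifier name) pair."""
    if not examples:
        raise ValueError("examples must not be empty")
    results = {}
    for p_name, profiler in profilers.items():
        # Generate each profile once per profiler and reuse it for all classifiers.
        profiles = [profiler(history) for history, _, _ in examples]
        for c_name, classifier in classifiers.items():
            correct = sum(
                classifier(profile, post) == gold
                for profile, (_, post, gold) in zip(profiles, examples)
            )
            results[(p_name, c_name)] = correct / len(examples)
    return results
```

Passing a profiler that returns an empty string gives the zero-shot baseline row of the grid, so baseline and context-enriched accuracies come out of the same loop.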

Original Abstract

This study investigates the use of Large Language Models (LLMs) for political stance detection in informal online discourse, where language is often sarcastic, ambiguous, and context-dependent. We explore whether providing contextual information, specifically user profile summaries derived from historical posts, can improve classification accuracy. Using a real-world political forum dataset, we generate structured profiles that summarize users' ideological leaning, recurring topics, and linguistic patterns. We evaluate seven state-of-the-art LLMs across baseline and context-enriched setups through a comprehensive cross-model evaluation. Our findings show that contextual prompts significantly boost accuracy, with improvements ranging from +17.5% to +38.5%, achieving up to 74% accuracy that surpasses previous approaches. We also analyze how profile size and post selection strategies affect performance, showing that strategically chosen political content yields better results than larger, randomly selected contexts. These findings underscore the value of incorporating user-level context to enhance LLM performance in nuanced political classification tasks.