Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents

Jan 12, 2026•Miao Su, Yucan Guo, Zhongni Hou +8

TL;DR Highlight

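A framework that stores and retrieves an LLM agent's memories by when events actually happened rather than by the date of the conversation, raising personalization accuracy by up to 12.2 percentage points (absolute).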

Who Should Read

Backend/AI developers adding long-term memory (Mem0, Zep, etc.) to LLM agents, especially those whose agents handle time-based questions such as "What did I do last week?" or "Which hotel did I book back then?" poorly.

Core Mechanics

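Existing memory systems have two core problems: (1) Temporal inaccuracy: when the dialogue date differs from the date the event actually happened, the memory is stored under the wrong time; (2) Temporal fragmentation: continuous experiences are stored only as isolated points.
TSM first builds a Temporal Knowledge Graph (TKG) keyed on when events actually occurred, then slices it into monthly windows and applies GMM clustering to generate "topic" and "persona" summaries.
At query time, spaCy is used to detect relative time expressions such as "last weekend", which are converted into actual date ranges; memories that fall within that range are retrieved first.
Memory updates run in two stages: each new dialogue turn triggers only a lightweight TKG update (online), while the expensive monthly summary regeneration is deferred to sleep time (offline).
With GPT-4o-mini on LongMemEval_S, TSM reaches 74.8% versus 62.6% for the previous best method, A-MEM (+12.2 pp), with the largest gain on temporal questions (+22.56 pp).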
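A minimal sketch of the online/offline split, assuming facts arrive already resolved to their semantic date and using a plain Python function in place of the LLM summarizer. `Fact`, `TwoStageMemory` and `sleep_time_consolidate` are hypothetical names, not TSM's code; a per-month fact list stands in for the TKG, and GMM topic clustering is left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    semantic_date: date  # when the event happened, not when it was mentioned


@dataclass
class TwoStageMemory:
    # Hypothetical stand-in for the LLM call that writes a monthly summary.
    summarize: Callable[[list], str]
    facts_by_month: dict = field(default_factory=lambda: defaultdict(list))
    summaries: dict = field(default_factory=dict)
    dirty_months: set = field(default_factory=set)
    summarize_calls: int = 0

    def add_fact(self, fact: Fact) -> None:
        """Online path: cheap append on every dialogue turn; no LLM call."""
        month = (fact.semantic_date.year, fact.semantic_date.month)
        self.facts_by_month[month].append(fact)
        self.dirty_months.add(month)  # mark the month for later consolidation

    def sleep_time_consolidate(self) -> None:
        """Offline path: regenerate summaries only for months that changed."""
        for month in sorted(self.dirty_months):
            self.summaries[month] = self.summarize(self.facts_by_month[month])
            self.summarize_calls += 1
        self.dirty_months.clear()


mem = TwoStageMemory(summarize=lambda facts: "; ".join(f"{f.relation} {f.obj}" for f in facts))
mem.add_fact(Fact("user", "visited", "Boston", date(2024, 5, 3)))
mem.add_fact(Fact("user", "booked", "a hotel", date(2024, 5, 4)))
mem.add_fact(Fact("user", "started", "a cocktail class", date(2024, 6, 1)))
mem.sleep_time_consolidate()
print(mem.summaries[(2024, 5)])  # visited Boston; booked a hotel
```

Keeping the online path free of LLM calls is what keeps each turn cheap; summarization cost is paid once per changed month at consolidation time.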

Evidence

LongMemEval_S: TSM 74.80% vs A-MEM 62.60% vs Zep 60.20% (GPT-4o-mini)
Temporal reasoning category: TSM 69.92% vs A-MEM 47.36%, a gap of +22.56 pp
Multi-Session category: TSM 69.17% vs A-MEM 48.87%, a gap of +20.30 pp
LoCoMo: best among memory-based methods, TSM 76.69% vs Mem0g 68.44% vs Naive RAG 63.64%
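The gains above are absolute differences in percentage points. A quick check recomputing them from the reported accuracies, alongside the relative improvement over each baseline (the relative figures are derived here, not reported in the summary):

```python
from typing import Tuple

# Reported accuracies (%) from the list above: (TSM, baseline).
results = {
    "LongMemEval_S overall vs A-MEM": (74.80, 62.60),
    "Temporal vs A-MEM": (69.92, 47.36),
    "Multi-Session vs A-MEM": (69.17, 48.87),
    "LoCoMo vs Mem0g": (76.69, 68.44),
}


def gains(tsm: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(tsm - baseline, 2)
    relative = round(100 * (tsm - baseline) / baseline, 1)
    return absolute, relative


for name, (tsm, base) in results.items():
    abs_pp, rel_pct = gains(tsm, base)
    print(f"{name}: +{abs_pp:.2f} pp absolute, +{rel_pct:.1f}% relative")
```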

How to Apply

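If your memory layer (Mem0, Zep, or similar) only attaches the dialogue timestamp to stored content, consider switching to a scheme that uses an LLM before storage to extract when the content actually happened and records it in a separate semantic_time field.
If a retrieval query contains a time expression ("last week", "last summer"), detect it (for example with spaCy's DATE/TIME entity recognition), resolve it to an absolute date range, and apply time-range matching as the primary filter, ahead of vector similarity scores.
If the same topic (travel, hobbies, etc.) keeps coming up across consecutive conversations, cluster it monthly with GMM and have an LLM write a summary. This is "durative memory", which improves personalized response quality more than fragmented facts do.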
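A minimal sketch of the query-side resolution step: a small rule table that turns a few relative expressions into absolute, inclusive date ranges. `resolve_relative_range` is a hypothetical helper, and the Monday-start week, the "last weekend" rule and the June-August summer are assumptions. In a fuller pipeline spaCy would only detect the expression as a DATE entity, and a dedicated resolver such as dateparser or Duckling would do this conversion.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_relative_range(expr: str, today: date) -> Optional[Tuple[date, date]]:
    """Map a few English relative expressions to an inclusive (start, end) range.

    Hypothetical rule table for illustration only. Assumes Monday-start weeks
    and northern-hemisphere summer (June-August).
    """
    expr = expr.lower().strip()
    this_monday = today - timedelta(days=today.weekday())
    if expr == "today":
        return today, today
    if expr == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if expr == "last week":
        start = this_monday - timedelta(days=7)
        return start, start + timedelta(days=6)
    if expr == "last weekend":
        saturday = this_monday - timedelta(days=2)  # Saturday before this week
        return saturday, saturday + timedelta(days=1)
    if expr == "last summer":
        # Most recent June-August period that has fully ended.
        year = today.year if today >= date(today.year, 9, 1) else today.year - 1
        return date(year, 6, 1), date(year, 8, 31)
    return None  # unknown expression: fall back to similarity-only retrieval


print(resolve_relative_range("last weekend", date(2026, 3, 11)))
# (datetime.date(2026, 3, 7), datetime.date(2026, 3, 8))
```

Returning `None` for unrecognized expressions lets the caller skip the time filter and fall back to similarity ranking rather than guessing a range.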

Code Example

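# Example: parsing a query's time constraint with spaCy
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import math
from datetime import datetime, timedelta

import spacy

nlp = spacy.load("en_core_web_sm")

def parse_semantic_time(query: str, now: datetime) -> tuple[datetime, datetime]:
    """
    Extract time expressions from the query and convert them to an actual date range.
    Intended behavior, e.g. "what did I eat last weekend" asked on 2026-03-11
    -> (2026-03-07, 2026-03-08)
    """
    doc = nlp(query)
    # spaCy only detects DATE/TIME entities; it does not resolve them to dates
    expressions = [ent.text for ent in doc.ents if ent.label_ in ("DATE", "TIME")]
    for text in expressions:
        print(f"Detected time expression: {text}")

    # A real implementation would resolve `expressions` with dateparser, duckling, etc.
    # https://github.com/scrapinghub/dateparser
    # Placeholder: fall back to the last 7 days
    return (now - timedelta(days=7), now)

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors"""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Apply the time constraint first during memory retrieval
def temporal_rerank(candidates: list, time_range: tuple, query_embedding) -> list:
    """
    Rerank with time-range match as the primary key and semantic similarity as the secondary key
    """
    def score(mem):
        t = mem.get("semantic_time")
        # Memories without a semantic_time never count as in range
        in_time = t is not None and time_range[0] <= t <= time_range[1]
        sem_score = cosine_similarity(query_embedding, mem["embedding"])
        # Sort descending by the tuple (time match, semantic similarity)
        return (int(in_time), sem_score)

    return sorted(candidates, key=score, reverse=True)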

Terminology

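Temporal Knowledge Graph (TKG): A graph store that records relationships among concepts such as people, places, and events together with time information. It explicitly records when each relationship held, e.g. "the user arrived in Boston on 2024-05-03".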
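A toy in-memory version of the idea, using the Boston example: each edge carries a validity interval, so the graph can answer what held on a given day. The class names, relation labels and the `cocktail making` start date are made up for illustration; a real TKG would typically live in a graph database.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TemporalEdge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the relation still holds

    def holds_on(self, day: date) -> bool:
        """True if this relation was valid on the given day."""
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


class TemporalKG:
    """Toy temporal knowledge graph: time-stamped (subject, relation, object) edges."""

    def __init__(self) -> None:
        self.edges: List[TemporalEdge] = []

    def add(self, edge: TemporalEdge) -> None:
        self.edges.append(edge)

    def facts_on(self, subject: str, day: date) -> List[TemporalEdge]:
        return [e for e in self.edges if e.subject == subject and e.holds_on(day)]


kg = TemporalKG()
# Point event: valid for a single day.
kg.add(TemporalEdge("user", "arrived_in", "Boston", date(2024, 5, 3), date(2024, 5, 3)))
# Ongoing state: open-ended validity.
kg.add(TemporalEdge("user", "interested_in", "cocktail making", date(2024, 4, 1)))

for edge in kg.facts_on("user", date(2024, 5, 3)):
    print(edge.relation, edge.obj)
```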

Episodic Memory: Memory of a specific event at a particular point in time, i.e. a point-level fact such as "ate ramen last Tuesday". Corresponds to episodic memory in human cognition.

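Durative Memory: Memory of an ongoing state or pattern rather than a one-off event, i.e. summary information distilled from multiple episodes, such as "the user is very interested in cocktail making".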

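GMM (Gaussian Mixture Model): A statistical technique that automatically partitions data into groups (clusters) by modeling it as a mixture of Gaussian distributions. Here it is used to group entities mentioned in the same month by topic. It clusters like K-means, but its boundaries are soft: each point gets a probability of belonging to each cluster rather than a single hard assignment.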
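A dependency-free sketch of GMM clustering via expectation-maximization (EM) in one dimension. The entities and their 1-D coordinates are made up; real entity representations would have many dimensions, and a library implementation such as scikit-learn's `GaussianMixture` would normally be used instead. The soft memberships it returns are the "soft boundary" that distinguishes GMM from K-means.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(xs: List[float], k: int, iters: int = 50) -> List[List[float]]:
    """Fit a k-component 1-D Gaussian mixture with EM and return soft memberships.

    Row i of the result holds point i's probability of belonging to each component.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic init: spread the initial means across the sorted data.
    means = [ordered[round(j * (n - 1) / max(k - 1, 1))] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            ps = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(ps) or 1e-300
            resp.append([p / total for p in ps])
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)
    return resp


# Made-up 1-D coordinates for entities mentioned in one month.
entities = ["flight", "hotel", "museum", "gin", "vermouth", "shaker"]
coords = [0.10, 0.20, 0.15, 0.90, 0.95, 0.85]
memberships = gmm_1d(coords, k=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
for name, label, row in zip(entities, labels, memberships):
    print(f"{name}: topic {label}, p={row[label]:.3f}")
```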

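Semantic Timeline: A timeline ordered by when events actually happened rather than by when the conversations took place. If the user says today "I'm going on a trip to Tokyo tomorrow", the memory is stored under tomorrow's date.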
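A minimal sketch of writing to a semantic timeline, assuming a hypothetical cue table in place of the LLM that would extract the event time. Each record keeps both the dialogue date and the semantic date, and the timeline is ordered by the latter. `CUE_OFFSETS`, `MemoryRecord` and `to_record` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical stand-in for the LLM extraction step: a relative-time cue in the
# utterance mapped to a day offset from the dialogue date (naive substring match).
CUE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


@dataclass
class MemoryRecord:
    text: str
    dialogue_time: date  # when it was said
    semantic_time: date  # when it actually happens or happened


def to_record(text: str, dialogue_time: date) -> MemoryRecord:
    lowered = text.lower()
    offset = next((days for cue, days in CUE_OFFSETS.items() if cue in lowered), 0)
    return MemoryRecord(text, dialogue_time, dialogue_time + timedelta(days=offset))


said_on = date(2026, 3, 10)
records = [
    to_record("I'm going on a trip to Tokyo tomorrow", said_on),
    to_record("I had ramen yesterday", said_on),
]
# The semantic timeline orders by semantic_time, not by dialogue_time.
for rec in sorted(records, key=lambda r: r.semantic_time):
    print(rec.semantic_time, rec.text)
```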

Point-wise Memory: Storing each fact as an independent point, e.g. "did A on May 1" and "did B on May 3", with no links between them, so continuity across related events is lost.
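One simple way to recover continuity from point-wise memories is to merge same-topic points that are close in time into intervals. This gap-threshold merge is an illustrative stand-in, not TSM's consolidation (which clusters monthly with GMM and summarizes with an LLM); `merge_points`, the topic labels and `max_gap_days` are assumptions.

```python
from datetime import date
from typing import Dict, List, Tuple


def merge_points(points: List[Tuple[date, str]], max_gap_days: int = 3) -> List[Tuple[date, date, str]]:
    """Group same-topic point memories into (start, end, topic) intervals.

    Consecutive points of one topic are merged when they are at most
    `max_gap_days` apart; a larger gap starts a new interval.
    """
    by_topic: Dict[str, List[date]] = {}
    for day, topic in sorted(points):
        by_topic.setdefault(topic, []).append(day)

    intervals: List[Tuple[date, date, str]] = []
    for topic, days in by_topic.items():
        start = prev = days[0]
        for day in days[1:]:
            if (day - prev).days > max_gap_days:
                intervals.append((start, prev, topic))
                start = day
            prev = day
        intervals.append((start, prev, topic))
    return sorted(intervals)


points = [
    (date(2024, 5, 1), "Boston trip"),
    (date(2024, 5, 3), "Boston trip"),
    (date(2024, 5, 5), "Boston trip"),
    (date(2024, 5, 20), "cocktail class"),
    (date(2024, 6, 15), "Boston trip"),
]
for start, end, topic in merge_points(points):
    print(start, end, topic)
```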

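Naive RAG: The most basic RAG setup. Documents are split into chunks and stored in a vector DB, and the chunks with the highest cosine similarity to the query are retrieved as-is. Time information and relationships are ignored.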
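A toy illustration of Naive RAG's blind spot: ranking by cosine similarity alone never looks at the `date` field, so an older memory can win over a more recent one. The 2-D embeddings and hotel records are made up.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def naive_rag(query_vec: List[float], chunks: List[Dict], k: int = 1) -> List[str]:
    """Return the k chunk texts most similar to the query; the 'date' field is never used."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [c["text"] for c in ranked[:k]]


# Made-up 2-D embeddings; the query stands for
# "Which hotel did I book for my most recent Tokyo trip?"
chunks = [
    {"text": "Booked Hotel A for the 2023 Tokyo trip", "embedding": [0.9, 0.1], "date": "2023-04-02"},
    {"text": "Booked Hotel B for the 2025 Tokyo trip", "embedding": [0.8, 0.2], "date": "2025-10-12"},
    {"text": "Signed up for a cocktail class", "embedding": [0.1, 0.9], "date": "2025-09-01"},
]
print(naive_rag([1.0, 0.0], chunks))  # ['Booked Hotel A for the 2023 Tokyo trip']
```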

Related Resources

Original Abstract

Memory enables Large Language Model (LLM) agents to perceive, store, and use information from past dialogues, which is essential for personalization. However, existing methods fail to properly model the temporal dimension of memory in two aspects: 1) Temporal inaccuracy: memories are organized by dialogue time rather than their actual occurrence time; 2) Temporal fragmentation: existing methods focus on point-wise memory, losing durative information that captures persistent states and evolving patterns. To address these limitations, we propose Temporal Semantic Memory (TSM), a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory. During memory construction, it first builds a semantic timeline rather than a dialogue one. Then, it consolidates temporally continuous and semantically related information into a durative memory. During memory utilization, it incorporates the query's temporal intent on the semantic timeline, enabling the retrieval of temporally appropriate durative memories and providing time-valid, duration-consistent context to support response generation. Experiments on LongMemEval and LoCoMo show that TSM consistently outperforms existing methods and achieves up to 12.2% absolute improvement in accuracy, demonstrating the effectiveness of the proposed method.