Generating Diverse Code Explanations using the GPT-3 Large Language Model
TL;DR Highlight
A study analyzing how GPT-3 can automatically generate multiple natural-language explanations of a single code snippet from different perspectives.
Who Should Read
Developers and researchers interested in automated code documentation, code comprehension tooling, and using LLMs for developer education.
Core Mechanics
- GPT-3 can generate diverse natural language explanations for the same code snippet by varying the prompt perspective (what it does, why it exists, how it works)
- Different prompting strategies yield explanations of varying quality and target audience appropriateness
- Higher-quality explanations are produced when prompts specify the intended audience (beginner vs. expert) and explanation type
- Multi-perspective explanations improve code comprehension more than single explanations in user studies
- Automatic evaluation metrics correlate moderately with human judgments of explanation quality
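The idea of varying both the explanation perspective and the intended audience can be sketched as a small prompt builder. This is a minimal illustration, not the study's method: the template wording and the names `PERSPECTIVES`, `AUDIENCES`, `build_prompt`, and `build_all_prompts` are hypothetical.

```python
# Hypothetical templates; the wording is illustrative and not taken from the study.
PERSPECTIVES = {
    "what": "Explain what this code does.",
    "why": "Explain why this code might be written this way.",
    "how": "Explain how this code works, step by step.",
}

AUDIENCES = {
    "beginner": "Write for a student who is new to programming and avoid jargon.",
    "expert": "Write for an experienced developer and be concise.",
}


def build_prompt(code, perspective, audience):
    """Combine one explanation angle and one target audience into a single prompt."""
    if perspective not in PERSPECTIVES:
        raise ValueError(f"unknown perspective: {perspective!r}")
    if audience not in AUDIENCES:
        raise ValueError(f"unknown audience: {audience!r}")
    instruction = f"{PERSPECTIVES[perspective]} {AUDIENCES[audience]}"
    return f"{instruction}\n\n{code.strip()}"


def build_all_prompts(code, audience):
    """Return one prompt per perspective for the same code and audience."""
    return {name: build_prompt(code, name, audience) for name in PERSPECTIVES}
```

Keeping the perspective and audience in separate tables means a new angle or reader profile can be added without touching the prompt-assembly logic.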
Evidence
- Human evaluation study comparing single vs. multi-perspective explanations on comprehension tasks
- Tested across multiple GPT-3 variants with different prompt templates
- Correlation analysis between automatic metrics (BLEU, BERTScore) and human ratings
How to Apply
- When auto-generating code comments or documentation, prompt the LLM with specific explanation angles: 'explain what this does', 'explain why this approach was chosen', 'explain this for a junior dev'.
- Generate multiple explanation candidates and use a ranker or human review to select the best for your documentation.
- Tailor the prompt's target audience specification to match your actual readers for better output quality.
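The "generate several candidates, then pick one" step can be sketched as follows. The scoring rule here is a toy stand-in chosen for illustration (it rewards explanations that mention identifiers from the code and penalizes overlong text); it is an assumption, not a ranker described in the source. The names `score_explanation` and `best_explanation` are hypothetical, and `generate` is any function that maps a prompt to explanation text, such as a wrapper around an LLM call.

```python
import re


def score_explanation(code, explanation, max_words=120):
    """Toy heuristic: identifier coverage minus a penalty for exceeding max_words."""
    words = explanation.split()
    if not words:
        return 0.0
    # Identifier-like tokens in the code (this also picks up keywords).
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))
    # Simple substring check: how many of those tokens the explanation mentions.
    mentioned = sum(1 for name in identifiers if name in explanation)
    coverage = mentioned / len(identifiers) if identifiers else 0.0
    length_penalty = max(0, len(words) - max_words) / max_words
    return coverage - length_penalty


def best_explanation(code, prompts, generate):
    """Generate one candidate per prompt and return the highest-scoring one."""
    candidates = [generate(prompt) for prompt in prompts]
    return max(candidates, key=lambda text: score_explanation(code, text))
```

In practice the heuristic would likely be replaced by human review or a learned ranker; the point of the sketch is only the shape of the pipeline, where generation and selection are separate steps.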
Code Example
from openai import OpenAI

# Assumes the openai Python package, version 1.0 or later, and an
# OPENAI_API_KEY environment variable. The older openai.ChatCompletion
# interface was removed in version 1.0.
client = OpenAI()

code_snippet = """
for i in range(5):
    print(i * 2)
"""

# One prompt prefix per explanation type
explanation_types = {
    "execution_trace": "Explain step-by-step what happens when this Python code runs, including the value of each variable at each step:",
    "term_definition": "Define the key programming terms and concepts used in this Python code in simple language for a beginner:",
    "hint": "Give a helpful hint about what this Python code does without giving away the full answer, suitable for a student learning to code:",
}


def generate_code_explanation(code, explanation_type):
    """Request one type of explanation for the given code."""
    prompt = f"{explanation_types[explanation_type]}\n\n{code}"
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # or gpt-4
        messages=[
            {"role": "system", "content": "You are a helpful programming tutor."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content


# Generate explanations from three perspectives
for etype in explanation_types:
    print(f"=== {etype} ===")
    print(generate_code_explanation(code_snippet, etype))
    print()
Original Abstract
Good explanations are essential to efficiently learning introductory programming concepts [10]. To provide high-quality explanations at scale, numerous systems automate the process by tracing the execution of code [8, 12], defining terms [9], giving hints [16], and providing error-specific feedback [10, 16]. However, these approaches often require manual effort to configure and only explain a single aspect of a given code segment. Large language models (LLMs) are also changing how students interact with code [7]. For example, Github's Copilot can generate code for programmers [4], leading researchers to raise concerns about cheating [7]. Instead, our work focuses on LLMs' potential to support learning by explaining numerous aspects of a given code snippet. This poster features a systematic analysis of the diverse natural language explanations that GPT-3 can generate automatically for a given code snippet. We present a subset of three use cases from our evolving design space of AI Explanations of Code.