The AI Settings Dictionary

You may have noticed that some LLMs let you control certain settings - typically paired with a brief explanation that still leaves you wondering exactly what they do.
Fear not! This is the living dictionary of AI settings. We will keep this up to date as we learn more about how these settings work and what they do.
Context Message Limit
What it is
How many past messages in a conversation the AI can "remember."
Why it matters
- Imagine texting someone with short-term memory. If the limit is 10 messages, the AI forgets anything before that.
- Important in long chats, like customer support or storytelling, where the AI needs to follow along.
When to change it
- Increase it for better conversation memory.
- Decrease it if you're seeing errors, or to save memory.
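Under the hood, this setting usually amounts to trimming the chat history before it is sent to the model. Here's a minimal Python sketch (our own illustration, not any vendor's actual code) of keeping only the last N messages:

```python
def trim_history(messages, limit):
    """Keep only the most recent `limit` messages; everything older is forgotten."""
    return messages[-limit:]

# A 14-message conversation with a limit of 10:
history = [{"role": "user", "content": f"message {i}"} for i in range(1, 15)]
recent = trim_history(history, limit=10)
# messages 1-4 are dropped; the model only "remembers" messages 5-14.
```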
Context Window Size
What it is
The total number of tokens (pieces of words) the AI can consider at once — both your message and its reply.
Why it matters
- Think of it like a whiteboard. A bigger whiteboard = more info the AI can “see” at once.
- If your input is too long (e.g., an entire book), it might cut off earlier parts if they don’t fit.
When to change it
- Use large windows for long documents, coding projects, or full email threads.
- Not something you usually adjust, but good to know when picking a model (e.g., GPT-4-Turbo can handle 128k tokens).
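A common rule of thumb is that one token is roughly four characters of English text. This sketch (the 4-characters-per-token heuristic is an approximation, and `fits_in_window` is our own hypothetical helper) shows how to estimate whether an input will fit:

```python
def estimate_tokens(text):
    """Very rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4

def fits_in_window(prompt, window_size, reply_budget=512):
    """Check whether the prompt, plus room for the reply, fits in the window."""
    return estimate_tokens(prompt) + reply_budget <= window_size

book = "word " * 200_000          # ~1,000,000 characters, ~250,000 tokens
fits_in_window(book, 128_000)     # even a 128k window can't hold a whole book
```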
Frequency Penalty
What it is
Tells the AI: "Don’t keep using the same words over and over."
Why it matters
- Without it, the AI might write: “This is very very very good.”
- Makes writing sound more natural and less robotic.
Range
0 to 2
- Low (0) - repetition allowed.
- High (1–2) - more variety, less echo.
Typical value
0.5
When to change it
- Raise it if the AI is looping or repeating itself.
- Lower it for tasks like summarizing where repetition might be okay.
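Conceptually, the penalty lowers a word's score in proportion to how many times it has already been used. A simplified Python sketch (an illustration of the idea, not any provider's exact formula):

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Lower each candidate's score in proportion to how often it was already used."""
    counts = Counter(generated_tokens)
    return {tok: score - penalty * counts[tok] for tok, score in logits.items()}

# After the model has already said "very" three times:
logits = {"very": 2.0, "quite": 1.5, "good": 1.0}
penalized = apply_frequency_penalty(logits, ["very", "very", "very"], penalty=0.5)
# "very" falls from 2.0 to 0.5 (2.0 - 0.5 * 3), so "quite" now outranks it.
```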
GPU Layers
What it is
A performance setting for people running AI locally. It decides how much of the model runs on a GPU (fast) versus CPU (slow).
Why it matters
- More GPU layers = faster responses (if your graphics card can handle it).
When to change it
- Only in local installations (like using models on your own computer).
- You’ll need some technical know-how and a decent GPU.
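The exact flag name varies by tool. As one example, llama.cpp exposes this as `-ngl` (short for `--n-gpu-layers`); the binary name and model path below are placeholders:

```shell
# llama.cpp example: offload 35 of the model's layers to the GPU,
# leaving the rest on the CPU. Lower -ngl if you run out of VRAM.
./llama-cli -m ./models/my-model.gguf -ngl 35 -p "Hello"
```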
Max Output Tokens
What it is
This sets how long the AI’s reply can be — in tokens, which are like chunks of words.
Why it matters
- Prevents the AI from writing an essay when you just want a sentence.
- Saves on cost or time if you're paying per token.
Range
1 to thousands (depending on the model)
Typical value
256–512 tokens (about 100–300 words)
When to change it
- Increase it for long answers (e.g., stories, detailed code).
- Decrease it for quick replies or summaries.
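Because output tokens are typically billed per thousand, this cap also bounds your worst-case cost per reply. A quick sketch (the price here is hypothetical, purely for illustration):

```python
def max_reply_cost(max_output_tokens, price_per_1k_tokens):
    """Worst-case cost of one reply if the model spends its whole token budget."""
    return max_output_tokens / 1000 * price_per_1k_tokens

# At a hypothetical $0.01 per 1,000 output tokens:
max_reply_cost(512, 0.01)   # a capped reply costs at most about half a cent
```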
Presence Penalty
What it is
Encourages the AI to introduce new topics instead of sticking to what it already said.
Why it matters
- Great for brainstorming, exploring ideas, or avoiding repetition.
Range
0 to 2
- Low (0) - stick to what's already been said.
- High (1–2) - add new ideas, mix things up.
Typical value
0.5–1.0
When to change it
- Raise it when you want variety (like idea generation).
- Lower it if you want the AI to stay focused on a specific topic.
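The key difference from the frequency penalty: a word is penalized a flat amount once it has appeared at all, no matter how many times. A simplified sketch of the idea (again our own illustration, not any provider's exact formula):

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """Lower the score of any word that has appeared at all; unlike the
    frequency penalty, it does not matter how many times it appeared."""
    seen = set(generated_tokens)
    return {tok: score - (penalty if tok in seen else 0.0)
            for tok, score in logits.items()}

logits = {"cats": 2.0, "dogs": 1.8, "ferrets": 1.0}
adjusted = apply_presence_penalty(logits, ["cats", "cats", "cats"], penalty=0.5)
# "cats" drops by a flat 0.5 (to 1.5), nudging the model toward new topics.
```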
Temperature
What it is
Controls how creative or random the AI is when generating responses.
Why it matters
- Think of it like a dial between "robot mode" and "creative mode."
Range
0 to 1 (some APIs accept values up to 2)
- Low (0–0.3) - very focused, factual, safe. Example: good for math, summaries, or coding.
- High (0.7–1) - more playful, unpredictable, and surprising. Example: useful for stories, jokes, or coming up with weird ideas.
Typical value
0.7
When to change it
- Lower for accurate info or professional tone.
- Raise for fun, brainstorming, or “think outside the box” tasks.
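Mathematically, temperature divides the model's raw scores before they are turned into probabilities. This sketch shows the standard softmax-with-temperature trick (a textbook formulation, not any specific model's code):

```python
import math

def sample_probs(logits, temperature):
    """Softmax over logits / temperature: low T sharpens the distribution
    toward the top word, high T flattens it toward an even spread."""
    t = max(temperature, 1e-6)  # avoid division by zero as T approaches 0
    scaled = [l / t for l in logits]
    m = max(scaled)             # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

cold = sample_probs([2.0, 1.0, 0.5], temperature=0.2)
warm = sample_probs([2.0, 1.0, 0.5], temperature=1.0)
# cold[0] is much closer to 1.0 than warm[0]: at low temperature the model
# almost always picks the single most likely word.
```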
Top K
What it is
The AI predicts many possible next words — Top K limits its choices to just the K most likely ones.
Why it matters
- A low K is like giving the AI a strict script: “Pick from just the top 5 words.”
- A high K gives it more freedom: “Pick from the top 50 or even 100!”
Range
1 to 100+
- Low (1–10) - safe, predictable — might repeat phrases.
- High (50–100) - more creative, but risks weird or off-topic replies.
Typical value
40–50
When to change it
- Raise it to make the AI more surprising or fun.
- Lower it if you want tighter, more professional responses.
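The mechanism itself is simple: sort the candidates, keep the top K, and renormalize. A minimal sketch (our own illustration, with made-up probabilities):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely words, then renormalize so they sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

probs = {"the": 0.4, "a": 0.3, "banana": 0.2, "quantum": 0.1}
top_k_filter(probs, k=2)  # only "the" and "a" survive, renormalized to 4/7 and 3/7
```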
Top P (Nucleus Sampling)
What it is
Instead of limiting the number of word options (like Top K), it limits their combined probability.
Why it matters
- Imagine a pie chart of likely next words. Top P tells the AI, “Only pick from the smallest group of slices that together make up, say, 90% of the pie.”
- This allows more flexibility than Top K, especially when probability shifts a lot.
Range
0 to 1
- Low (0.1–0.3) - very focused and repetitive.
- High (0.9–1.0) - much more varied and playful.
Typical value
0.9
When to change it
- Lower for more technical, rule-following output.
- Raise for more imagination, idea generation, or fun conversations.
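The pie-chart analogy translates directly into code: walk down the ranked candidates, keep adding words until their combined probability reaches p, then renormalize. A minimal sketch (made-up probabilities, for illustration only):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top words whose combined probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, running = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        running += prob
        if running >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "quantum": 0.05}
top_p_filter(probs, p=0.9)  # keeps "the", "a", "banana" (0.95 total); "quantum" is cut
```

Note how the number of surviving words adapts to the situation: when the model is very sure, a few words cover 90% of the probability; when it is unsure, many words make the cut. That adaptiveness is why Top P is often preferred over a fixed Top K.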