These are notes from Unit 1 - Prompt Engineering of Google’s 5 Day Gen AI Course
Links
- Link to Podcast
- Link to Whitepaper
Prompting Best Practices
- Provide examples
- Design with simplicity. Keep prompts clear, concise, and easy to understand.
- Use strong action verbs
- Be specific about the desired output
- Use instructions instead of constraints. E.g., instead of saying "don't use this library," say "use only these libraries."
- Document various prompt attempts
Sample Controls
There are a few ways we can control how the model chooses the next word to populate in a text response. Temperature is a parameter that controls how much randomness is introduced. A low temperature pushes the model to choose the most likely next word(s), aka it suppresses randomness. A high temperature allows for more randomness in what’s generated. A temperature of 0 forces the model to choose the single most probable word.
Top k sampling limits the possible next words to the k most probable candidates. Top p (nucleus) sampling is similar, but it limits the possible next words to the smallest set of words whose cumulative probability is at least p. Lower values of p are more restrictive.
If we use all of these sample controls (temperature, top k, and top p), the model will first choose words that meet both the top k and top p criteria, then use the temperature parameter to sample from this restricted pool of possibilities.
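To make that interaction concrete, here is a toy sketch of the selection pipeline. This is not how any real decoder is implemented, and the vocabulary and probabilities below are made up:
import math
import random

def sample_next_word(probs, temperature=1.0, top_k=3, top_p=0.95):
    """Toy illustration: top-k and top-p filtering, then temperature sampling."""
    # Rank candidate words from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-k: keep only the k most probable words.
    candidates = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for word, p in candidates:
        nucleus.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Temperature 0 means "always pick the single most probable word".
    if temperature == 0:
        return nucleus[0][0]
    # Otherwise rescale the surviving probabilities: low temperature sharpens the
    # distribution, high temperature flattens it, then sample from it.
    weights = [math.exp(math.log(p) / temperature) for _, p in nucleus]
    return random.choices([word for word, _ in nucleus], weights=weights)[0]

toy_probs = {"swing": 0.5, "snatch": 0.3, "press": 0.15, "juggle": 0.05}
print(sample_next_word(toy_probs, temperature=0.2, top_k=3, top_p=0.95))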
Guidelines
For generally coherent results with a touch of creativity, consider:
- temperature = .2
- top p = .95
- top k = 30
For creativity-biased output, crank these up to something like:
- temperature = .4
- top p = .99
- top k = 40
For accuracy-biased output, dial things down:
- temperature = .1
- top p = .9
- top k = 20
And for a single correct answer, use a temperature of 0 (top k and top p aren’t needed).
Of course, these are all guidelines.
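If it helps to keep these presets handy in code, a small lookup table like the following works (the preset names are my own):
# Hypothetical preset table for the guideline values above.
SAMPLING_PRESETS = {
    "balanced": {"temperature": 0.2, "top_p": 0.95, "top_k": 30},  # coherent with a touch of creativity
    "creative": {"temperature": 0.4, "top_p": 0.99, "top_k": 40},
    "accurate": {"temperature": 0.1, "top_p": 0.9, "top_k": 20},
    "deterministic": {"temperature": 0.0},  # single correct answer; top k / top p not needed
}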
General Prompting Approaches
Zero-shot prompting — you just describe the task and assume the model should be able to handle it.
One-shot or few-shot prompting is useful for creating structured output — you provide examples in your prompt that illustrate, for example, the structure of the output you want.
System prompting sets the overall context and purpose for the tasks. We could use a system prompt to tell a model, for instance, “you’re a helpful AI designed to help me complete X data science task.” You could also tell the model to produce only JSON output (for example).
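As a sketch of what this looks like with the Gemini Python SDK (client and types are set up in the Python Examples section below; the prompt text here is invented), a system prompt can be supplied via the config's system_instruction field:
response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful AI that assists with data science tasks. Respond only with valid JSON."
    ),
    contents="List three quick checks to run on a new tabular dataset.",
)
print(response.text)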
Role prompting entails telling the model to adopt a specific role or persona. For instance, “you are a senior software engineer whose job is to document this code.”
Step-back prompting entails asking the LLM to answer a broader, related question before providing it with the more specific details of your task. It’s akin to activating prior knowledge in teaching. For example, if I want the LLM to help me generate a new feature for a statistical model, I could start by asking it to describe some best practices or general principles of feature engineering.
Chain of thought (CoT) prompting asks the model to walk through its thought process — to explicitly generate intermediate results on the way to its final output. This can often yield better results, but the tradeoff is that the output will be longer (which can be slower and more expensive).
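A common way to elicit this is simply to ask for step-by-step reasoning in the prompt; the toy question below is just an illustration:
cot_prompt = """When I was 3 years old, my partner was 3 times my age.
Now I am 20 years old. How old is my partner?
Let's think step by step."""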
Tree of thoughts (ToT) prompting is a generalization of CoT. The model is asked to branch out, explore several chains of thought, and backtrack as needed before supplying a final answer.
ReAct (reason and act) prompting combines the reasoning step (which all of the other prompting strategies elicit) with an action step, and the results of the action step will then inform the next step. For example, the LLM might reason about what code to write, then act by executing this code. The results of this code execution can then inform the next ReAct cycle.
And finally we arrive at automatic prompt engineering (APE), where LLMs automatically generate prompts to pass into themselves or other LLMs.
Code Prompting
Always review and test code generated by an LLM before running it.
Another way to use LLMs when writing code is by asking them to explain a chunk of code to you.
You could also use LLMs to translate code between languages.
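For instance, prompts along these lines work and can be passed as contents just like any other prompt (the Bash snippet here is made up for illustration):
explain_prompt = """Explain to me what the following Bash code does, line by line:
for f in *.csv; do
  mv "$f" "backup_$f"
done"""

translate_prompt = """Translate the Bash snippet above into an equivalent Python script.
Add comments, and handle the case where no .csv files exist."""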
Prompting Meta-Strategies
If you’re serious about developing optimal prompting strategies, you should document your prompts and their associated responses so you can see what works best.
Python Examples
See Google Gemini for notes on how to install the Python SDK.
This Kaggle notebook also contains examples for interacting with the Gemini API, as well as prompting strategies.
Generate Content
A general workflow for generating content via the Python SDK is:
from google import genai
from google.genai import types

# create a client ("YOUR_API_KEY" is a placeholder; see the SDK setup notes linked above)
client = genai.Client(api_key="YOUR_API_KEY")

# define a config
my_config = types.GenerateContentConfig(max_output_tokens=200)  # obviously we can use values other than 200

prompt = "Explain the benefits of kettlebell training."

response = client.models.generate_content(
    model='gemini-2.0-flash',  # or whatever model we want
    config=my_config,
    contents=prompt,
)

# print the response
print(response.text)
Config Options
We can pass various options to the config that affect how we interact with the models. These include max_output_tokens, temperature, and top_p. Top k is not an option in the Gemini API.
So we could construct a config like so:
my_config = types.GenerateContentConfig(
    temperature=2.0,
    top_p=.99,
)
Note that the default values are:
- temperature = 1.0
- top_p = .95
Although we may often want to use the defaults, we do want to think carefully about how the config values influence the output. For instance, imagine we pass the following prompt:
zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """
We really just want a 1-word response, so we might use the following config:
model_config = types.GenerateContentConfig(
    temperature=0.1,
    top_p=1,
    max_output_tokens=5,
)
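Then, following the same pattern as before, we pass this config along with the prompt:
response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=model_config,
    contents=zero_shot_prompt,
)
print(response.text)  # expect a single label such as POSITIVE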
Enums
Enums can give us an even better way to control output, constraining the response to a fixed set of values:
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment,
    ),
    contents=zero_shot_prompt)
print(response.text)

# response.parsed gives us the parsed Sentiment member rather than the raw text
enum_response = response.parsed
print(enum_response)
print(type(enum_response))
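Because response.parsed comes back as a Sentiment member (as the type() call above shows), we can branch on it directly, for example:
if enum_response == Sentiment.POSITIVE:
    print("Glad you liked it!")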
JSON Responses
Just like we can use enums to constrain our response options/format, we can also require that responses come back as JSON. One way to do this is via one-shot or few-shot prompting:
few_shot_prompt = """Parse a customer's pizza order into valid JSON:
EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```
EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```
ORDER:
"""
customer_order = "Give me a large with cheese & pineapple"
response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ),
    contents=[few_shot_prompt, customer_order])
print(response.text)
Or, probably the better choice, we can define a type that describes the desired JSON structure and pass it as a response schema:
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ),
    contents="Can I have a large dessert pizza with apple and chocolate")
print(response.text)
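Since the response is constrained to JSON, it can be loaded straight into a Python object; as with the enum example, response.parsed should also hold the parsed result:
import json

order = json.loads(response.text)
print(order["size"], order["ingredients"])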