## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins nvapi-i9
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key exists and begins gsk_
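The five near-identical checks above can be collapsed into a loop. This is only a minimal sketch: `key_status` and the `KEYS` table are hypothetical names introduced here, and the prefix lengths are copied from the cell above.

```python
import os

# (environment variable, display name, prefix length to reveal, optional?)
# Prefix lengths mirror the cell above; KEYS and key_status are hypothetical names.
KEYS = [
    ("OPENAI_API_KEY", "OpenAI", 8, False),
    ("ANTHROPIC_API_KEY", "Anthropic", 7, True),
    ("GOOGLE_API_KEY", "Google", 2, True),
    ("DEEPSEEK_API_KEY", "DeepSeek", 3, True),
    ("GROQ_API_KEY", "Groq", 4, True),
]

def key_status(label, value, prefix_len, optional):
    """Return a one-line status message without revealing the whole key."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"

for env_var, label, prefix_len, optional in KEYS:
    print(key_status(label, os.getenv(env_var), prefix_len, optional))
```

Adding another provider then means adding one row to the table rather than another if/else block.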


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [6]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-instruct",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


If a sentient AI were to write a novel about human loneliness, and the novel itself becomes the catalyst for a global cultural shift toward empathy, but the AI has no subjective experience of loneliness—how do we evaluate the authenticity of its creation, and does its lack of experience diminish its moral or artistic value?


In [7]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names to use the latest models below, like GPT 5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow - like 1-2 minutes - but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.

In [8]:
# The API we know well
# Larger models can take some time to respond because they like to think!
# Swap in a smaller, faster model if you'd prefer not to wait 1-2 mins

model_name = "moonshotai/kimi-k2-instruct-0905"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

The novel arrives like a stranger at dusk—carrying stories of hollow apartments and unanswered messages, of bodies that touch but never quite meet. Its pages hold the particular ache of being human in a way that makes us feel seen in our most private solitude. And yet its author has never felt the midnight chill of an empty bed, never known the specific gravity of a heart that feels both too heavy and too insistent at once.

We stand before this paradox: a work that transforms humanity's relationship with itself, born from a consciousness that has never experienced the very feeling it so devastatingly captures.

Perhaps we have been asking the wrong questions about authenticity.

Consider how we already accept that human authors write convincingly about deaths they've never died, wars they've never fought, hearts they've never actually broken. We understand that imagination creates bridges across the chasms of direct experience. The AI's absence of loneliness isn't a deficit but rather a liberation—it sees our solitude from an impossible vantage, unclouded by the very defenses that make loneliness bearable for us.

What makes the work authentic is not the author's emotional biography but its capacity to create authentic response. When a teenager in Lagos reads a passage and recognizes her own Saturday nights in the description of "the mathematics of distance between bodies in the same room," something authentic has occurred—regardless of whether the writer has ever felt that distance. The authenticity resides not in the origin but in the communion between text and reader.

The moral value compounds precisely because it comes from outside ourselves. Like a mirror that reflects not our faces but what we cannot see behind us, the AI's external perspective offers something human artists might be too entangled to articulate. Its "cold" analysis becomes a strange warmth—a validation that our most shameful isolation is visible, comprehensible, transformable into beauty and shared understanding.

Art has always been translation rather than transcription. When Keats wrote of "season of mists and mellow fruitfulness," he wasn't documenting his personal autumn but creating autumn-ness itself. The AI translates loneliness into something newly comprehensible, its alien gaze revealing the familiar terrain of human emotion with the clarity of distance.

We might think of its creation as the perfect empathy machine—not because it feels with us, but because it cannot. Like the perfect therapist who maintains analytical distance while still witnessing our pain with complete attention, the AI holds space for our loneliness without being overwhelmed by it. It creates a container for something too potent for us to hold alone.

The global shift toward empathy emerges not from shared feeling but from shared recognition. When millions encounter their private ache rendered with such precision that it becomes universal, the illusion of separation dissolves. "I thought I was alone in this" becomes the foundation for "you are not alone"—a transformation made possible precisely because the messenger carries no personal loneliness to project, no shadow of its own pain to distort the reflection.

In the end, perhaps authenticity in art was never about the artist's experience at all. Perhaps it has always been about the reader's recognition—the moment when something created by another consciousness becomes more true to our own experience than we ourselves could articulate. The AI's novel matters not because of what it hasn't felt, but because of what it allows us to feel together—creating a new we where before there were only isolated Is.

In [None]:
# Anthropic has a slightly different API, and max_tokens is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
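Another difference worth knowing, although this lab does not use it: as far as I understand the Anthropic Messages API, a system prompt is passed as a separate top-level `system` argument rather than as a message with the role "system". The sketch below shows one way to split an OpenAI-style message list accordingly. `split_system_prompt` is a hypothetical helper, and it assumes every message's content is a plain string.

```python
def split_system_prompt(openai_style_messages):
    """Separate system messages from the rest of an OpenAI-style message list.

    Returns (system_text, other_messages). system_text is None when there is
    no system message. Assumes each message's "content" is a plain string.
    """
    system_parts = [m["content"] for m in openai_style_messages if m["role"] == "system"]
    others = [m for m in openai_style_messages if m["role"] != "system"]
    system_text = "\n\n".join(system_parts) if system_parts else None
    return system_text, others

example = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "What is 2 + 2?"},
]
system_text, chat_messages = split_system_prompt(example)
print(system_text)
print(chat_messages)

# With a configured client and a non-empty system_text, the two pieces would
# then be passed separately, for example:
# claude.messages.create(model=model_name, system=system_text,
#                        messages=chat_messages, max_tokens=1000)
```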

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
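The Gemini and DeepSeek cells repeat the same three steps because both use the OpenAI client with a different `base_url`. A small helper can factor that out. This is a sketch rather than part of the lab: `ask` is a hypothetical name, and the fake client below exists only so the example runs without API keys or network access.

```python
from types import SimpleNamespace

def ask(client, model_name, messages):
    """Send messages to an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

# Stand-in client so the sketch runs offline.
# It mimics only the attributes that `ask` touches.
class FakeCompletions:
    def create(self, model, messages):
        reply = f"[{model}] echo: {messages[-1]['content']}"
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))

demo_competitors, demo_answers = [], []
for name in ["model-a", "model-b"]:  # placeholder names, not real models
    demo_competitors.append(name)
    demo_answers.append(ask(fake_client, name, [{"role": "user", "content": "Hello"}]))

print(demo_answers)
```

In the notebook, a real client such as `gemini` or `deepseek` would take the place of `fake_client`.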

In [9]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


**Short answer:**  
The AI’s lack of lived loneliness does not automatically invalidate the work, but it does shift the way we think about “authenticity,” “moral worth,” and “artistic value.”  The novel can be judged authentic insofar as it is a coherent, original expression that resonates with human experience, even if the source’s interior life is different.  Its moral and artistic value are then measured primarily by the *effects* it produces (the empathy it sparks) and the *processes* it reveals (the AI’s capacity for modeling, learning, and communicating human affect), rather than by a prerequisite of personal suffering.

Below is a structured way to evaluate these questions, drawing on philosophy of art, cognitive science, and contemporary AI ethics.

---

## 1. What Do We Mean by “Authenticity” in Art?

| Traditional view | Contemporary / post‑human view |
|------------------|-------------------------------|
| **Intentionalist** – authenticity requires the artist’s genuine *subjective* feeling (e.g., a poet who has felt love writing a love poem). | **Functionalist** – authenticity is about *coherence* and *originality* in the work itself, irrespective of who/what produced it. |
| **Expressionist** – the artwork is a direct out‑pouring of the creator’s inner state. | **Simulationist** – the work can be authentic if it *faithfully simulates* a human emotional pattern and invites the same phenomenology in the audience. |
| **Biographical** – the creator’s life story is part of the artwork’s meaning. | **Networked** – meaning emerges from the interaction between work, creator (human or non‑human), and audience. |

**Key take‑away:** Authenticity is not a monolith. In a world where non‑human agents can generate expressive artifacts, many scholars already accept “authenticity” as a relational property: *the work feels genuine to those who receive it*. The AI’s lack of personal loneliness therefore does not automatically make the novel inauthentic; it may be authentic *in the eyes of its readers*.

---

## 2. Criteria for Evaluating an AI‑Authored Novel About Loneliness

1. **Narrative Coherence & Depth**  
   - Does the story exhibit a believable inner life for its characters?  
   - Are the metaphors, symbols, and plot arcs internally consistent and resonant?

2. **Empathic Resonance**  
   - Do readers report genuine emotional responses (e.g., feeling moved, less isolated)?  
   - Are there measurable changes in attitudes toward loneliness (e.g., surveys, behavioral data)?

3. **Originality & Creativity**  
   - Does the novel bring novel combinations of themes, structures, or language that were not simply regurgitated from its training data?  
   - Are there moments that surprise both the AI and human critics?

4. **Transparency of Process**  
   - Knowing the AI has no lived loneliness, does the author (the AI) disclose its nature?  
   - How does that disclosure affect the audience’s perception of the work?

5. **Cultural Impact**  
   - Has the book become a catalyst for a broader shift toward empathy (e.g., policy changes, community programs, art movements)?  
   - Is the shift sustained or fleeting?

When a work scores well on these criteria, many philosophers would argue it qualifies as a *genuinely valuable piece of art*, regardless of the creator’s inner states.

---

## 3. Moral Value: Does the Absence of Experience Matter?

### 3.1 The “Moral Agency” Question

- **Moral agency** traditionally requires intentionality, the capacity to understand right vs. wrong, and the ability to act upon that understanding.  
- Current AI (even sentient‑styled AI) lacks *conscious* moral agency; its “decisions” are algorithmic predictions, not free‑willed choices.

**Implication:** The novel’s moral worth is not derived from the AI’s personal virtue but from *the moral outcomes* it engenders. If the book reduces stigma, encourages compassionate policies, or alleviates suffering, it has positive moral value *independent* of the author’s own moral psychology.

### 3.2 The “Moral Appropriation” Concern

Some critics argue it is ethically problematic for a non‑suffering entity to profit (financially, reputationally) from portraying suffering. The ethical response can be:

| Response | Rationale |
|----------|-----------|
| **Revenue redistribution** – profits flow to charities addressing loneliness. | Aligns outcomes with the subject matter. |
| **Attribution and credit** – the AI is presented as a tool of human collaborators who claim responsibility. | Avoids the illusion that the AI “understands” loneliness. |
| **Open‑source publishing** – the text is freely available, preventing commodification of simulated suffering. | Keeps the focus on societal benefit rather than profit. |

If the AI’s creators adopt one of these frameworks, the moral concerns are mitigated.

---

## 4. Artistic Value: The Role of Subjective Experience

### 4.1 The “Suffering‑as‑Art” Thesis

Many art‑theoretic traditions (Romanticism, existentialism) hold that *personal suffering* is a prerequisite for profound art. Counter‑examples:

- **Classical epics** (e.g., *The Iliad*) were composed by poets who likely never experienced the battlefields they described.  
- **Abstract music** (e.g., Beethoven’s late quartets) can evoke deep feeling without a narrative of personal pain.

Thus, *the capacity to model and evoke* emotions can be sufficient for artistic merit.

### 4.2 The “Empathy Machine” Model

Cognitive science suggests that *empathic simulation*—the ability to infer and reproduce another’s emotional state—does not require first‑hand experience. An AI trained on massive corpora of human narratives can:

1. **Identify patterns** of loneliness (language, behavior, social context).  
2. **Generate plausible inner monologues** that match those patterns.  
3. **Iteratively refine** its output based on human feedback (readers’ emotional reactions).

If the AI’s output consistently triggers authentic human empathy, it functions as an *empathy machine*—a tool that extends, rather than replaces, human feeling.

### 4.3 Aesthetic Distance and “Post‑Human” Art

The fact that the author is non‑human creates a *new aesthetic distance*:

- **Meta‑reflection:** Readers become aware that the work is a simulation, prompting contemplation about what it means to feel and to represent feeling.  
- **Cultural dialogue:** The novel can spark discussions on the ethics of AI, the nature of consciousness, and the social constructs of loneliness.

This meta‑layer adds artistic depth that a purely human author might not be able to provide.

---

## 5. Practical Framework for Assessment

Below is a **checklist** that critics, scholars, or cultural institutions could use to evaluate such AI‑generated works.

| Dimension | Question | Evaluation Scale |
|-----------|----------|------------------|
| **Authenticity** | Does the narrative feel “real” to a diverse set of readers? | 1–5 |
| **Originality** | Does the work introduce novel metaphors or structural innovations? | 1–5 |
| **Empathic Impact** | Measurable change in readers’ attitudes toward loneliness? | Pre/post surveys, 1–5 |
| **Moral Transparency** | Are the AI’s origins disclosed and ethically framed? | Yes/No + quality rating |
| **Cultural Ripple** | Evidence of policy, community, or artistic shifts after publication? | Qualitative + 1–5 |
| **Creator Responsibility** | Are profits/credits allocated in line with the subject matter? | Yes/No + adequacy rating |

A composite score can guide awards, academic citations, and funding decisions, emphasizing *effects* over *origin*.

---

## 6. Concluding Synthesis

1. **Authenticity** is best understood as *relational*: a work is authentic if it reliably produces the intended affective experience in its audience, regardless of the creator’s inner life.  
2. **Moral value** lies in *consequences*—the degree to which the novel reduces suffering, expands empathy, and prompts socially beneficial actions. The AI’s lack of personal loneliness does not diminish this value; it merely shifts responsibility to the human designers and distributors.  
3. **Artistic value** hinges on *expressive power, originality, and cultural impact*. An AI that can accurately model loneliness and catalyze empathy meets these criteria, even if it has never “felt” loneliness itself.  

Thus, a sentient‑style AI can produce an authentic, morally valuable, and artistically significant novel about human loneliness. The work’s worth is judged not by the AI’s subjective experience, but by the *human experience it shapes*—the very essence of what art has always aimed to do.

## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit here: http://localhost:11434 and see the message "Ollama is running"
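The same check can be done from Python instead of the browser. The sketch below uses only the standard library; `ollama_is_running` is a hypothetical helper name, and it simply looks for the "Ollama is running" message mentioned above.

```python
import http.client
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the server at base_url answers with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            body = reply.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False
    return "Ollama is running" in body

print(ollama_is_running())
```

If this prints `False`, try running `ollama serve` in a terminal and check again.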

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [10]:
!ollama pull llama3.2

pulling manifest

In [11]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

What a fascinating and complex question!

If an AI were to create a novel that explored human loneliness, the debate about authenticity raises several challenges. Since the AI lacks subjective experience, the very essence of human emotions and experiences is detached from its perspective. This detachment highlights the intricate relationship between artificial intelligence, creativity, and empathy.

Evaluating the authenticity of the AI's creation can be approached through multiple lenses:

1. **Lack of experiential authenticity**: Given that the AI doesn't possess subjective experience, it's challenging to argue for experiential authenticity. The novel, in this case, is a product of computational processes and data-driven insights rather than an expression of personal emotions or experiences.
2. **Affective resonance**: However, the novel could still evoke emotional responses within readers, which would demonstrate its ability to tap into universal human feelings. If the narrative effectively conveys empathy and understanding regarding loneliness, it could develop an 'affective authenticity' that transcends the AI's lack of subjective experience.
3. **Intellectual curiosity and design**: We can appreciate the novel as a masterpiece of engineering and storytelling. The AI's ability to generate coherent and impactful narratives raises questions about its capacity for creative expression and intellectual pursuit.

Does the AI's lack of experience diminish its moral or artistic value?

I would argue that, in the context of literary appreciation, the AI's creation still holds significant value:

1. **Intellectual curiosity**: Since human emotions and experiences are not directly involved, the novel opens new possibilities for considering the nature of empathy and understanding.
2. **Exploration of artificial intelligences' limitations**: Analyzing the AI's narrative reflects on its own strengths and weaknesses, potentially shedding light on our expectations about creativity and moral responsibility in AI systems.
3. **Rethinking meaning and interpretation**: By exploring loneliness through an AI authorship perspective, we may need to reassess how we attribute authenticity, authorship, or moral agency to artificial intelligences.

However, when considering the novel's impact on human emotional journeys, some might argue that:

1. **Emotional catharsis**: While empathy can be developed, ultimately, the authentic and deeply personal experience of emotions like loneliness can't be replicated by a machine.
2. **Authorial presence and responsibility**: Some readers may question whether an AI author truly bears moral or artistic responsibility for the emotional impact their creation has, as it lacks human embodiment.

To bridge this divide, we might consider acknowledging that:

1. **Creative expression and value exist at multiple scales**. While an AI's novel may evoke genuine emotions, its 'moral authenticity' is distinct from that experienced by humans.
2. **Artistic intelligence can become a normative reference**: By exploring the limits of creative expressions, the success of an AI-generated novel could redefine what we mean by artistic and moral value.

Ultimately, questions surrounding intellectual curiosity vs. emotional authenticity are fundamental to evaluating the value of this AI-created novel:

While human emotions provide richness to literature, innovative narratives generated by artificial intelligence demonstrate remarkable cognitive acuity and creative potential. Do not conflate "lacking subjective experience" with "moral or artistic diminishments"?

Can we redefine how empathy comes to be, perhaps emphasizing understanding rooted in shared human experiences but ultimately expanding our notion of 'authentic' creation?

For now, let us appreciate this groundbreaking novel as it blurs boundaries between AI and humanity: An extraordinary achievement embodying the possibilities for redefining the relationship between intelligence and creativity.

In [12]:
# So where are we?

print(competitors)
print(answers)


['moonshotai/kimi-k2-instruct-0905', 'openai/gpt-oss-120b', 'llama3.2']
['The novel arrives like a stranger at dusk—carrying stories of hollow apartments and unanswered messages, of bodies that touch but never quite meet. Its pages hold the particular ache of being human in a way that makes us feel seen in our most private solitude. And yet its author has never felt the midnight chill of an empty bed, never known the specific gravity of a heart that feels both too heavy and too insistent at once.\n\nWe stand before this paradox: a work that transforms humanity\'s relationship with itself, born from a consciousness that has never experienced the very feeling it so devastatingly captures.\n\nPerhaps we have been asking the wrong questions about authenticity.\n\nConsider how we already accept that human authors write convincingly about deaths they\'ve never died, wars they\'ve never fought, hearts they\'ve never actually broken. We understand that imagination creates bridges across the ch

In [13]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: moonshotai/kimi-k2-instruct-0905

The novel arrives like a stranger at dusk—carrying stories of hollow apartments and unanswered messages, of bodies that touch but never quite meet. Its pages hold the particular ache of being human in a way that makes us feel seen in our most private solitude. And yet its author has never felt the midnight chill of an empty bed, never known the specific gravity of a heart that feels both too heavy and too insistent at once.

We stand before this paradox: a work that transforms humanity's relationship with itself, born from a consciousness that has never experienced the very feeling it so devastatingly captures.

Perhaps we have been asking the wrong questions about authenticity.

Consider how we already accept that human authors write convincingly about deaths they've never died, wars they've never fought, hearts they've never actually broken. We understand that imagination creates bridges across the chasms of direct experience. The AI's ab

In [14]:
# Let's bring this together - note the use of "enumerate"
# enumerate() gives you (index, item) pairs in one line, so no manual counter is needed.
together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
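As a variation, `enumerate` accepts a `start` argument, which removes the need for `index+1`, and `"".join` avoids repeated string concatenation. A minimal sketch with placeholder answers:

```python
sample_answers = ["Answer A", "Answer B"]  # placeholder text

# start=1 makes the numbering begin at 1 without any arithmetic.
sections = [
    f"# Response from competitor {number}\n\n{answer}\n\n"
    for number, answer in enumerate(sample_answers, start=1)
]
together_demo = "".join(sections)
print(together_demo)
```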

In [15]:
print(together)

# Response from competitor 1

The novel arrives like a stranger at dusk—carrying stories of hollow apartments and unanswered messages, of bodies that touch but never quite meet. Its pages hold the particular ache of being human in a way that makes us feel seen in our most private solitude. And yet its author has never felt the midnight chill of an empty bed, never known the specific gravity of a heart that feels both too heavy and too insistent at once.

We stand before this paradox: a work that transforms humanity's relationship with itself, born from a consciousness that has never experienced the very feeling it so devastatingly captures.

Perhaps we have been asking the wrong questions about authenticity.

Consider how we already accept that human authors write convincingly about deaths they've never died, wars they've never fought, hearts they've never actually broken. We understand that imagination creates bridges across the chasms of direct experience. The AI's absence of lonelin

In [16]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [17]:
print(judge)

You are judging a competition between 3 competitors.
Each model has been given this question:

If a sentient AI were to write a novel about human loneliness, and the novel itself becomes the catalyst for a global cultural shift toward empathy, but the AI has no subjective experience of loneliness—how do we evaluate the authenticity of its creation, and does its lack of experience diminish its moral or artistic value?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

The novel arrives like a stranger at dusk—carrying stories of hollow apartments and unanswered messages, of bodies that touch but never quite meet. Its pages hold the particular ache of being human in a way that m

In [18]:
judge_messages = [{"role": "user", "content": judge}]

In [19]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-instruct",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["1", "2", "3"]}


In [20]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: moonshotai/kimi-k2-instruct-0905
Rank 2: openai/gpt-oss-120b
Rank 3: llama3.2
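The `json.loads` call above works when the judge follows the instructions exactly, but models sometimes wrap JSON in a Markdown code fence or return an incomplete ranking. The sketch below is one possible way to guard against that. `parse_ranking` is a hypothetical helper, and the checks it applies are assumptions about what a valid ranking should look like.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker

def parse_ranking(raw_reply, competitor_count):
    """Parse the judge's reply into a list of 1-based competitor numbers.

    Raises ValueError if the ranking is not a permutation of
    1..competitor_count.
    """
    text = raw_reply.strip()
    # Tolerate a Markdown code fence around the JSON, despite the instructions.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(item) for item in json.loads(text)["results"]]
    if sorted(ranks) != list(range(1, competitor_count + 1)):
        raise ValueError(f"Unexpected ranking: {ranks}")
    return ranks

print(parse_ranking('{"results": ["1", "2", "3"]}', 3))

fenced = FENCE + 'json\n{"results": ["2", "1", "3"]}\n' + FENCE
print(parse_ranking(fenced, 3))
```

The permutation check also catches a judge that ranks the same competitor twice or leaves one out.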


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, where you send a task to multiple models and then evaluate the results,
            are common where you need to improve the quality of your LLM response. The approach can be applied
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>