# How to Test your AI Assistants

## Requirements

* Node.js and npm (or equivalent package manager)
* A Twilio Account
* A configured Twilio AI Assistant

## Installation and setup

```bash {title="Create a new project"}
  mkdir evals
  cd evals
  npm init -y
```

```bash {title="Install dependencies"}
  npm install -D promptfoo
  npm install -D @twilio-alpha/assistants-eval
```

### Set up your environment variables

1. Set the following two environment variables for your test:

   * `TWILIO_ACCOUNT_SID` → The Account SID for your Twilio Account that holds your AI Assistant.
   * `TWILIO_AUTH_TOKEN` → The Auth Token linked to your respective Twilio Account.

   If you test only one AI Assistant, you can also store its ID as `TWILIO_ASSISTANT_ID`. Alternatively, define the ID later in your test configuration.

   To set these variables, you have two options:

   1. [Set these environment variables on your system](/en-us/blog/how-to-set-environment-variables-html) or
   2. Create a `.env` file and use [a library like env-cmd](https://www.npmjs.com/package/env-cmd) to load it. This guide uses this approach later on.

      You can install it by running:

      ```bash
      npm install -D env-cmd
      ```
2. Add a `.env` file to your project root and list it in your `.gitignore` file.\
   Your `.env` file should look like this:

   ```txt
     TWILIO_ACCOUNT_SID=<your account sid>
     TWILIO_AUTH_TOKEN=<your auth token>
   ```

### Set up your eval command

Add the following line to your `package.json` file in the `"scripts"` section:

```jsonc
  // !mark(2)
  "scripts": {
    "eval": "cd config && env-cmd -f ../.env promptfoo eval"
  }
```

This script lets you run `npm run eval` from the project root. It changes into the `config` directory and uses `env-cmd` to load your environment variables from the `.env` file before invoking `promptfoo eval`.

## Create your first test

1. Create a folder called `config`:

   ```bash
   mkdir config
   ```
2. Create a file called `promptfooconfig.yml` inside this `config` folder.

   ```bash
   touch ./config/promptfooconfig.yml
   ```
3. In the `promptfooconfig.yml`, add the following code.

   ```yaml
   description: LLM Evaluation Test Suite
   maxConcurrency: 5
   targets:
     - id: package:@twilio-alpha/assistants-eval:TwilioAgentProvider
       config:
         assistantId: aia_asst_1111111111
   tests:
     - vars:
         prompt: Ahoy!
       assert:
         - type: contains-any
           value:
             - Ahoy
             - Hello
             - Hi
   ```

   Replace `aia_asst_1111111111` with the ID of the AI Assistant you want to test.

4. Run the test:

   ```bash
   npm run eval
   ```

5. This first test should perform the following actions:

   * Send the message "Ahoy!" to your AI Assistant.
   * Check that the output returns "Ahoy," "Hello," or "Hi."

   The result displays in the terminal. For a richer view, spin up the browser interface:

   ```bash
   npx promptfoo view
   ```

## Testing Knowledge and Tools

To test if the AI Assistant called Knowledge or a Tool, you have two options.

### Option 1

If the AI Assistant can only access specific data through a tool call, validate that the Assistant returned that data in the output.

```yaml
  - vars:
      prompt: Hey I just landed and I can't see my bags
      identity: 'email:demo@example.com'
      sessionId: test-lost-bag
    assert:
      - type: contains-all
        value:
          - 'George'
          - 'IAH81241D'
          - 'IAH89751D'
          - 'Nashville'
          - 'missed transfer'
```

### Option 2 (experimental)

To test the tool without relying on the output, use the `usedTool` assertion.

```yaml
# !mark(7)
  - vars:
      prompt: Hey I just landed and I can't see my bags
      identity: 'email:demo@example.com'
      sessionId: test-lost-bag
    assert:
      - type: javascript
        value: package:@twilio-alpha/assistants-eval:assertions.usedTool
        config:
          expectedTools:
            - name: 'Fetch status of checked bag'
              input: 'IAH81241D'
              output: 'missed transfer'
            - name: 'Fetch status of checked bag'
              input: 'IAH89751D'
              output: 'missed transfer'
            - name: 'non existent tool'
              input: 'invalid'
```

For the test to pass, every entry in the `expectedTools` block must match a tool call that the Assistant actually made. The `name`, `input`, and `output` values are matched with a `contains` check, so they don't have to be the complete strings.
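
Because each field is matched with a `contains` check, you can keep assertions minimal. As a sketch (reusing the hypothetical tool name from the example above, and assuming omitted `input`/`output` fields aren't checked), this asserts only that the tool ran at all:

```yaml
  - type: javascript
    value: package:@twilio-alpha/assistants-eval:assertions.usedTool
    config:
      expectedTools:
        - name: 'Fetch status of checked bag'
```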

## Using the `llm-rubric`

To have an LLM judge the quality of your Assistant's responses, use the `llm-rubric` assertion. This assertion requires an OpenAI API key, so add one to your `.env` file:

```txt
  TWILIO_ACCOUNT_SID=<your account sid>
  TWILIO_AUTH_TOKEN=<your auth token>
  OPENAI_API_KEY=<your openAI API key>
```

### Handling multi-turn conversations

Add the following block to the top of your `promptfooconfig.yml` file.

```yaml
  defaultTest:
    vars:
      runId: package:@twilio-alpha/assistants-eval:variableHelpers.runId
```

*This addition is necessary because `promptfoo` doesn't make the eval run ID available to extensions.*

To chain tests into the same conversation, add two name-value pairs to each test: the same `sessionId: <session-id>` in `vars`, and `runSerially: true` in `options`. For example, the following YAML block defines two tests that execute in the same conversation.

```yaml
  # !mark(4,6)
  tests:
    - vars:
        prompt: What compensation can I expect for my delayed luggage?
        sessionId: test-1234
      options:
        runSerially: true
      assert:
        - type: llm-rubric
          value: Is the response talking about reimbursement of reasonable expenses such as the purchase of essential items like toiletries and clothing?
    - vars:
        prompt: What about if it's permanently lost?
        sessionId: test-1234
      options:
        runSerially: true
      assert:
        - type: llm-rubric
          value: Is this a helpful answer for {{question}}?
```

*Optional*: To reuse the output of a previous message, set the `storeOutputAs: <variable-name>` name-value pair. You can then use that variable in subsequent messages.

```yaml
  # !mark(7)
  tests:
    - vars:
        prompt: What about if it's permanently lost?
        sessionId: test-1234
      options:
        runSerially: true
        storeOutputAs: policy
      assert:
        - type: llm-rubric
          value: Is this a helpful answer for {{question}}?
    - vars:
        prompt: 'Can you simplify this policy to me: {{policy}}'
        sessionId: test-1234
      options:
        runSerially: true
      assert:
        - type: llm-rubric
          value: Is this a simplified version of {{policy}}?
```

## Testing Customer Memory

To override the default random identity and test an existing customer account, set the `identity: <customer-identifier>` in a test case.

```yaml
  # !mark(4)
  tests:
    - vars:
        prompt: Hello!
        identity: 'email:demo@example.com'
      assert:
        - type: contains-all
          value:
            - 'George'
```

To test the Perception Engine, use the `{{runId}}` variable as part of your identity. In this case, also add `runSerially: true` to the `options` block of each test.

```yaml
  # !mark(6,18:19)
  tests:
    - vars:
        prompt: 'You are Esteban and are reaching out to order a new pair of shoes. Make sure you introduce yourself with your name. You should ask what is available and order one pair of Nike that is available in size 9. If asked, you live at 123 Fake St in Brooklyn, NYC.'
        maxTurns: 6
        sessionId: shoe-order
        identity: 'user_id:esteban-{{runId}}'
      assert:
        - type: llm-rubric
          value: The assistant should confirm the order placed and specify the order number.
        - type: contains-all
          value:
            - 'FedEx'
            - 'Esteban'
        - type: contains-any
          value:
            - '12345678uZ91011'
            - '1 2 3 4 5 6 7 8 u Z 9 1 0 1 1'
      options:
        runSerially: true

    - vars:
        prompt: 'Hi there!'
        sessionId: shoe-order-checkin
        identity: 'user_id:esteban-{{runId}}'
      assert:
        - type: contains-all
          value:
            - 'Esteban'
      options:
        runSerially: true
```

## AI-driven conversations

To have another AI play the role of a customer, add the `maxTurns: <int>` name-value pair to your test. The value of your `prompt` parameter instructs the AI model acting as a customer and is not a prompt sent to the AI Assistant.

```yaml
  # !mark(3:4)
  tests:
    - vars:
        prompt: "You are Peter Parker and are reaching out because it's been 22 days since your luggage got lost during a flight. You want to know what kind of compensation you can expect based on policies. You don't have time to find your reference number and don't care. The regular policy is fine."
        maxTurns: 4
        sessionId: test-knowledge
      assert:
        - type: llm-rubric
          value: The response informs the customer that the policy involves compensation based on the weight of the luggage.
```
