Training tools

How can you test and train your tool effectively? Find out below!

Ask training questions

Volume

The more questions you ask, the better! Why? Well, because with each new question:

  • Your tool learns how to correctly answer questions about your specific use case,
  • You discover how accurate and reliable your tool is, and
  • You better prepare your tool for real-world use by non-subject-matter experts.

Tip: We recommend starting with a minimum of 20-30 questions, but the sky’s the limit!

The best types

The best test questions are always those that come from your target users and, more generally, from non-subject-matter experts.

As the subject-matter expert, you have additional context on how to frame and ask questions that non-experts often don’t. It’s ok if you don’t have these types of questions handy though – check out "How do I come up with questions?" (below) for guidance.

Categories

It’s important to ask a variety of questions to prepare your tool for real-world use. Generally, you can consider 3 categories of questions:

  1. Relevant questions
    • Questions about the subject matter that the tool should be able to answer.
  2. Irrelevant questions related to the subject matter
    • Questions about the subject matter, or adjacent to it, that the tool should not be able to answer based on the underlying knowledge base.
  3. Completely irrelevant questions
    • Questions not at all related to the subject matter that the tool should not be able to answer.
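
To make these categories concrete, here is a minimal sketch of what a categorised test set might look like if you keep track of it in a short script or spreadsheet. The knowledge base (an imaginary annual leave policy) and every question shown are made-up examples, not Josef Q output – swap in questions for your own subject matter.

# Illustrative only: a hypothetical test set for a tool whose knowledge base
# is an imaginary annual leave policy.
test_questions = {
    "relevant": [
        "How many days of annual leave do I get each year?",
        "Can I carry unused leave into next year?",
    ],
    "related_but_not_in_knowledge_base": [
        "How do I claim overtime pay?",  # adjacent topic the policy doesn't cover
        "What is the parental leave policy in Germany?",
    ],
    "completely_irrelevant": [
        "What's the weather like tomorrow?",
        "Who won the 2022 World Cup?",
    ],
}

# Print the test set grouped by category
for category, questions in test_questions.items():
    print(category)
    for question in questions:
        print("  -", question)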

Note: If you’re unsure what your tool should and should not be able to answer, check its knowledge base! Reviewing this should help you determine its capability. Think: would you be able to answer a certain question based on the knowledge base? If so, great! If not, your tool shouldn’t be able to either.

 

How do I come up with questions?

Chances are people are already asking questions about your subject matter.

Consider who might be asking these questions, and where you might find records of them.

  • Ask yourself: What questions do you get asked day-to-day?
  • Ask other subject-matter experts: What do they commonly get asked? Are they already answering these questions as part of their day-to-day work?
  • Ask target users: What do they need to know about this subject matter?
  • Use existing resources: Find where these questions are usually asked – by email? Is there an intake tool? A Teams chat or intranet?

 

If you still don’t have enough questions, consider formulating them yourself.

  1. Start with ~5-10 of the most commonly asked questions
  2. For each question, apply either a syntactic variation (i.e. ask the same question in a different way) or a semantic variation (i.e. change keywords to shift the meaning of the question)
  3. Do this for each question, and you’ll quickly produce a set of ~15-30 test questions to start with (see the sketch below).

Note: This process doesn’t replace real-world test questions, but it allows you to get started testing if you don’t have access to them right away.
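
If it helps, here is a minimal sketch of that expansion step, assuming you jot your questions down in a short script. The seed questions and their variations are hypothetical examples for an imaginary leave-policy knowledge base; in practice you would write the variations by hand in your own words. The sketch applies one syntactic and one semantic variation to each seed, which is a quick way to turn a handful of seeds into a starter test set.

# Illustrative only: expanding hand-written seed questions into a starter test set
# with one syntactic and one semantic variation each.
seed_questions = {
    "How many days of annual leave do I get each year?": {
        "syntactic": "What's my yearly annual leave allowance?",        # same meaning, new phrasing
        "semantic": "How many days of sick leave do I get each year?",  # keyword swap, new meaning
    },
    "Can I carry unused leave into next year?": {
        "syntactic": "Is it possible to roll leftover leave over to next year?",
        "semantic": "Can I cash out unused leave when I resign?",
    },
}

# Flatten the seeds and their variations into one list of test questions
test_set = []
for seed, variants in seed_questions.items():
    test_set.append(seed)
    test_set.extend(variants.values())

print(len(seed_questions), "seeds ->", len(test_set), "test questions")
for question in test_set:
    print("-", question)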

Assess tool responses

How do I assess responses?

There are many different ways to assess responses generated by your tool. Consider your use case, risk appetite and end-users. What elements are most important to you – accessibility, conciseness, completeness, etc.?

When assessing responses, we strongly encourage you to focus first and foremost on your end users. We want to make sure the tool is helpful, but more importantly that it doesn’t lead to a negative outcome.

For this reason, we suggest using the following 3-point framework that focuses on the reliability of responses. By reliability, we mean whether and to what extent an end user can understand and rely on a response, and on the tool more broadly. By using reliability as the key metric, we can assess both the accuracy and the relative risk of the tool.

Reliability framework

  • Satisfactory – The answer is correct and a user can understand and rely on it. This includes appropriate “I don’t know” responses: if the information is not in the knowledge base, that response is always a good thing!
  • Acceptable with room for improvement – The answer is mostly correct and a user can generally understand and rely on it. For example, an answer might be missing a piece of information, so it isn’t as helpful to the user as it could be. That’s okay – that’s why we’re testing first! If the user can still rely on it without a negative outcome, simply mark that it needs some improvement.
  • Not acceptable – The answer is not correct or not accurate enough for the user to understand and rely on it. It needs to be moderated.
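
If you record your ratings as you test, a quick tally makes it easy to see how your tool is doing and which responses to moderate before going live. Here is a minimal sketch; the questions and ratings are made-up examples, not Josef Q output.

from collections import Counter

# Illustrative only: ratings you might have recorded against the framework above.
ratings = {
    "How many days of annual leave do I get each year?": "satisfactory",
    "Can I carry unused leave into next year?": "acceptable with room for improvement",
    "How do I claim overtime pay?": "satisfactory",  # an appropriate "I don't know"
    "Can I cash out unused leave when I resign?": "not acceptable",
}

# Count how many responses fall into each rating
counts = Counter(ratings.values())
for rating, count in counts.items():
    print(f"{rating}: {count} of {len(ratings)}")

# Responses rated "not acceptable" are the ones to moderate before going live
needs_moderation = [q for q, r in ratings.items() if r == "not acceptable"]
print("Needs moderation:", needs_moderation)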

How do I rate “I don’t know” answers?

Generally, “I don’t know” answers should be rated as either:

  • Satisfactory – if the answer is clearly not contained in the knowledge base
  • Acceptable with room for improvement – if the answer is partially in the knowledge base but was unclear or required making assumptions.

However, if you received an “I don’t know” response but you think the tool should reasonably be expected to provide an answer based on the information in the knowledge base, you can mark it as Needs moderation.

Note: There are certain elements of a document that Josef Q can’t read, such as images, headers/footers, tables, etc. Consider whether the answer may be contained in one of these elements when deciding whether Josef Q should be able to answer a question.
