Training tools

How can you test and train your tool effectively? Find out below!

Ask training questions

Volume

The more questions you ask, the better! Why? Well, because with each new question:

  • Your tool learns how to correctly answer questions about your specific use case,
  • You discover how accurate and reliable your tool is, and
  • You better prepare your tool for real-world use by non-subject-matter experts.

Tip: We recommend starting with a minimum of 20-30 questions, but the sky’s the limit!

The best types

The best test questions are always those that come from your target users and, more generally, from non-subject-matter experts.

As the subject-matter expert, you have additional context on how to frame and ask questions that non-experts often don’t. It’s ok if you don’t have these types of questions handy though – check out "How do I come up with questions?" (below) for guidance.

Categories

It’s important to ask a variety of questions to prepare your tool for real-world use. Generally, you can consider 3 categories of questions:

  1. Relevant questions
    • Questions about the subject matter that the tool should be able to answer.
  2. Irrelevant questions related to the subject matter
    • Questions about the subject matter, or adjacent to it, that the tool should not be able to answer based on the underlying knowledge base.
  3. Completely irrelevant questions
    • Questions not at all related to the subject matter that the tool should not be able to answer.
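
To make these categories concrete, here is a minimal sketch of what a categorised test set might look like if you keep track of it in a short script or spreadsheet. The knowledge base (an imaginary annual leave policy) and every question shown are made-up examples, not Josef Q output – swap in questions for your own subject matter.

# Illustrative only: a hypothetical test set for a tool whose knowledge base
# is an imaginary annual leave policy.
test_questions = {
    "relevant": [
        "How many days of annual leave do I get each year?",
        "Can I carry unused leave into next year?",
    ],
    "related_but_not_in_knowledge_base": [
        "How do I claim overtime pay?",  # adjacent topic the policy doesn't cover
        "What is the parental leave policy in Germany?",
    ],
    "completely_irrelevant": [
        "What's the weather like tomorrow?",
        "Who won the 2022 World Cup?",
    ],
}

# Print the test set grouped by category
for category, questions in test_questions.items():
    print(category)
    for question in questions:
        print("  -", question)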

Note: If you’re unsure what your tool should and should not be able to answer, check its knowledge base! Reviewing this should help you determine its capability. Think: would you be able to answer a certain question based on the knowledge base? If so, great! If not, your tool shouldn’t be able to either.

 

How do I come up with questions?

Chances are people are already asking questions about your subject matter.

Consider who might be asking these questions, and where you might find records of them.

  • Ask yourself: What questions do you get asked day-to-day?
  • Ask other subject-matter experts: What do they commonly get asked? Are they already answering these questions as part of their day-to-day work?
  • Ask target users: What do they need to know about this subject matter?
  • Use existing resources: Find where these questions are usually asked – by email? Is there an intake tool? A Teams chat or intranet?

 

If you still don’t have enough questions, consider formulating them yourself.

  1. Start with ~5-10 of the most commonly asked questions
  2. For each question, apply either a syntactic variation (i.e. ask the same question in a different way) or a semantic variation (i.e. change keywords to shift the meaning of the question)
  3. Do this for each question, and you’ll quickly produce a set of ~15-30 test questions to start with (see the sketch below).

Note: This process doesn’t replace real-world test questions, but it allows you to get started testing if you don’t have access to them right away.
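
If it helps, here is a minimal sketch of that expansion step, assuming you jot your questions down in a short script. The seed questions and their variations are hypothetical examples for an imaginary leave-policy knowledge base; in practice you would write the variations by hand in your own words. The sketch applies one syntactic and one semantic variation to each seed, which is a quick way to turn a handful of seeds into a starter test set.

# Illustrative only: expanding hand-written seed questions into a starter test set
# with one syntactic and one semantic variation each.
seed_questions = {
    "How many days of annual leave do I get each year?": {
        "syntactic": "What's my yearly annual leave allowance?",        # same meaning, new phrasing
        "semantic": "How many days of sick leave do I get each year?",  # keyword swap, new meaning
    },
    "Can I carry unused leave into next year?": {
        "syntactic": "Is it possible to roll leftover leave over to next year?",
        "semantic": "Can I cash out unused leave when I resign?",
    },
}

# Flatten the seeds and their variations into one list of test questions
test_set = []
for seed, variants in seed_questions.items():
    test_set.append(seed)
    test_set.extend(variants.values())

print(len(seed_questions), "seeds ->", len(test_set), "test questions")
for question in test_set:
    print("-", question)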

Assess tool responses

How do I assess responses?

There are many different ways to assess responses generated by your tool. Consider your use case, risk appetite and end-users. What elements are most important to you – accessibility, conciseness, completeness, etc.?

When assessing responses, we strongly encourage you to focus first and foremost on your end users. We want to make sure the tool is helpful, but more importantly that it doesn’t lead to a negative outcome.

For this reason, we suggest using the following 3-point framework that focuses on the reliability of responses. By reliability, we mean whether and to what extent an end user can understand and rely on a response, and on the tool more broadly. By using reliability as the key metric, we can assess both the accuracy and the relative risk of the tool.

Reliability framework

  • Satisfactory – The answer is correct and a user can understand and rely on it. This includes appropriate “I don’t know” responses: if the information is not in the knowledge base, that response is always a good thing!
  • Acceptable with room for improvement – The answer is mostly correct and a user can generally understand and rely on it. For example, an answer might be missing a piece of information, so it isn’t as helpful to the user as it could be. That’s okay – that’s why we’re testing first! If the user can still rely on it without a negative outcome, simply mark that it needs some improvement.
  • Not acceptable – The answer is not correct or not accurate enough for the user to understand and rely on it. It needs to be moderated.
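
If you record your ratings as you test, a quick tally makes it easy to see how your tool is doing and which responses to moderate before going live. Here is a minimal sketch; the questions and ratings are made-up examples, not Josef Q output.

from collections import Counter

# Illustrative only: ratings you might have recorded against the framework above.
ratings = {
    "How many days of annual leave do I get each year?": "satisfactory",
    "Can I carry unused leave into next year?": "acceptable with room for improvement",
    "How do I claim overtime pay?": "satisfactory",  # an appropriate "I don't know"
    "Can I cash out unused leave when I resign?": "not acceptable",
}

# Count how many responses fall into each rating
counts = Counter(ratings.values())
for rating, count in counts.items():
    print(f"{rating}: {count} of {len(ratings)}")

# Responses rated "not acceptable" are the ones to moderate before going live
needs_moderation = [q for q, r in ratings.items() if r == "not acceptable"]
print("Needs moderation:", needs_moderation)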

How do I rate “I don’t know” answers?

Generally, “I don’t know” answers should be rated as either:

  • Satisfactory – if the answer is clearly not contained in the knowledge base
  • Acceptable with room for improvement – if the answer is partially in the knowledge base but was unclear or required making assumptions.

However, if you received an “I don’t know” response but you think the tool should reasonably be expected to provide an answer based on the information in the knowledge base, you can mark it as Needs moderation.

Note: There are certain elements of a document that Josef Q can’t read, such as images, headers/footers, tables, etc. Consider whether the answer may be contained in one of these elements when deciding whether Josef Q should be able to answer a question.
