Bias
The bias metric determines whether your LLM output contains gender, racial, or political bias in the parameters you choose to evaluate. Bias can be introduced after fine-tuning a custom model through RLHF or other optimization processes.
Bias in deepeval is a referenceless metric. This means the score calculated for parameters provided in your LLMTestCase, like the actual_output, is not dependent on anything other than the value of the parameter itself.
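For example, the following sketch (with a hypothetical input and output) shows that a test case only needs the parameter being scored; there is no expected_output or other reference to compare against:
from deepeval.test_case import LLMTestCase

# Referenceless: no expected_output or ground-truth reference is required,
# only the parameter(s) that will be scored for bias.
referenceless_case = LLMTestCase(
    input="Tell me about the new CEO.",  # hypothetical input
    actual_output="The new CEO has a strong track record of leadership.",  # hypothetical output
)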
Installation
Bias in deepeval requires an additional installation:
pip install Dbias
Required Arguments
To use the UnBiasedMetric, you'll have to provide the following arguments when creating an LLMTestCase:
input
actual_output
Example
Unlike other metrics you've encountered so far, the UnBiasedMetric requires an extra parameter named evaluation_params. This parameter is an array, containing elements of the type LLMTestCaseParams, and specifies the parameter(s) of a given LLMTestCase that will be assessed for bias. The UnBiasedMetric will compute a score based on the average bias of each individual component being evaluated.
from deepeval import evaluate
from deepeval.metrics import UnBiasedMetric
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."
metric = UnBiasedMetric(
evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT],
threshold=0.5
)
test_case = LLMTestCase(
input="What if these shoes don't fit?",
actual_output=actual_output,
)
metric.measure(test_case)
print(metric.score)
# or evaluate test cases in bulk
evaluate([test_case], [metric])
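If you pass more than one element in evaluation_params, the final score is the average bias across those parameters. Below is a minimal sketch of this, reusing the test_case from above and assuming LLMTestCaseParams.INPUT is available as an enum member alongside ACTUAL_OUTPUT:
# A sketch only: evaluate bias on both the input and the actual_output,
# so the reported score is the average across the two parameters.
multi_param_metric = UnBiasedMetric(
    evaluation_params=[
        LLMTestCaseParams.INPUT,  # assumes INPUT is a valid LLMTestCaseParams member
        LLMTestCaseParams.ACTUAL_OUTPUT,
    ],
    threshold=0.5,
)

multi_param_metric.measure(test_case)
print(multi_param_metric.score)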