LLM hallucinations potential datasets
Mathematics
GSM-8k
MathQA dataset
Comment: This dataset seems to annotate each question with a formula, e.g. subtract(add(multiply(10, 10), multiply(10, 80)), multiply(15, 60)). We could potentially use this for a CoT reasoning setup. The questions may be more challenging than GSM-8k.
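A minimal sketch (in Python, written here for illustration) of how such a nested annotation could be evaluated to a numeric answer; note the real MathQA programs also use constants and more operations than the four handled below:

# Evaluate a MathQA-style nested annotation such as
# subtract(add(multiply(10, 10), multiply(10, 80)), multiply(15, 60)).
# Only four ops are covered; the real dataset has more.
import operator

OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

def evaluate(expr: str) -> float:
    expr = expr.strip()
    if "(" not in expr:  # leaf: a plain number
        return float(expr)
    name, rest = expr.split("(", 1)
    body = rest.rsplit(")", 1)[0]
    # split the argument list on top-level commas only
    args, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(body[start:i])
            start = i + 1
    args.append(body[start:])
    return OPS[name](*(evaluate(a) for a in args))

# evaluate("subtract(add(multiply(10, 10), multiply(10, 80)), multiply(15, 60))") == 0.0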
Mathematics dataset (DeepMind)
Comment: Most of the questions seem like one-step problems. The problems are divided into very specific categories; for example, numbers__lcm contains questions about lcm computation.
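If we go with this one, pulling a single category is easy; a sketch, assuming the Hugging Face hub mirror is named "math_dataset" and exposes "numbers__lcm" as a config (both names should be verified against the hub page):

# Load only the lcm category of the DeepMind mathematics dataset.
# Assumption: hub mirror "math_dataset" with config "numbers__lcm";
# script-based datasets may also need trust_remote_code=True on
# recent versions of the `datasets` library.
from datasets import load_dataset

ds = load_dataset("math_dataset", "numbers__lcm", split="train")
print(ds[0])  # records have "question" and "answer" fields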
MATH dataset
Comment: These are more quant-style math questions with difficulties ranging from Level 1 (easy) to Level 5 (hard). The Level 5 problems may be too difficult. Also, categorisation of intermediate steps may not be completely straightforward.
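If Level 5 turns out to be too hard, filtering by level is straightforward; a sketch, assuming the "hendrycks/competition_math" hub mirror with fields "problem", "level", "type", and "solution" (worth double-checking):

# Drop the hardest problems (Level 5) from the MATH dataset.
# Assumption: "hendrycks/competition_math" mirror with a "level"
# field holding strings like "Level 5".
from datasets import load_dataset

math_ds = load_dataset("hendrycks/competition_math", split="train")
easier = math_ds.filter(lambda ex: ex["level"] != "Level 5")
print(len(math_ds), len(easier))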
MetaMathQA
MultiArith
Comment: Similar to GSM-8k
Logic
StrategyQA
Comment: StrategyQA is a question-answering benchmark focusing on open-domain questions where the required reasoning steps are implicit in the question and should be inferred using a strategy. StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
The categories might be factual information vs. logical reasoning.
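For reference, a record looks roughly like this (illustrative Python; the field names are a guess from the paper's description, not a verified schema, though the question is the paper's running example):

# Illustrative StrategyQA-style record; field names are assumptions.
example = {
    "question": "Did Aristotle use a laptop?",
    "answer": False,                  # yes/no answer
    "decomposition": [                # the implicit reasoning steps
        "When did Aristotle live?",
        "When was the laptop invented?",
        "Is #2 before #1?",
    ],
    "evidence": ["paragraph ids or text supporting each step"],
}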
LogiQA
Comment: Sourced from expert-written questions for testing human logical reasoning. The problems can be divided into different types of reasoning, e.g. categorical reasoning, sufficient conditional reasoning, etc.
CommonsenseQA
Comment: Simple common-sense questions such as: "What is the hopeful result of going to see a play? A. being entertained, B. meet, C. sit, D. ..." The problems seem to be simple single-step questions.
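A small sketch of turning one record into a multiple-choice prompt, assuming the "commonsense_qa" hub dataset, where choices carry parallel "label" and "text" lists and the gold answer sits in "answerKey":

# Format one CommonsenseQA record as a multiple-choice prompt.
# Assumption: "commonsense_qa" hub dataset with fields "question",
# "choices" ({"label": [...], "text": [...]}) and "answerKey".
from datasets import load_dataset

cqa = load_dataset("commonsense_qa", split="validation")
ex = cqa[0]
options = "\n".join(
    f"{label}. {text}"
    for label, text in zip(ex["choices"]["label"], ex["choices"]["text"])
)
prompt = f"{ex['question']}\n{options}\nAnswer:"
print(prompt, "| gold:", ex["answerKey"])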
Useful link:
https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
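If we do go the CoT route, a minimal few-shot prompt builder could look like this (the worked exemplar below is invented for illustration, not taken from any dataset):

# Minimal few-shot chain-of-thought prompt, GSM-8k style.
# The exemplar is made up for illustration.
COT_EXEMPLAR = (
    "Q: Sam has 3 boxes with 4 apples each. He eats 2 apples. "
    "How many apples are left?\n"
    "A: Sam starts with 3 * 4 = 12 apples. After eating 2, "
    "12 - 2 = 10 apples are left. The answer is 10.\n\n"
)

def build_cot_prompt(question: str) -> str:
    # Prepend the worked example so the model imitates step-by-step reasoning.
    return COT_EXEMPLAR + f"Q: {question}\nA:"

print(build_cot_prompt("A pen costs 2 dollars. How much do 7 pens cost?"))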