Today’s large language models (LLMs) have moved beyond simple language tasks, demonstrating impressive in-context question-answering capabilities when guided by novel user-supplied prompting techniques. However, these models cannot assess the accuracy of their own responses, and, as Synced previously reported, they tend to struggle with math word problems and reasoning…