Leveraging Chain-of-thought to communicate with language models efficiently

Carl Lapierre

Over a month ago we hosted our first AI hackathon at Osedea. Since the event, we've been on a quest to unlock the full prompting potential of GPT. As we witnessed the incredible capabilities of this large language model during the hackathon, we became obsessed with harnessing its power to deliver the most accurate and insightful results. This led us down the “Prompt Engineering” rabbit hole.

This article shares some of what we learned during our hackathon, along with our outlook for the future, so that you can find digestible information on the topic in one place, from our perspective.

With the rise in popularity of ChatGPT, prompt engineering has gained traction as a promising concept in artificial intelligence, particularly in natural language processing. This approach aims to align AI behavior with human intent. Prompt engineers carefully construct prompts to push generative AI models to their limits, resulting in improved performance and better outcomes for existing generative AI tools. Before delving into some of the techniques of prompt engineering, let's first provide a brief overview of how large language models (LLMs) operate.

So, how do language models actually work?

Language Models (LMs), including the Large Language Models (LLMs) powering chatbots like ChatGPT, are probabilistic models designed to identify and learn statistical patterns in natural language. At their core, LMs calculate the probability of each word (or, more precisely, each token) that could come next after a given input text. These models have been in use for quite some time, even in everyday applications such as predictive texting on smartphones, where next-word suggestions are based on what you have typed so far.
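To make the idea of next-word probabilities concrete, here is a minimal sketch of a toy bigram model in Python. It is nowhere near how an LLM is built (there is no neural network here), and the corpus and function names are made up for illustration, but it shows the same basic principle of turning observed text into a probability distribution over the next word.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus: str) -> dict:
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def next_word_probabilities(model: dict, word: str) -> dict:
    """Turn the raw follow-up counts for `word` into probabilities."""
    followers = model.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: count / total for w, count in followers.items()}


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs)  # "cat" gets probability 0.5, "mat" and "fish" 0.25 each
```

A predictive keyboard would simply suggest the highest-probability entries. An LLM does something conceptually similar, but conditions on the whole preceding text rather than on a single word.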

The primary distinction between LMs and LLMs lies in the size of the model. With recent advancements in machine learning architectures and distributed training, developers can now train models on a significantly larger scale. This increased size improves the quality of the model's predictions. This is a simplification of how LLMs work; other factors, such as fine-tuning and reinforcement learning, are also at play.

Lack of reasoning in LLMs

While LLMs excel at prediction, it's essential to understand that they do not possess true intelligence or reasoning capabilities. LLMs are essentially large prediction machines and do not think or reason about their output. For instance, when asked to generate a sentence that ends with the same word it started with, an LLM like GPT-3 struggles to fulfill the request.

This example demonstrates that LLMs, including ChatGPT, rely heavily on the statistical patterns they've learned and lack genuine understanding or reasoning.
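For contrast, the constraint in that request is trivial to verify with a few lines of ordinary code. The sketch below is our own illustration (the function name and sample sentences are made up); a check like this can be used to validate a model's output instead of trusting it blindly.

```python
import string


def starts_and_ends_with_same_word(sentence: str) -> bool:
    """Return True if the first and last words match, ignoring case and punctuation."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return len(words) >= 2 and words[0] == words[-1]


print(starts_and_ends_with_same_word("Dogs are loyal, and everyone loves dogs."))
print(starts_and_ends_with_same_word("The cat sat on the mat."))
```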

Hallucinations

What we see in the example response above is also known as a hallucination. Hallucinations refer to confident responses generated by a model that are not supported by its training data. Also known as confabulation or delusion, these hallucinations occur when an algorithm ranks a response with high confidence, despite it being unrelated or incorrect. This phenomenon arises due to the LLM's limitations and is not contextualized as a weakness of the model. Hallucinations highlight the importance of caution when relying on AI models for factual answers that require reasoning, as they may provide misleading or inaccurate information.

Emergent abilities

In the example above, GPT-4 was asked the same question and, unlike GPT-3, answered the prompt correctly. Emergent abilities are an intriguing phenomenon observed in Large Language Models (LLMs) as they grow in size and complexity. Researchers have noticed that as LLMs, such as GPT-4, are trained on larger amounts of data with more parameters, they begin to exhibit new behaviors that go beyond their expected capabilities. This development of new abilities can be likened to unlocking superpowers, such as translation, coding, and even understanding jokes.

It's both fascinating and somewhat concerning, as these emergent abilities are not explicitly programmed into the models but rather emerge through the learning process from vast amounts of text data. Microsoft Research's paper on GPT-4, titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", delves into the details of these newfound capabilities and the signs of general intelligence they exhibit. However, despite these emergent abilities, the fundamental challenge of reasoning in LLMs still persists. As we continue to scale language models, we can expect more unexpected effects and capabilities to emerge, shaping the landscape of AI in exciting and unpredictable ways.

Now that we know the basics of LLMs, let’s take a look at prompt engineering.

What is prompt engineering?

Prompt engineering is an emerging discipline that focuses on the development and optimization of prompts for efficient use of language models (LMs). It serves as a crucial approach for understanding the capabilities and limitations of large language models and enhancing their performance across various tasks, such as question answering and arithmetic reasoning. Through prompt engineering, researchers and developers employ a range of techniques to design effective and robust prompts that interact with LLMs and other tools.

This discipline plays a vital role in expanding the capacity and applicability of LLMs across diverse applications and research topics. Some notable prompting techniques include Zero-Shot Prompting, Few-Shot Prompting, Chain-of-Thought Prompting, Self-Consistency, Generated Knowledge Prompting, Automatic Prompt Engineer, Active-Prompt, Directional Stimulus Prompting, ReAct Prompting, Multimodal CoT Prompting, Tree-of-Thoughts Prompting, and Graph Prompting. As prompt engineering continues to evolve, new techniques are being developed regularly, reflecting the ongoing efforts in this field. Let's go over a few techniques in more detail.

Zero-shot prompting

Zero-shot prompting is a straightforward technique in prompt engineering that leverages the inherent capabilities of LLMs. With zero-shot prompting, it is possible to achieve accurate results without explicitly providing detailed instructions. For instance, when prompted for sentiment analysis without explicitly defining what sentiment is, GPT demonstrates its understanding by generating an accurate sentiment analysis response. This showcases GPT's ability to reply correctly "out of the box" based on its built-in understanding of sentiment.
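As a rough illustration, a zero-shot sentiment prompt can be as simple as the string built below. The wording of the instruction and the sample text are our own assumptions, not a prescribed format; the point is that the prompt contains no examples and no definition of sentiment.

```python
def build_zero_shot_prompt(text: str) -> str:
    """Ask for a sentiment label without providing any examples."""
    return (
        "Classify the sentiment of the following text as "
        "positive, negative, or neutral.\n\n"
        f"Text: {text}\n"
        "Sentiment:"
    )


prompt = build_zero_shot_prompt("The hackathon was a blast!")
print(prompt)
```

The resulting string would then be sent to the model of your choice, which is expected to complete it with a single label.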

Few-shot prompting

Few-shot prompting takes prompt engineering a step further by incorporating example outputs alongside the prompt. This approach enables in-context learning and guides the model towards improved performance. In the provided example, we observe a one-shot approach where the model learns to perform the task based on just a single example. However, we can enhance the prompt by adding more examples, leading to more accurate responses from the model. By tweaking the prompt in this way, we can clearly observe the effect on the output. For more challenging tasks, it is worth experimenting with a larger number of demonstrations (3-shot, 5-shot, 10-shot, and so on) to further improve the model's performance.
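The sketch below shows one way such a prompt could be assembled. The examples, labels, and layout are invented for illustration; passing one example gives a one-shot prompt, and passing three gives the 3-shot version.

```python
def build_few_shot_prompt(examples: list, text: str) -> str:
    """Prepend labelled (text, label) demonstrations to the text to classify."""
    lines = ["Classify the sentiment of each text as positive or negative.", ""]
    for example_text, label in examples:
        lines += [f"Text: {example_text}", f"Sentiment: {label}", ""]
    # The final entry is left unlabelled for the model to complete.
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


# Made-up demonstrations.
examples = [
    ("I loved every minute of it.", "positive"),
    ("The demo crashed twice.", "negative"),
    ("Great food and great company.", "positive"),
]
prompt = build_few_shot_prompt(examples, "The keynote ran far too long.")
print(prompt)
```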

Chain-of-thought prompting (CoT)

Chain-of-thought prompting is a technique used to enhance the reasoning abilities of large language models (LLMs). The idea behind this approach is to guide the LLMs by demonstrating the steps involved in solving a specific problem, similar to teaching a toddler how to do math. Instead of expecting direct answers, the LLMs are shown the process to reach the solution. By providing a chain of thought as an example, the LLMs learn to generate their own reasoning process, leading to more accurate outputs.
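In practice, the difference from a regular few-shot prompt is that the demonstration spells out the intermediate steps rather than only the final answer. Here is a minimal sketch; the muffin question and its worked solution are made up for this article.

```python
# A single worked demonstration: the answer shows its reasoning step by step.
COT_EXAMPLE = (
    "Q: A baker has 12 muffins. She sells 5 and then bakes 8 more. "
    "How many muffins does she have now?\n"
    "A: The baker starts with 12 muffins. After selling 5 she has 12 - 5 = 7. "
    "After baking 8 more she has 7 + 8 = 15. The answer is 15.\n"
)


def build_cot_prompt(question: str) -> str:
    """Place the worked example before the new question."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA:"


prompt = build_cot_prompt(
    "A library has 40 books. It lends out 15 and receives 6 donations. "
    "How many books does it have now?"
)
print(prompt)
```

Because the demonstration ends with a fixed phrase ("The answer is ..."), the model tends to follow the same pattern, which also makes the final answer easier to extract programmatically.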

In 2022, researchers at Google published a paper on chain-of-thought prompting, highlighting its impact on prompt results. They conducted experiments using benchmarks for arithmetic reasoning abilities in LLMs and observed significant improvements when employing chain-of-thought prompting. The results showed higher scores for reasoning tasks compared to standard prompting methods. It was also noted that chain-of-thought prompting is an emergent ability in larger language models, indicating the evolving nature of prompting techniques and their dependence on model size.

[Figure: results of experiments using benchmarks for arithmetic reasoning abilities in LLMs]

While chain-of-thought has demonstrated positive results, it's not without limitations. Certain tasks may still pose challenges for LLMs, and further improvements are needed to enhance their performance. However, the empirical gains achieved through chain-of-thought prompting have showcased its potential to enhance reasoning abilities in LLMs, on some benchmarks even surpassing fine-tuned models. As the field of prompt engineering continues to evolve, it is expected that new approaches and techniques will further refine the capabilities of language models in the future.

Self-consistency with chain-of-thought

On top of CoT, we can add a technique called self-consistency, which runs the same chain-of-thought prompt multiple times and keeps the answer that appears most often (a majority vote over the sampled results, rather than an average). Unlike the techniques mentioned previously, this approach does not focus on the prompt itself but rather on generating diverse reasoning paths by repeating the same one-shot chain-of-thought prompt.

In the example above, by examining the most consistent answer among the three iterations, you can arrive at a valid result, which in this case is 9.
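The voting step can be sketched in a few lines. The three completions below are placeholders standing in for real model outputs, and the "The answer is N" convention is an assumption about how the prompt asks the model to phrase its conclusion.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final number out of a completion that ends with 'The answer is N'."""
    match = re.search(r"The answer is (-?\d+)", completion)
    return match.group(1) if match else None


def most_consistent_answer(completions: List[str]) -> Optional[str]:
    """Majority vote over the answers extracted from several completions."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Placeholder completions; in a real setup these come from sampling the
# same chain-of-thought prompt several times.
completions = [
    "(reasoning path 1) The answer is 9.",
    "(reasoning path 2) The answer is 11.",
    "(reasoning path 3) The answer is 9.",
]
print(most_consistent_answer(completions))  # prints 9
```

For the sampled reasoning paths to differ at all, the model has to be queried with some randomness (for example, a non-zero sampling temperature).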

Tree-of-thought

A recently introduced technique called "Tree of Thoughts" (ToT) builds on chain-of-thought and self-consistency prompting. ToT addresses the limitations of language models in problem-solving tasks. Because models make decisions one token at a time, in a left-to-right manner, they are less effective in tasks involving exploration and strategic planning.

To overcome these limitations, the ToT framework extends the "Chain of Thought" approach. It allows language models to consider coherent units of text called "thoughts" as intermediate steps in problem-solving. With ToT, models can make deliberate decisions by exploring different reasoning paths, evaluating choices, and determining the best course of action. They can also look ahead or backtrack when necessary to make more informed choices.
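The search idea can be sketched without any model at all. In the toy below, a "thought" is simply the next number in a chain, and the propose and evaluate functions are hand-written stand-ins for what would be LLM calls in a real ToT setup. It is a simplified breadth-first variant that keeps the best few partial paths at each level, not a full implementation of the framework.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # the chain of intermediate values ("thoughts") so far


def tree_of_thoughts(
    root: State,
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    is_solution: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    """Expand every kept path, stop on a solution, else keep the best few paths."""
    frontier: List[State] = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Keep only the most promising partial paths for the next level.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:beam_width]
        if not frontier:
            break
    return None


# Toy problem (hypothetical): start at 1 and reach 10 using "+3" or "*2" steps.
TARGET = 10


def propose(state: State) -> List[State]:
    last = state[-1]
    return [state + (last + 3,), state + (last * 2,)]


def evaluate(state: State) -> float:
    return -abs(TARGET - state[-1])  # closer to the target scores higher


def is_solution(state: State) -> bool:
    return state[-1] == TARGET


result = tree_of_thoughts((1,), propose, evaluate, is_solution)
print(result)  # (1, 4, 7, 10)
```

With `beam_width=1` the search commits to the locally best step each time (1, 4, 8, ...), overshoots the target, and returns `None` within the depth limit. Keeping a second path alive is what lets it find 1, 4, 7, 10, which is the kind of exploration ToT is meant to enable.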

ReAct (Reason + Act)

And finally, we have ReAct, short for "Reason and Act". It’s a popular technique that leverages language models to generate both reasoning traces and task-specific actions. It enables LLMs to not only provide their thoughts and insights but also perform actions that interact with external sources such as databases, environments, or APIs.

With ReAct, LLMs gain access to tools like web browsers with Puppeteer, file systems, or any desired API. This access allows them to utilize external resources for tasks like web searching or retrieving information from knowledge bases. The generated reasoning traces help LLMs induce, track, and update action plans while handling exceptions.

By incorporating external tools and generating task-specific actions, ReAct enhances the LLMs' understanding of the task at hand and facilitates the retrieval of accurate and reliable information.
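A bare-bones version of this loop is sketched below. Everything in it is an assumption made for illustration: the `Thought` / `Action` / `Observation` / `Final Answer` text format, the `calculator` tool, and the scripted function that stands in for a real LLM call.

```python
import operator
import re
from typing import Callable, Dict, Optional

ACTION_PATTERN = re.compile(r"Action: (\w+)\[(.*)\]")
FINAL_PATTERN = re.compile(r"Final Answer: (.*)")


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate between model output and tool observations until an answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        final = FINAL_PATTERN.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_PATTERN.search(step)
        if action:
            tool_name, tool_input = action.groups()
            tool = tools.get(tool_name)
            observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"
            # Feed the tool result back so the next step can reason about it.
            transcript += f"Observation: {observation}\n"
    return None


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates '<int> <op> <int>' without using eval."""
    left, symbol, right = expression.split()
    operations = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    return str(operations[symbol](int(left), int(right)))


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: acts first, then answers once it sees an observation."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute 17 * 3.\nAction: calculator[17 * 3]"
    return "Thought: The calculator returned 51.\nFinal Answer: 51"


answer = react_loop(scripted_model, {"calculator": calculator}, "What is 17 times 3?")
print(answer)  # 51
```

Swapping `scripted_model` for a function that calls an actual language model, and adding tools such as a search function, gives the general shape of a ReAct agent.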

What can we do with prompt engineering?

Are you excited? Now that we have learned about these fascinating prompt engineering techniques, let's explore how we can make the most of them. The good news is that you don't have to become an expert in these techniques. Understanding the basics is sufficient because there are powerful tools available that implement these techniques for you. One such leading framework in the field of prompt engineering is called LangChain.

LangChain, a Python and TypeScript framework launched in October 2022, provides all the essential building blocks to leverage the capabilities of large language models effectively.

LangChain integrates with a wide range of major model providers, including OpenAI, HuggingFace, Google, and others. These integrations are incredibly convenient as they allow for easy swapping of language models, enabling you to test out different behaviors across various providers.

Furthermore, LangChain offers support for an extensive list of tools that agents can interact with, from search engines to AWS Lambdas and Twilio. You can also integrate your custom tools and models using the same interface.

The framework also provides a variety of toolkit agents out of the box, catering to different purposes. For instance, SQL agents and JSON agents handle structured data independently, allowing multiple agents to interact with each other seamlessly.

LangChain is divided into seven modules: models, prompts, memory, indexes, chains, agents, and callbacks. By leveraging these modules, you can rapidly build powerful applications. For example, with just a few lines of code, you can create a tool that consumes PDFs and summarizes their content. LangChain empowers you to build customized applications that are not limited to any specific language model like GPT-4.

Final thoughts

As we conclude our exploration of prompt engineering and its role in advancing language models, we are reminded of the ever-changing nature of the AI landscape. At Osedea, we are committed to staying at the forefront of this dynamic field, and we actively foster an environment of continuous learning, collaboration, and innovation.

Whether it be by organizing AI hackathons, conducting workshops or engaging in regular discussions around AI, these initiatives serve as opportunities for our team and other like-minded individuals to come together, challenge ourselves, and exchange ideas. By actively participating in these activities, we not only enhance our own capabilities, but also contribute to the collective growth and development of AI.

In this fast-paced environment, collaboration, knowledge-sharing, and continuous learning are key. By fostering an environment that encourages exploration and innovation, we can collectively navigate the evolving AI landscape and unlock new frontiers of possibility.

While we anticipate significant advancements in AI technologies in the coming months and years, it's important to remain adaptable. The availability of tools and frameworks today serves as a testament to the potential and possibilities that lie ahead. Let’s embrace the dynamic nature of AI and be prepared to harness the advancements that will shape our future.

Photo Credit: Mojahid Mottakin