Chatbots and Short-Term Memory
Let's walk up the terminology ladder and figure out how a chatbot differs from an AI agent, and also explore how short-term memory is implemented in chatbots.
Questions
Questions we will discuss:
- How do an LLM, a chatbot, an AI workflow, a co-pilot, an agent, and other terms differ from each other?
- What should we do if the user hits the token limit mid-conversation?
- How is short-term memory implemented in chatbots?
Steps
1. Chatbots
Explore how chatbots differ from LLMs, AI workflows, co-pilots, and agents.
2. Studying short-term memory
What happens when a user is chatting with your bot for the tenth time and the message history has grown long?
- First, with a large message history, the LLM follows the latest instructions less reliably.
- Second, each subsequent request costs more (with a proprietary LLM, every input token is billed).
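To feel the cost problem, here is a back-of-the-envelope sketch (the per-message token size is invented for illustration): because every request resends the full history, billed input tokens grow roughly quadratically with the number of turns.

```python
# Rough illustration of how input-token billing grows when the full
# history is resent on every turn. TOKENS_PER_MESSAGE is a made-up average.
TOKENS_PER_MESSAGE = 200

total_billed = 0
for turn in range(1, 11):
    # by turn N the request carries 2*(N-1) earlier messages plus the new one
    history_tokens = (2 * (turn - 1) + 1) * TOKENS_PER_MESSAGE
    total_billed += history_tokens
    print(f"turn {turn}: {history_tokens} input tokens")

print(f"billed over 10 turns: {total_billed} input tokens")  # 20000 here
```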
Just read both articles and look at the pictures; the main thing is to understand the concepts!
If you have studied the OpenAI API carefully, these articles will be simple and clear to you.
Comparison of different approaches
When deleting or summarizing messages, we lose useful information.
For example, the first messages we are about to delete may contain the user's technical specification, and those details will be lost during deletion or summarization.
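A common compromise is to summarize older messages instead of dropping them outright. Below is a minimal sketch, assuming an already-configured LangChain chat model in the variable `llm` (the variable and the `keep_last` cutoff are assumptions for this example, not a library API):

```python
from langchain_core.messages import HumanMessage, SystemMessage

def compress_history(messages, llm, keep_last: int = 4):
    """Replace all but the last `keep_last` messages with an LLM-written summary."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    if not old:
        return messages  # nothing to compress yet
    summary = llm.invoke([HumanMessage(content=
        "Summarize this conversation, preserving technical requirements verbatim:\n"
        + "\n".join(str(m.content) for m in old)
    )])
    # the summary travels at the front of the history as a system message
    return [SystemMessage(content=f"Summary so far: {summary.content}")] + recent
```

Even with a careful prompt, details from the summarized messages (like that original technical specification) can still be lost, which is exactly the trade-off described above.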
Basic strategies for managing message history
- Truncation by number of tokens (`max_tokens`): limits the message history so that its total token count does not exceed the specified value. This is especially useful for staying within the model's context window.
- Truncation by number of messages (`max_messages`): limits the history to a certain number of recent messages, deleting older ones.
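Before looking at LangChain's implementation, here is a framework-agnostic sketch of both strategies; the `count_tokens` helper is a crude stand-in for a real tokenizer such as tiktoken:

```python
def count_tokens(message: str) -> int:
    # crude stand-in for a real tokenizer (e.g. tiktoken): word count
    return len(message.split())

def truncate_by_tokens(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined token count fits the budget."""
    kept, total = [], 0
    for message in reversed(messages):
        total += count_tokens(message)
        if total > max_tokens:
            break
        kept.append(message)
    return list(reversed(kept))

def truncate_by_messages(messages: list[str], max_messages: int) -> list[str]:
    """Keep only the last `max_messages` messages."""
    return messages[-max_messages:]
```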
Additional parameters for truncation settings
- `strategy="last"`: keeps the latest messages and deletes older ones. This is the standard strategy for maintaining relevant context.
- `include_system=True`: guarantees that the `SystemMessage`, which usually contains important instructions for the model, is preserved (see "How to trim messages" in the LangChain docs).
- `start_on="human"` and `end_on=("human", "tool")`: ensure a correct message-history structure, starting with a user message and ending with a user or tool message.
Usage examples
To truncate the message history by the number of tokens while preserving the system message and the latest user messages, you can use the following code:
```python
from langchain_core.messages.utils import trim_messages, count_tokens_approximately

trimmed_history = trim_messages(
    messages=chat_history,
    max_tokens=1000,               # token budget for the kept history
    strategy="last",               # keep the most recent messages
    # token_counter is required; count_tokens_approximately is a built-in
    # rough counter (you can also pass a chat model instance instead)
    token_counter=count_tokens_approximately,
    include_system=True,           # always keep the SystemMessage
    start_on="human",              # history must start with a user message
    end_on=("human", "tool"),      # ...and end with a user or tool message
)
```
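For reference, the `chat_history` passed in above might look like this (a made-up conversation):

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

chat_history = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Here is my technical specification: ..."),
    AIMessage(content="Got it, thanks!"),
    HumanMessage(content="Now implement step one."),
]
# trim_messages keeps the SystemMessage plus the newest messages that fit
# into the 1000-token budget, starting on a human message
```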
More information about the functions and parameters of `trim_messages` can be found in the official LangChain documentation.
Advanced techniques:
- We can use several short-term memories at once, for example (see the sketch after this list):
  - one for storing the message history, kept in the messages themselves;
  - a second for storing information about the user (for example, their preferences), kept in the system message;
  - a third for storing the user's current task, also kept in the system message.
- We can run retrieval over deleted messages if necessary (we will study retrieval in a future module).
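Here is a sketch of the "several memories" idea; all names are invented for illustration. The message history lives in the message list itself, while the user profile and the current task live in a system message that is rebuilt on every request:

```python
from langchain_core.messages import SystemMessage

def build_request(history, user_profile: str, user_task: str):
    """Assemble the three short-term memories into one request payload."""
    system = SystemMessage(content=(
        "You are a helpful assistant.\n"
        f"User profile: {user_profile}\n"   # memory 2: the user's preferences
        f"Current task: {user_task}"        # memory 3: the user's task
    ))
    return [system] + list(history)         # memory 1: the message history
```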
Now we know...
Now you know how to manage message history in chatbots. We'll skip the practice of building a chatbot, because it's easy and boring.
However, multi-agent systems await you ahead, and there working with short-term memory becomes genuinely complex and interesting.