
Retrieval-Based Chatbots


Retrieval-based chatbots are used in closed-domain scenarios and answer a user message by selecting a reply from a collection of predefined responses. A retrieval-based bot completes three main tasks: intent classification, entity recognition, and response selection.

[Figure: flowchart of intent classification, entity recognition, and response selection]
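To make the three tasks concrete, here is a minimal standard-library Python sketch of a retrieval-based bot. The intents, example phrases, response templates, and weekday vocabulary are all hypothetical, and simple word overlap and a lookup set stand in for the tf-idf and embedding techniques covered in the next sections.

```python
import re
from collections import Counter

# Hypothetical intents: each has example phrasing and a predefined response template.
INTENTS = {
    "opening_hours": {
        "examples": "when do you open what time are you open hours",
        "template": "We are open from 9am to 5pm on {day}.",
    },
    "greeting": {
        "examples": "hello hi hey good morning",
        "template": "Hello! How can I help you?",
    },
}
# Hypothetical entity vocabulary used to fill the {day} slot.
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday", "friday"}


def tokenize(text):
    # Lowercase the text and keep only alphabetic words.
    return re.findall(r"[a-z]+", text.lower())


def classify_intent(message):
    # Intent classification: pick the intent whose example words overlap most.
    words = Counter(tokenize(message))

    def overlap(name):
        examples = Counter(tokenize(INTENTS[name]["examples"]))
        return sum((words & examples).values())

    return max(INTENTS, key=overlap)


def recognize_entity(message):
    # Entity recognition: return the first known weekday, else a default value.
    for word in tokenize(message):
        if word in WEEKDAYS:
            return word.capitalize()
    return "weekdays"


def respond(message):
    # Response selection: fill the chosen intent's predefined template.
    intent = classify_intent(message)
    return INTENTS[intent]["template"].format(day=recognize_entity(message))
```

With these definitions, `respond("What time are you open on Wednesday?")` returns `"We are open from 9am to 5pm on Wednesday."`. Keeping the three steps in separate functions means any one of them can be swapped for a stronger technique without touching the others.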

Intent Similarity for Retrieval-Based Chatbots

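For retrieval-based chatbots, it is common to represent the user message and each candidate document as a bag-of-words or tf-idf vector and to compute intent similarity between these vectors, for example with cosine similarity.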

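# using tf-idf to identify most likely intent
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# hypothetical predefined responses; the user message is appended as the last document
documents = ["Our store opens at 9am.", "You can return items within 30 days."]
user_message = "When does the store open?"
processed_docs = documents + [user_message]
vectorizer = TfidfVectorizer()
tfidf_vectors = vectorizer.fit_transform(processed_docs)
# compare the user message (last row) with every document, including itself
cosine_similarities = cosine_similarity(tfidf_vectors[-1], tfidf_vectors)
# the highest score is the message matching itself, so take the second highest
similar_response_index = cosine_similarities.argsort()[0][-2]
best_response = documents[similar_response_index]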
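The tf-idf code above relies on scikit-learn. To show what the bag-of-words alternative computes, here is a minimal standard-library sketch: each text becomes a vector of raw word counts, and the document with the highest cosine similarity to the message is returned. The function names are hypothetical, and no preprocessing such as lemmatization is applied.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Count lowercase word occurrences; word order is discarded.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (missing words count as 0).
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(message, documents):
    # Return the document whose bag-of-words vector is closest to the message.
    message_vector = bag_of_words(message)
    return max(documents, key=lambda doc: cosine(message_vector, bag_of_words(doc)))
```

For example, `best_match("When does the store open?", ["Our store opens at 9am.", "You can return items within 30 days."])` returns the first document, because only the word "store" is shared. "open" and "opens" still count as different words here, which is why the documents are usually preprocessed before vectorizing.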

Entity Recognition for Retrieval-Based Chatbots

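For retrieval-based chatbots, entity recognition can be accomplished using part-of-speech (POS) tagging, for example to pick out nouns as candidate entities, or word embeddings such as word2vec, which let the bot measure how similar a word is to an entity category.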

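import spacy
# load a spaCy pipeline that includes word vectors; the 'en' shortcut was removed
# in spaCy v3, and the small 'sm' pipelines ship without static word vectors
nlp = spacy.load("en_core_web_md")
# call model on data
tokens = nlp("wednesday, dog, flower")
response_category = nlp("weekday")
output_list = list()
for token in tokens:
    # skip the commas separating the words
    if token.is_punct:
        continue
    # similarity() expects a Doc, Span, Token, or Lexeme, not a string
    output_list.append(f"{token.text} {response_category.text} {token.similarity(response_category)}")
print("\n".join(output_list))
# example output (exact scores depend on the model and spaCy version):
# wednesday weekday 0.453354920245737
# dog weekday 0.21911001129423147
# flower weekday 0.17118961389940174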
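The two approaches can also be combined: POS tags narrow the message down to candidate nouns, and embeddings score each candidate against an entity category. The sketch below shows this with the standard library only. The tagged input stands in for a real POS tagger's output, the 3-dimensional vectors are made up for illustration (real word vectors have hundreds of dimensions), and the 0.8 threshold is an arbitrary assumption. spaCy's `similarity()` scores are, by default, cosine similarities between vectors, which is what `cosine` computes here.

```python
import math

# Hypothetical POS tagger output: (word, Penn Treebank tag) pairs.
tagged = [("is", "VBZ"), ("the", "DT"), ("shop", "NN"), ("open", "JJ"),
          ("on", "IN"), ("wednesday", "NNP")]

# Hypothetical toy embeddings, chosen only to illustrate the comparison.
vectors = {
    "weekday": [0.9, 0.1, 0.0],
    "wednesday": [0.8, 0.2, 0.1],
    "shop": [0.1, 0.9, 0.3],
}


def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def extract_entity(tagged_tokens, category, threshold=0.8):
    # Step 1 (POS tagging): keep nouns (NN, NNS, NNP, NNPS) that have a vector.
    candidates = [w for w, tag in tagged_tokens if tag.startswith("NN") and w in vectors]
    # Step 2 (embeddings): score each candidate against the category word.
    scored = [(cosine(vectors[w], vectors[category]), w) for w in candidates]
    best_score, best_word = max(scored, default=(0.0, None))
    # Only accept the best candidate if it is similar enough to the category.
    return best_word if best_score >= threshold else None
```

Here `extract_entity(tagged, "weekday")` returns `"wednesday"` (cosine about 0.98, versus about 0.21 for "shop"). The threshold lets the bot fall back to a default or a clarifying question when no candidate is a convincing match.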
