Retrieval-Based Chatbots

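Retrieval-based chatbots are used in closed-domain scenarios and rely on a collection of predefined responses to a user message. A retrieval-based bot completes three main tasks: intent classification (working out what the user wants), entity recognition (picking out the specific things the message refers to), and response selection (choosing the best-fitting predefined response).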

Flowchart of intent classification, entity recognition, and response selection
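The three tasks can be sketched end to end with nothing but the standard library. This is a minimal illustration, not a production design: the response templates, keyword lists, and entity vocabulary below are all hypothetical stand-ins for the similarity and tagging techniques a real bot would use.

```python
import re
from collections import Counter

# Hypothetical predefined responses, keyed by intent; "{entity}" is a blank to fill.
RESPONSES = {
    "weather": "Here is the forecast for {entity}.",
    "hours": "We are open from 9 to 5 on {entity}.",
}

# Hypothetical keyword lists standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "weather": {"weather", "forecast", "rain"},
    "hours": {"open", "hours", "close"},
}

# Hypothetical entity vocabulary standing in for POS tagging or word embeddings.
KNOWN_ENTITIES = {"monday", "wednesday", "friday"}


def tokenize(message):
    """Lowercase the message and split it into word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def classify_intent(tokens):
    """Task 1: pick the intent whose keywords appear most often in the message."""
    counts = Counter(tokens)
    scores = {
        intent: sum(counts[word] for word in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    # On a tie (including all zeros), max() returns the first intent listed.
    return max(scores, key=scores.get)


def recognize_entity(tokens):
    """Task 2: return the first known entity, or a generic fallback."""
    for token in tokens:
        if token in KNOWN_ENTITIES:
            return token
    return "that day"


def respond(message):
    """Task 3: select the predefined response and fill in the entity."""
    tokens = tokenize(message)
    intent = classify_intent(tokens)
    entity = recognize_entity(tokens)
    return RESPONSES[intent].format(entity=entity)


reply = respond("Will it rain on Wednesday?")
print(reply)
```

Keeping each task in its own function means any one of them can later be swapped for a stronger technique without touching the other two.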

Intent Similarity for Retrieval-Based Chatbots

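For retrieval-based chatbots, it is common to use bag-of-words or tf-idf to compute intent similarity: the user message and each candidate response are converted to vectors, and the candidate whose vector is most similar to the message (typically by cosine similarity) is selected.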

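# using tf-idf to identify most likely intent
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# assumes processed_docs holds the preprocessed candidate documents with the
# user message appended as the last element, and documents holds the
# corresponding original texts in the same order
vectorizer = TfidfVectorizer()
tfidf_vectors = vectorizer.fit_transform(processed_docs)

# compare the user message (last row) against every row, itself included
cosine_similarities = cosine_similarity(tfidf_vectors[-1], tfidf_vectors)

# argsort is ascending, so [-1] is the message itself (similarity 1.0)
# and [-2] is the most similar candidate
similar_response_index = cosine_similarities.argsort()[0][-2]
best_response = documents[similar_response_index]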
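The bag-of-words alternative mentioned above can be sketched without scikit-learn. This is a rough illustration under stated assumptions: the candidate `documents` and the `user_message` are hypothetical, and the tokenizer is a simple regular expression rather than a full preprocessing step.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical predefined responses and user message, for illustration only.
documents = [
    "Our store opens at nine in the morning.",
    "You can return any item within thirty days.",
    "Shipping usually takes three to five days.",
]
user_message = "When can I return an item?"

# Score every candidate against the message and keep the highest-scoring one.
message_bow = bag_of_words(user_message)
scores = [cosine(message_bow, bag_of_words(doc)) for doc in documents]
best_index = max(range(len(documents)), key=scores.__getitem__)
best_response = documents[best_index]
print(best_response)
```

Unlike tf-idf, raw counts give common words such as "can" the same weight as rarer, more informative ones, which is one reason tf-idf is often preferred once the response collection grows.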

Entity Recognition for Retrieval-Based Chatbots

For retrieval-based chatbots, entity recognition can be accomplished using part-of-speech (POS) tagging or word embeddings such as word2vec.

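import spacy

# load word2vec model
# ('en' is a shortcut from older spaCy releases; newer releases need a full
# package name for a model that ships word vectors, for example 'en_core_web_md')
word2vec = spacy.load('en')

# call model on data
tokens = word2vec("wednesday, dog, flower")
response_category = word2vec("weekday")

output_list = list()
for token in tokens:
    # skip the commas, which spaCy returns as separate tokens
    if token.is_punct:
        continue
    # similarity() expects a spaCy object (Doc, Span, or Token), not a string
    output_list.append(f"{token.text} {response_category.text} {token.similarity(response_category)}")

print("\n".join(output_list))

# output reported in the original (exact values depend on the model loaded):
# wednesday weekday 0.453354920245737
# dog weekday 0.21911001129423147
# flower weekday 0.17118961389940174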
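The similarity score in the spaCy example is, by default, the cosine similarity between two word vectors. The sketch below shows that calculation directly. The 3-dimensional vectors are invented toy values chosen only to make the arithmetic easy to follow; they are not real embeddings, which typically have hundreds of dimensions.

```python
import math

# Toy vectors invented for illustration only; real embeddings come from a trained model.
toy_vectors = {
    "weekday": [0.9, 0.1, 0.0],
    "wednesday": [0.8, 0.2, 0.1],
    "dog": [0.1, 0.9, 0.2],
    "flower": [0.0, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


category = "weekday"
candidates = ["wednesday", "dog", "flower"]

# Score each candidate word against the category the response needs.
scores = {
    word: cosine_similarity(toy_vectors[word], toy_vectors[category])
    for word in candidates
}

# The highest-scoring word is the most plausible entity for that category.
best_match = max(scores, key=scores.get)
print(best_match)
```

With these toy values, "wednesday" points in nearly the same direction as "weekday" and so scores highest, mirroring the ranking in the spaCy output.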