Retrieval-based chatbots are used in closed-domain scenarios and rely on a collection of predefined responses to a user message. A retrieval-based bot completes three main tasks: intent classification, entity recognition, and response selection.
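The three tasks can be sketched end to end in a few lines. This is only an illustrative toy, not a production design: the `INTENTS` table, the keyword sets, the `DAYS` tuple, and the function names are all hypothetical, and keyword overlap stands in for a real intent classifier.

```python
import re

# Hypothetical intents: trigger keywords plus a predefined response template.
INTENTS = {
    "weather": {
        "keywords": {"weather", "forecast", "rain"},
        "response": "Here is the forecast for {day}.",
    },
    "hours": {
        "keywords": {"open", "hours", "close"},
        "response": "Our opening hours for {day} are 9 to 5.",
    },
}
DAYS = ("monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday")


def tokenize(message):
    """Lowercase the message and split it into alphabetic words."""
    return re.findall(r"[a-z]+", message.lower())


def classify_intent(message):
    """Intent classification: pick the intent with the most keyword hits."""
    words = set(tokenize(message))
    return max(INTENTS, key=lambda name: len(INTENTS[name]["keywords"] & words))


def recognize_entity(message):
    """Entity recognition: return the first weekday mentioned, else 'today'."""
    for word in tokenize(message):
        if word in DAYS:
            return word.capitalize()
    return "today"


def respond(message):
    """Response selection: fill the chosen intent's template with the entity."""
    intent = classify_intent(message)
    entity = recognize_entity(message)
    return INTENTS[intent]["response"].format(day=entity)


reply = respond("Will it rain on Wednesday?")
```

Here `reply` is built from the "weather" template with "Wednesday" filled in. Each function could be swapped for a stronger technique without changing the overall flow.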
For retrieval-based chatbots, it is common to use bag-of-words or tf-idf to compute intent similarity.
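# using tf-idf to identify most likely intent
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# processed_docs is assumed to hold the preprocessed candidate responses,
# with the user message appended as the last entry
vectorizer = TfidfVectorizer()
tfidf_vectors = vectorizer.fit_transform(processed_docs)

# compare the user message (last row) against every document
cosine_similarities = cosine_similarity(tfidf_vectors[-1], tfidf_vectors)

# the highest score is the message matched with itself, so take the second highest
similar_response_index = cosine_similarities.argsort()[0][-2]
best_response = documents[similar_response_index]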
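The bag-of-words alternative can be sketched without any external library. In this minimal example the `responses` list and the `user_message` are made up for illustration, and the helper names `bag_of_words` and `cosine` are hypothetical. Unlike tf-idf, raw counts do not down-weight common words, so a real system would usually remove stop words first.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the lowercase word tokens in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical predefined responses and user message.
responses = [
    "Our store opens at nine in the morning.",
    "You can return an item within thirty days.",
    "Shipping takes three to five business days.",
]
user_message = "How many days does shipping take?"

# Score every response against the message and keep the best match.
message_bow = bag_of_words(user_message)
scores = [cosine(message_bow, bag_of_words(response)) for response in responses]
best_response = responses[scores.index(max(scores))]
```

The shipping response wins because it shares two words ("shipping" and "days") with the message, while the returns response shares only one and the opening-hours response shares none.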
For retrieval-based chatbots, entity recognition can be accomplished using part-of-speech (POS) tagging or word embeddings such as word2vec.
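import spacy

# load word2vec model
# (the 'en' shortcut works in older spaCy releases; newer ones expect
# a full package name such as 'en_core_web_md')
word2vec = spacy.load('en')

# call model on data
tokens = word2vec("wednesday, dog, flower")
response_category = word2vec("weekday")

output_list = list()
for token in tokens:
    # skip the commas so that only the three words are scored
    if token.is_punct:
        continue
    # similarity() takes a spaCy object, not a string
    output_list.append(f"{token.text} {response_category.text} {token.similarity(response_category)}")

# output:
# wednesday weekday 0.453354920245737
# dog weekday 0.21911001129423147
# flower weekday 0.17118961389940174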
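To show what such a similarity score measures, here is a small sketch that computes cosine similarity over dense vectors by hand. The three-dimensional `toy_vectors` are invented purely for illustration (real word2vec vectors are learned from a corpus and have hundreds of dimensions), and `best_entity` is a hypothetical helper name.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
toy_vectors = {
    "weekday": [0.9, 0.1, 0.0],
    "wednesday": [0.8, 0.2, 0.1],
    "dog": [0.1, 0.9, 0.2],
    "flower": [0.0, 0.3, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def best_entity(candidates, category):
    """Return the candidate whose vector is closest to the category vector."""
    return max(
        candidates,
        key=lambda word: cosine_similarity(toy_vectors[word], toy_vectors[category]),
    )


entity = best_entity(["wednesday", "dog", "flower"], "weekday")
```

With these made-up vectors, "wednesday" is selected as the entity for the "weekday" category, mirroring the ranking in the spaCy output above.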