Learn

Lost in a multidimensional vector space after this lesson? We hope not! We have covered a lot here, so let’s take some time to recap.

  • Vectors are containers of information, and they can have anywhere from one dimension to hundreds or even thousands of dimensions
  • Word embeddings are vector representations of words, where words that appear in similar contexts are represented by vectors that are closer together
  • spaCy is a package that enables us to view and use pre-trained word embedding models
  • The distance between vectors can be calculated in many ways, and the most useful measure for higher-dimensional vectors is cosine distance
  • Word2Vec is a shallow neural network model that can build word embeddings using either continuous bag-of-words or continuous skip-grams
  • Gensim is a package that allows us to create and train word embedding models using any corpus of text
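
For instance, here is a minimal sketch of training a Word2Vec model with Gensim on a toy corpus. The corpus and hyperparameters are purely illustrative, and the code assumes Gensim 4.x (where the vector size parameter is named vector_size):

```python
import gensim

# A toy corpus: each document is a list of tokens
corpus = [
    ["sponge", "starfish", "squid", "ocean"],
    ["squid", "ink", "ocean", "deep"],
    ["sponge", "reef", "starfish", "tide"],
]

# Train a small skip-gram Word2Vec model (sg=1) on the corpus;
# min_count=1 keeps every word, even ones that appear only once
model = gensim.models.Word2Vec(corpus, vector_size=20, min_count=1, sg=1)

# Look up the learned embedding for a word
print(model.wv["squid"])
```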

Instructions

1.

Load a word embedding model from spaCy into a variable named nlp.
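
A minimal sketch of this step, assuming the medium English pipeline en_core_web_md (which ships with word vectors) has already been downloaded via python -m spacy download en_core_web_md:

```python
import spacy

# Load a pre-trained spaCy pipeline that includes word vectors
nlp = spacy.load("en_core_web_md")
```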

2.

Use the loaded model to create the following word embeddings (see the sketch after this list):

  • a vector representation of the word “sponge” saved in a variable named sponge_vec
  • a vector representation of the word “starfish” saved in a variable named starfish_vec
  • a vector representation of the word “squid” saved in a variable named squid_vec
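
One way to sketch this step: calling nlp on a single word returns a Doc object, and its .vector attribute holds that word's embedding.

```python
# Each one-word Doc's .vector is the embedding for that word
sponge_vec = nlp("sponge").vector
starfish_vec = nlp("starfish").vector
squid_vec = nlp("squid").vector
```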
3.

Use SciPy to compute the cosine distance between:

  • sponge_vec and starfish_vec, storing the result in a variable dist_sponge_star
  • sponge_vec and squid_vec, storing the result in a variable dist_sponge_squid
  • starfish_vec and squid_vec, storing the result in a variable dist_star_squid

Print dist_sponge_star, dist_sponge_squid, and dist_star_squid to the terminal.
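
A sketch of this step using scipy.spatial.distance.cosine, which returns the cosine distance (1 minus the cosine similarity) between two vectors:

```python
from scipy.spatial.distance import cosine

# Cosine distance between each pair of word embeddings
dist_sponge_star = cosine(sponge_vec, starfish_vec)
dist_sponge_squid = cosine(sponge_vec, squid_vec)
dist_star_squid = cosine(starfish_vec, squid_vec)

print(dist_sponge_star, dist_sponge_squid, dist_star_squid)
```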

Which word embeddings are furthest apart according to cosine distance?
