These are embeddings for the Neo4j recommendations dataset.
The embeddings and vector indexes allow you to find semantically similar bios, plots, and posters.
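As a rough illustration of what "semantically similar" means here, a vector index ranks items by a similarity score between embedding vectors. The sketch below computes cosine similarity in plain Python. The choice of cosine as the metric and the tiny three-dimensional vectors are assumptions for illustration only, not values taken from the dataset or its index configuration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
candidates = {
    "close": [2.0, 0.0, 2.0],
    "orthogonal": [0.0, 1.0, 0.0],
}

# Rank candidate names from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

Real embeddings have far more dimensions, but the ranking idea is the same: vectors pointing in nearly the same direction score close to 1.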
Get Started
Create recommendations database
To load the embeddings you will need to create a Neo4j database with the recommendations dataset.
You can create a new Sandbox instance by logging in to sandbox.neo4j.com and selecting "Recommendations".
Alternatively, you can restore the dataset in your own database using the recommendations-50.dump file.
Load the embeddings
To load all the embeddings, download and run the load_all.cypher query.
See below for the individual embeddings and partial datasets.
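If you would rather run the script from Python than paste it into a query tool, one possible approach is sketched below. It assumes the official `neo4j` Python driver is installed, that `load_all.cypher` has been downloaded locally, and that its statements are separated by semicolons. The helper names, URI, and credentials are hypothetical placeholders.

```python
from pathlib import Path


def split_statements(script):
    """Naively split a Cypher script on semicolons and drop empty pieces.

    Assumption: no semicolons appear inside string literals or comments.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_cypher_file(path, uri, user, password):
    """Run each statement in the file as an auto-commit query."""
    # Imported lazily so the splitting helper works without the driver installed.
    from neo4j import GraphDatabase

    statements = split_statements(Path(path).read_text(encoding="utf-8"))
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in statements:
                # consume() waits for the statement to finish before moving on.
                session.run(statement).consume()
    return len(statements)


# Hypothetical usage (placeholder connection details):
# run_cypher_file("load_all.cypher", "neo4j://localhost:7687", "neo4j", "password")
```

The semicolon split is deliberately simple. If the script contains semicolons inside string literals, running it directly in a Cypher client is the safer option.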
Embeddings
The embeddings were created for the following properties using these models:
- Person.bio - OpenAI text-embedding-ada-002
- Movie.plot - OpenAI text-embedding-ada-002
- Movie.poster - OpenAI CLIP
For each property there is a full set of embeddings and a sample covering the first 1000 (1k) nodes in the dataset.
There are also Cypher queries to load the embeddings and create the indexes.
The Python code to generate the embeddings is in the src directory.
All files are at data.neo4j.com/rec-embed.
Person.bio
- Embeddings (full) - person-bio-embeddings.csv
- Embeddings (sample) - person-bio-embeddings-1k.csv
- Load Cypher - bio_load.cypher
- Create Index Cypher - bio_create_index.cypher
- Example Query - bio_query.cypher
- Python code - src/bio_openai_embeddings.py
Movie.plot
- Embeddings (full) - movie-plot-embeddings.csv
- Embeddings (sample) - movie-plot-embeddings-1k.csv
- Load Cypher - plot_load.cypher
- Create Index Cypher - plot_create_index.cypher
- Example Query - plot_query.cypher
- Python code - src/plot_openai_embeddings.py
Movie.poster
- Embeddings (full) - movie-poster-embeddings.csv
- Embeddings (sample) - movie-poster-embeddings-1k.csv
- Load Cypher - posters_load.cypher
- Create Index Cypher - posters_create_index.cypher
- Example Query - poster_query.cypher
- Python code - src/poster_embeddings.py