Implementation for Matching Engine Vectorstore #3104

tomaspiaggio · 2023-04-18T18:27:07Z

We just finished the implementation for the vector store using the GCP Matching Engine.

We'll be contributing the implementation.

Related to #2892

If you have any questions or suggestions please contact me (@tomaspiaggio) or @scafati98.

tomaspiaggio · 2023-04-19T15:08:10Z

I just pushed a new update addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to pyproject.toml, but we're hitting dependency conflicts with black. Do you have any suggestions here? @dev2049

pyproject.toml

dev2049 · 2023-04-19T16:00:19Z

I just pushed a new update addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to pyproject.toml, but we're hitting dependency conflicts with black. Do you have any suggestions here? @dev2049

black is only a linting dependency, not a package dependency, so it shouldn't cause issues. i think you may have accidentally added it to the list of actual dependencies

langchain/vectorstores/matching_engine.py

hwchase17

my main comment, in line with some of the others: is it simpler to just do the client creation OUTSIDE of the class, and then pass in an already initialized client? that would cut back on a lot of the args being passed around

tomaspiaggio · 2023-04-20T13:17:41Z

@hwchase17 I thought that was addressed with the from_components function. Could you say specifically what you need? I'm also not sure what you mean by arguments being passed around. Could you comment on that as well so I can fix it? Thank you!

dev2049 · 2023-04-20T17:03:45Z

@hwchase17 I thought that was addressed with the from_components function. Could you say specifically what you need? I'm also not sure what you mean by arguments being passed around. Could you comment on that as well so I can fix it? Thank you!

think he means to make __init__ look something like what i mentioned here https://github.com/hwchase17/langchain/pull/3104/files#r1170476602

tomaspiaggio · 2023-04-20T17:44:01Z

@dev2049 I already added the from_components function and I agree it is a better approach. The methods called in the constructor are validations for the gcs_bucket_name and that the client libraries are installed. I'm sorry if I'm not understanding what you mean.
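
For readers following along, the two constructor checks described above (a bucket-name validation and a check that the client libraries are installed) could look roughly like the sketch below. The function names, the accepted "gs://" prefix, and the error messages are assumptions made for illustration and are not taken from the PR's code.

```python
import importlib.util


def validate_gcs_bucket_name(gcs_bucket_name: str) -> str:
    """Return a bare bucket name, rejecting values that look like paths.

    Hypothetical helper: the rules here are illustrative assumptions.
    """
    name = gcs_bucket_name
    # Tolerate a "gs://" prefix, but keep only the bucket name itself.
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    if not name or "/" in name:
        raise ValueError(
            f"Expected a bucket name without a path, got {gcs_bucket_name!r}"
        )
    return name


def require_modules(*module_names: str) -> None:
    """Raise ImportError if any of the named modules cannot be located."""
    missing = []
    for module_name in module_names:
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec raises when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(module_name)
    if missing:
        raise ImportError(f"Missing required modules: {', '.join(missing)}")
```

Checks like these are cheap and side-effect free, which is one reason they can reasonably stay in a constructor even when client creation moves elsewhere.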

dev2049 · 2023-04-21T01:21:36Z

@dev2049 I already added the from_components function and I agree it is a better approach. The methods called in the constructor are validations for the gcs_bucket_name and that the client libraries are installed. I'm sorry if I'm not understanding what you mean.

i just meant you should update __init__ params, which it looks like you did in 2f946f5 🙏 !
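
The pattern discussed in this exchange, where __init__ receives already-initialized clients and a from_components classmethod does the construction, can be sketched as follows. FakeClient and VectorStoreSketch are hypothetical stand-ins; the real class, its parameters, and the GCP clients it uses are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeClient:
    """Stand-in for a real, already-configured cloud client (hypothetical)."""

    project_id: str


class VectorStoreSketch:
    """Illustrative only: shows the shape of the pattern, not the PR's class."""

    def __init__(self, index_client: Any, storage_client: Any, gcs_bucket_name: str):
        # __init__ only stores its collaborators; it creates nothing itself,
        # so callers (and tests) can pass in any pre-built or fake client.
        self.index_client = index_client
        self.storage_client = storage_client
        self.gcs_bucket_name = gcs_bucket_name

    @classmethod
    def from_components(cls, project_id: str, gcs_bucket_name: str) -> "VectorStoreSketch":
        # All object creation lives here, so configuration arguments such as
        # project_id never need to be threaded through __init__.
        index_client = FakeClient(project_id)
        storage_client = FakeClient(project_id)
        return cls(index_client, storage_client, gcs_bucket_name)
```

With this split, `VectorStoreSketch.from_components("my-project", "my-bucket")` is the convenient path, while passing clients directly to the constructor keeps the class easy to test.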

tomaspiaggio · 2023-04-22T01:28:15Z

Great @dev2049 !! So do you need me to do anything else for the merge?

hwchase17

looks great! thanks

meal · 2023-05-07T13:23:52Z

@hwchase17 any chance to get this into release anytime soon?

eugenemiretsky · 2023-05-21T18:08:36Z

@hwchase17 Same question here: Would be nice to see this released

eugenemiretsky · 2023-05-21T18:35:27Z

One concern is that the docs are stored in and retrieved from GCS, which is slow (and somewhat defeats the purpose of using a vector DB).

eugenemiretsky · 2023-05-24T11:27:08Z

@tomaspiaggio should you create a PR from your branch to master?

olaf-hoops · 2023-05-25T19:26:32Z

@hwchase17 Any updates on this one? Would be a cool feature!

tomaspiaggio · 2023-05-30T15:27:42Z

Will this be merged to master? @hwchase17

HarrisonKhannah · 2023-05-31T03:33:43Z

Keen to get this merged into master @hwchase17

ramssai · 2023-06-01T10:18:05Z

Once the Matching Engine index is deployed, what is the best retriever in langchain to get the query results? @tomaspiaggio

ktibbs9417 · 2023-11-30T23:34:40Z

I have been using Vector Search (Matching Engine) with langchain for a couple of days now, and I've been banging my head against a wall trying to solve a problem.

I notice that when embeddings are sent to Vector Search they get stored, and a file is also created and stored in a separate GCS bucket that is referenced at query time.

I am looking for a way to remove the embeddings from Vector Search, but it seems I can only do it with gcloud commands, and for that I need to know the datapoint_ids.

What would be the best way to store the datapoint_ids that are related to the documents that are being embedded?
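
One possible approach, sketched under the assumption that you can choose or capture the datapoint IDs at upload time, is to keep a small registry that maps each source document to the IDs created for its chunks. DatapointRegistry, its JSON file layout, and the method names are hypothetical; nothing here calls Vector Search or langchain.

```python
import json
import uuid
from pathlib import Path
from typing import Dict, List


class DatapointRegistry:
    """Maps a document key (e.g. a file path) to the datapoint IDs created for it."""

    def __init__(self, path: Path):
        self.path = path
        self.ids_by_doc: Dict[str, List[str]] = {}
        # Reload earlier state so IDs survive across runs.
        if path.exists():
            self.ids_by_doc = json.loads(path.read_text())

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.ids_by_doc, indent=2))

    def new_ids(self, doc_key: str, n_chunks: int) -> List[str]:
        """Generate and record one ID per chunk before the chunks are uploaded."""
        ids = [str(uuid.uuid4()) for _ in range(n_chunks)]
        self.ids_by_doc.setdefault(doc_key, []).extend(ids)
        self._save()
        return ids

    def pop_ids(self, doc_key: str) -> List[str]:
        """Return and forget the IDs for a document, e.g. just before deleting them."""
        ids = self.ids_by_doc.pop(doc_key, [])
        self._save()
        return ids
```

The IDs returned by `pop_ids` are what a removal command or API call would need; a database table or document metadata field could play the same role as the JSON file.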

Tom Piaggio added 30 commits April 14, 2023 12:09

Initial implementation for matching engine

5dc6bce

Fixing bugs

0fc0cdf

Finished initial implementation. About to test

43f458f

Typos

ffd3939

Typos

a4e4f0a

Typos

7b1b3d5

Typos

e0d956c

Typos

7603e85

Typos

b492c03

Typos

de768d0

Typos

b1ecb20

Typos

7e00904

Typos

c3c27fe

Typos

18e6412

Typos

c0c10e8

Typos

21b443f

Typos

78bfdb2

Typos

45c2c8f

Typos

583abe7

Typos

d5602e6

Typos

c12b9ab

Typos

ad5c433

Typos

dc1db6f

Typos

cd63328

Typos

5a3dce9

Typos

fbdef49

Typos

a2c3e7e

Typos

4d80f81

Typos

014821f

Typos

bb2c78a

dev2049 reviewed Apr 19, 2023

pyproject.toml (outdated)

vowelparrot reviewed Apr 19, 2023

Tom Piaggio added 3 commits April 19, 2023 14:04

Continued with comments

3267bb5

Added dependencies to pyproject.toml

364b83a

Removed HUB_MODEL constant as it was the default anyway.

e2554b1

vowelparrot reviewed Apr 19, 2023

langchain/vectorstores/matching_engine.py (outdated)

hwchase17 reviewed Apr 19, 2023

Changed gcs_bucket_uri to gcs_bucket_name in all places. Changed arguments in from_texts to make them required.

af73772

Moved all object creations out of the object and into from_components.

2f946f5

hwchase17 approved these changes Apr 22, 2023

hwchase17 changed the base branch from master to harrison/matching-engine-vectorstore April 22, 2023 15:28

hwchase17 merged commit 89d574a into langchain-ai:harrison/matching-engine-vectorstore Apr 22, 2023

MarkEdmondson1234 mentioned this pull request Jun 3, 2023

GCP integrations MarkEdmondson1234/langchain-github#1

Open

6 tasks

afirstenberg mentioned this pull request Jun 4, 2023

Support for Vertex AI Matching Engine as a Vector Store langchain-ai/langchainjs#1532

Closed
