Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation for Matching Engine Vectorstore #3104

Merged

Conversation

tomaspiaggio
Copy link
Contributor

We just finished the implementation for the vector store using the GCP Matching Engine.

We'll be contributing the implementation.

Related to #2892

If you have any questions or suggestions please contact me (@tomaspiaggio) or @scafati98.

@tomaspiaggio
Copy link
Contributor Author

I just pushed a new updates addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to the pyproject.toml but we're having dependency conflicts with black. Do you have any suggetions here? @dev2049

@dev2049
Copy link
Contributor

dev2049 commented Apr 19, 2023

I just pushed a new updates addressing the comments. However, we were trying to add google-cloud-storage and google-cloud-aiplatform to the pyproject.toml but we're having dependency conflicts with black. Do you have any suggetions here? @dev2049

black is only a linting dependency, not a package dependency, so shouldn't cause issues. think you may have accidentally added it to list of actual dependencies

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my main comment, inline with some of the others - is it simpler to just do the client creation OUTSIDE of the class, and then pass in an already initialized client? would cut back on a lot of the ags being passed around

@tomaspiaggio
Copy link
Contributor Author

@hwchase17 I thought that was addressed with the from_components function. Would you comment specifically what would you need? I'm also not sure what you mean by arguments being passed around as well. Would you please comment on that as well so I can fix it? Thank you!

…ments in from_texts to make them required.
@dev2049
Copy link
Contributor

dev2049 commented Apr 20, 2023

@hwchase17 I thought that was addressed with the from_components function. Would you comment specifically what would you need? I'm also not sure what you mean by arguments being passed around as well. Would you please comment on that as well so I can fix it? Thank you!

think he means to make __init__ look something like what i mentioned here https://github.com/hwchase17/langchain/pull/3104/files#r1170476602

@tomaspiaggio
Copy link
Contributor Author

@dev2049 I already added the from_components function and I agree it is a better approach. The methods called in the constructor are validations for the gcs_bucket_name and that the client libraries are installed. I'm sorry if I'm not understanding what you mean.

@dev2049
Copy link
Contributor

dev2049 commented Apr 21, 2023

@dev2049 I already added the from_components function and I agree it is a better approach. The methods called in the constructor are validations for the gcs_bucket_name and that the client libraries are installed. I'm sorry if I'm not understanding what you mean.

i just meant you should update __init__ params, which it looks like you did in 2f946f5 🙏 !

@tomaspiaggio
Copy link
Contributor Author

Great @dev2049 !! So do you need me to do anything else for the merge?

Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! thanks

@hwchase17 hwchase17 changed the base branch from master to harrison/matching-engine-vectorstore April 22, 2023 15:28
@hwchase17 hwchase17 merged commit 89d574a into langchain-ai:harrison/matching-engine-vectorstore Apr 22, 2023
@meal
Copy link
Contributor

meal commented May 7, 2023

@hwchase17 any chance to get this into release anytime soon?

@eugenemiretsky
Copy link

@hwchase17 Same question here: Would be nice to see this released

@eugenemiretsky
Copy link

One concern is that the docs are stored/retrieve from GCS which is slow (and somewhat defeats the purpose of using a Vector DB)

@eugenemiretsky
Copy link

@tomaspiaggio should you create a PR your branch to master?

@olaf-hoops
Copy link

@hwchase17 Any updates on this one? Would be a cool feature!

@tomaspiaggio
Copy link
Contributor Author

Will this be merged to master? @hwchase17

@HarrisonKhannah
Copy link

Keen to get this merged into master @hwchase17

@ramssai
Copy link

ramssai commented Jun 1, 2023

Once we have Matching engine index is deployed, What is the best retriever on langchain to get the query results ? @tomaspiaggio

@ktibbs9417
Copy link

Have been using the Vector Search (Matching Engine) with langchain for a couple of days now and I've been hitting my head against a wall to solve a problem.

I notice that when embeddings are sent to Vector Search they get stored and a file is also created and stored within a separate GCS bucket that is referenced when queried.

I am looking for a way to remove the embeddings from the Vector Search but it seems I can only do it with gcloud commands but I need to know the datapoint_ids.

What would be the best way to store the datapoint_ids that are related to the documents that are being embedded?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.