Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use identity instead of hash for caching subfields in collect_subfields #56

Merged
merged 1 commit into from
Sep 8, 2019

Conversation

ktosiek
Copy link
Contributor

@ktosiek ktosiek commented Sep 5, 2019

I've noticed computing __hash__ on each cache access is quite expensive, and not what the JS implementation does - they are caching subfield nodes based on argument identity.

This PR is based on #55 for the benchmarks, but it can work separately.

Results of pytest --enable-benchmark -k benchmark before and after the change:

------------------------------------------------------ benchmark: 1 tests ------------------------------------------------------
Name (time in s)                               Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------
benchmark_executing_introspection_query     1.1361  1.1782  1.1484  0.0170  1.1427  0.0152       1;1  0.8707       5           1
--------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------
Name (time in ms)                                Min       Max      Mean  StdDev    Median     IQR  Outliers     OPS  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------
benchmark_executing_introspection_query     774.3004  789.3146  779.2556  6.0870  778.8992  7.2825       1;0  1.2833       5           1
----------------------------------------------------------------------------------------------------------------------------------------

this change also shaved 1/3 of the time from the benchmark mentioned in #54.

@ktosiek ktosiek requested a review from Cito as a code owner September 5, 2019 06:15
@ktosiek ktosiek force-pushed the collect-subfields-cache-perf branch from dd7a427 to 62dd9e4 Compare September 5, 2019 18:26
Copy link
Member

@Cito Cito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again thanks a lot for this contribution which really makes a big performance difference.

This issue is actually a regression after #45 where the Node class got a correct, but slow, hash method. So using id() here is better even if it may result in fewer cache hits (when field nodes are equal, but not identical as objects).

Can you remove the benchmarking stuff from this PR, so that we have this nicely separated in different commits? See also another small suggestion below.

This is closer to what GraphQL.js does, and performs much better:
benchmark_executing_introspection_query went from Mean 1.1484 to 766.0428
@ktosiek ktosiek force-pushed the collect-subfields-cache-perf branch from 62dd9e4 to 5034989 Compare September 8, 2019 08:34
@ktosiek ktosiek requested a review from Cito September 8, 2019 11:36
@Cito Cito merged commit 1ebcb23 into graphql-python:master Sep 8, 2019
@Cito
Copy link
Member

Cito commented Sep 8, 2019

👍 Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants