Use identity instead of hash for caching subfields in collect_subfields #56

ktosiek · 2019-09-05T06:15:42Z

I've noticed that computing __hash__ on each cache access is quite expensive, and it is not what the JS implementation does: GraphQL.js caches subfield nodes based on argument identity.
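
To make the difference concrete, here is a minimal sketch rather than the actual code in execute.py. FieldNode, key_by_hash and key_by_identity are hypothetical names invented for this illustration; the class only imitates a node with a deep structural hash and counts how often that hash is computed.

```python
from typing import Dict, List, Tuple


class FieldNode:
    """Hypothetical stand-in for an AST node with a deep, structural hash."""

    hash_calls = 0  # class-wide counter, only here to make the cost visible

    def __init__(self, name: str, children: Tuple["FieldNode", ...] = ()) -> None:
        self.name = name
        self.children = children

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FieldNode):
            return NotImplemented
        return self.name == other.name and self.children == other.children

    def __hash__(self) -> int:
        FieldNode.hash_calls += 1
        # The hash depends on the whole subtree, not just on this node.
        return hash((self.name, self.children))


def key_by_hash(return_type: str, field_nodes: List[FieldNode]) -> tuple:
    # The nodes themselves are part of the key, so dict accesses hash them.
    return (return_type, tuple(field_nodes))


def key_by_identity(return_type: str, field_nodes: List[FieldNode]) -> tuple:
    # Only the integers from id() are part of the key; __hash__ is not called.
    return (return_type, *map(id, field_nodes))


node = FieldNode("user", (FieldNode("id"), FieldNode("name")))
field_nodes = [node]

# Store and look up once with node-based keys.
hash_cache: Dict[tuple, str] = {}
hash_cache[key_by_hash("Query", field_nodes)] = "subfields"
hash_hit = hash_cache.get(key_by_hash("Query", field_nodes))
calls_with_hash_keys = FieldNode.hash_calls

# Store and look up once with identity-based keys.
FieldNode.hash_calls = 0
id_cache: Dict[tuple, str] = {}
id_cache[key_by_identity("Query", field_nodes)] = "subfields"
id_hit = id_cache.get(key_by_identity("Query", field_nodes))
calls_with_id_keys = FieldNode.hash_calls

print(calls_with_hash_keys, calls_with_id_keys)
```

Both caches return the stored value, but only the node-based keys trigger __hash__. One caveat with identity keys: id() values are only unique among live objects, so this assumes the nodes stay alive for as long as the cache does.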

This PR is based on #55 for the benchmarks, but it can work separately.

Results of pytest --enable-benchmark -k benchmark before (first table, times in s) and after (second table, times in ms) the change:

------------------------------------------------------ benchmark: 1 tests ------------------------------------------------------
Name (time in s)                               Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------
benchmark_executing_introspection_query     1.1361  1.1782  1.1484  0.0170  1.1427  0.0152       1;1  0.8707       5           1
--------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------
Name (time in ms)                                Min       Max      Mean  StdDev    Median     IQR  Outliers     OPS  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------
benchmark_executing_introspection_query     774.3004  789.3146  779.2556  6.0870  778.8992  7.2825       1;0  1.2833       5           1
----------------------------------------------------------------------------------------------------------------------------------------

This change also shaved 1/3 off the time of the benchmark mentioned in #54.

Cito

Again thanks a lot for this contribution which really makes a big performance difference.

This issue is actually a regression after #45 where the Node class got a correct, but slow, hash method. So using id() here is better even if it may result in fewer cache hits (when field nodes are equal, but not identical as objects).
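
The trade-off can be seen in a small sketch. Node here is a hypothetical frozen dataclass, not the real Node class from the library; it only serves to produce two objects that compare equal without being the same object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node type: equal by value and hashable."""

    name: str


a = Node("id")
b = Node("id")  # equal to a, but a distinct object

# Cache keyed on the nodes themselves (uses __hash__ and __eq__).
hash_cache = {(a,): "subfields"}
# Cache keyed on object identity.
id_cache = {(id(a),): "subfields"}

hit_by_hash = (b,) in hash_cache          # equal node finds the entry
hit_by_id = (id(b),) in id_cache          # distinct object misses
same_object_hit = (id(a),) in id_cache    # the very same object still hits

print(hit_by_hash, hit_by_id, same_object_hit)
```

A miss like the second lookup only costs a recomputation of the subfields, so the result stays correct either way.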

Can you remove the benchmarking stuff from this PR, so that we have this nicely separated in different commits? See also another small suggestion below.

src/graphql/execution/execute.py

This is closer to what GraphQL.js does, and performs much better: the mean time of benchmark_executing_introspection_query went from 1.1484 s to 766.0428 ms.

Cito · 2019-09-08T14:43:50Z

👍 Thank you.

ktosiek requested a review from Cito as a code owner September 5, 2019 06:15

ktosiek force-pushed the collect-subfields-cache-perf branch from dd7a427 to 62dd9e4 Compare September 5, 2019 18:26

Cito requested changes Sep 6, 2019

Cito reviewed Sep 6, 2019

Use identity instead of hash for caching subfields in collect_subfields

5034989

ktosiek force-pushed the collect-subfields-cache-perf branch from 62dd9e4 to 5034989 Compare September 8, 2019 08:34

ktosiek requested a review from Cito September 8, 2019 11:36

Cito merged commit 1ebcb23 into graphql-python:master Sep 8, 2019
