Performance of isawaitable #54

ktosiek · 2019-09-04T07:12:13Z

I've run a very simplistic benchmark, just returning a long list of single-field objects.
It seems graphql-core-next is 2.5x slower than graphql-core: https://gist.github.com/ktosiek/849e8c7de8852c2df1df5af8ac193287

Looking at flamegraphs, I see isawaitable is used a lot, and it's a pretty slow function. Would it be possible to pass raw results around more? It seems resolve_field_value_or_error and complete_value_catching_error are the main offenders here.

Cito · 2019-09-04T19:15:28Z

So far the focus has been on compatibility, correctness, and being up to date with GraphQL.js.

The next step will be to improve performance. There is probably a lot of room for improvement. I'd like to keep the code simple and very close to the GraphQL.js implementation though, in order to make it easier to stay in sync with changes over there, and not move us into a very performant, but inscrutable and unmaintainable dead end.

Maybe we can kind of inline some of the isawaitables() or split some of the code paths for sync/async to avoid such checks.

But before we start the optimization, we need good benchmarks tools. Your script would be a starting point. My idea is to port the existing benchmarks from GraphQL.js and maybe add some more Python specific ones if needed (see issue #3). I think it makes sense t to work on that first, and at least have some basic performance checks before changing the code.

It's on my list, just need some free time to work on this. Feel free to send contributions as PRs as well.

Cito · 2019-09-14T19:49:51Z

Update: We now have nice benchmarks and can continue working on performance.

qeternity · 2019-09-16T09:25:45Z

I'd like to keep the code simple and very close to the GraphQL.js implementation though, in order to make it easier to stay in sync with changes over there, and not move us into a very performant, but inscrutable and unmaintainable dead end.

Nice job so far on this project, thanks for all your efforts. I worry that keeping the implementation similar between two very different languages and interpreters is going to keep this project from being scalable. We're using this for a new project, and are considering returning to DRF because for relatively small calls we're struggling to get latencies below 300ms (profiling shows nearly all of the time is spent inside graphql-core). PyPy reduces this by about half, but that means backporting a few repos to 3.6 which I did to test out, but am not keen to maintain especially with 3.8 just around the corner.

Cito · 2019-09-16T17:56:00Z

If it really makes and big difference necessary we certainly can implement some parts that are crucial regarding performance differently. But actually JS and Python are not that different, and as I wrote above, I believe there is still a lot of room for improvement while keeping the implementation similar. Anybody who wants to work on this: Feel free to send PRs and suggestions.

Cito · 2019-09-16T18:05:59Z

While I said JS and Python are not that different, there is one difference that plays a role here, namely that in JS you can await any expression, but in Python you can await only awaitables. That's why we need so many isawaitable calls and have nested async functions which is not very efficient. Maybe we can find a better solution.

qeternity · 2019-09-17T10:31:04Z

Thanks @Cito indeed my point about Python vs. JS was that Python has been sync from day one with async bolted on whereas JS has been the other way around. As @ktosiek has done a very good job demonstrating, the isawaitable calls are very expensive and are only necessary due to the differences you mention.

Hellzed · 2019-11-20T22:03:06Z

@ktosiek Which versions of Python have you tried this on? I found huge improvements from Python 3.6 to Python 3.8 (it seems like 3.8 isawaitable() has almost 300% the performance of 3.6).

ktosiek · 2019-11-21T06:59:26Z

Thanks for the heads up, I'll try on 3.8 too. I've run my experiments on 3.6 and 3.7. Are you sure your 3.6 was compiled with PGO? It made a huge difference for me when I've installed 3.7 from pyenv (which disabled PGO to get better compilation times). śr., 20 lis 2019, 23:03 użytkownik Loris Zinsou <[email protected]> napisał:

…

@ktosiek <https://github.com/ktosiek> Which versions of Python have you tried this on? I found huge improvements from Python 3.6 to Python 3.8 (it seems like 3.8 isawaitable() has almost 300% the performance of 3.6). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#54>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACD37MRKKD3SBJ2XREOPSDQUWXZVANCNFSM4ITOIKPQ> .

ktosiek · 2019-11-22T06:53:57Z

@Hellzed wow, I've confirmed your tests. There's a big difference between 3.6 and 3.7, but 3.8 is even faster!

I've looked at the test_execute_introspection_query benchmark, and it went from ~756ms on 3.6 to ~265ms on 3.8. This is so huge isawaitable() might not be the bottleneck anymore, I'll need new flamegraphs now :-)

Benchmark results:

version	time
Python 3.8.0 (default, Nov 21 2019, 20:31:38)	370 nsec per loop
Python 3.7.5rc1 (default, Oct 8 2019, 16:47:45)	444 nsec per loop
Python 3.6.9 (default, Nov 21 2019, 21:00:04)	1.07 usec per loop

The benchmark:

$ python3 -VV
Python 3.6.9 (default, Nov 21 2019, 21:00:04) 
[GCC 9.2.1 20191008]
$ python3 -mtimeit -s $'from inspect import isawaitable \nclass A: pass \na = A()' 'isawaitable(a)'
1000000 loops, best of 3: 1.07 usec per loop

Hellzed · 2019-11-22T11:06:29Z

@ktosiek I used stock Ubuntu 19.10 python packages, so I'm not sure which compile flags were used.

If you intend to make a GraphQL service built on graphql-core-3 faster, I would recommend to target the parsing and validation steps, and make sure your clients always use variables for any changing value in the query.

This means instead of using graphql.graphql, you should use graphql.parse, graphql.validate then only graphql.execute separately, and introduce a caching step (with Redis or memcache for example) in between. This would also allow you to leverage libgraphqlparser if you're comfortable with CFFI or cTypes.

ktosiek · 2019-11-22T12:47:21Z

@Hellzed this issue started as a result of digging into cases like graphql-python/graphene#268 - with many objects returned by one API call. Caching the parsed query would help, but the bulk of the time is spent in execution.

Hellzed · 2019-11-22T13:20:16Z

@ktosiek I actually wondered about the use of isawaitable() when I saw it comes from the inspect module, and remembered some old advice: "don't use inspect, it's slow". Never needed to verify that before.

How does Python 3.8 fare when running both Graphene v2 and Graphene v3 benchmarks? Does it just make both faster, or does it bring v3 closer to v2?

ewhauser · 2020-02-16T02:51:01Z

Caching the parsing and validation definitely helps, but the bulk of the time (especially for larger objects) is spent in execution. The best approach would be to introduce either an AOT or JIT compiler like https://github.com/zalando-incubator/graphql-jit that generated runtime code for optimal execution.

qeternity · 2020-03-17T08:34:51Z

I've just run this simple benchmark against latest v2 and v3 branches and we've still had a performance regression of nearly 30% (18.5 sec vs 24.3 sec)

Cito · 2020-03-18T14:56:14Z

@qeternity Can confirm it's currently still 30% even with Python 3.8.

I can also confirm that the main problem are still the isawaitable calls, particularly in the complete_value_catching_error method. And in the isawaitable calls, the isinstance check for collections.abc.Awaitable is the biggest time waster.

When I replaced that isinstance check in the inspect module with a simple hasattr(object, '__await__'), the benchmark ran faster than with the v2 branch.

Maybe we can live with such a "simplified" isawaitable method? It's currently unclear to me, why and when it does not suffice to simply check for the __await__ attribute so that an instance check against Abc.Awaitable would be necessary. Any Python gurus here who can shed light on this, or who has some time to discuss this on the Python mailing list and report back?

Note: We must keep in mind that the above benchmark is only testing the path where all resolvers are synchronous. If we start to optimize, we must also measure with a benchmark with asynchronous code in order to not deteriorate the performance for that case.

Cito · 2020-03-18T16:51:53Z

Another idea: Maybe we should add an isawaitable parameter to the execute function. By default we could use the simplified, faster isawaitable check mentioned above, and maybe it could even be set to a function that always returns False when execute is called by the graphql_sync function to make that path even faster.

qeternity · 2020-03-18T19:20:24Z

There's also this which I opened some time ago only to fall on deaf ears: syrusakbary/promise#82

Cito · 2020-03-18T21:27:33Z

@qeternity This issue tracker only covers GraphQL-core v3. And please understand that these are all projects that various people work on for free in their spare time. You can't expect people to work on issues - they may have currently other priorities in life. The only way to move forward is to contribute by sending PRs, become a maintainer of stalled projects and motivating other people.

qeternity · 2020-03-19T09:16:01Z

@Cito thanks, I absolutely don't expect anything. I have spent an incredible amount of time profiling and benchmarking this. With respect to scope of issue tracker, the OP benchmark is graphene, not graphql-core. I wanted to make sure we were all aware of performance regression in libraries that graphene depends on so as not to chase ghosts in this issue tracker.

Cito · 2020-03-19T10:13:01Z

@qeternity Graphene is only a wrapper around the GraphQL-core types. The query execution and the heavy lifting happens in GraphQL-core, so that's in fact the right place to optimize execution performance. The promise library mentioned above is only used in GraphQL-core v2 (Graphene v2) which is considered deprecated, that's why you don't see much activity there any more.

qeternity · 2020-03-19T10:16:28Z

@Cito yep, I'm aware. But if this delta between graphene v2 and v3 is closing (which is ultimately what we're benchmarking here) due to regressions in graphene v2 (vis-a-vis Promises regressions) instead of improvements in v3, I presumed that was relevant to our discussion. But your point is well taken.

Cito · 2020-03-19T11:09:48Z

@qeternity Ah ok now I understand your point. You were actually saying you observed a v2 regression in the same benchmark we are discussing here, and since we are comparing with v2 as our base line, we should be aware of that. That's in fact relevant to our discussion.

It's interesting that you observed an update of the promises lib to affect this (synchronous) benchmark. I had a quick look - the main thing that has been changed in the promises lib update is something with caching, fixing an out-of-memory error. Maybe it's just slower because of the additional garbage collection. Benchmarking is not easy. This benchmark only measures the time in this one special case, not "the" performance in general.

Cito · 2020-03-21T20:07:35Z

I have now implemented the ideas mentioned above - this improves the performance considerably. The benchmark with v3.1.0b2 is now running faster than with v2. Please let me know how it works for you and whether we can close this issue.

qeternity · 2020-03-22T09:52:31Z

@Cito Just had a look over the commit...nice, looks great! Not had a chance to run it yet, but will sit down with it tonight. I already had put aside some weekend quarantine time to start implementing some more aggressive hotpaths.

ktosiek · 2020-03-22T17:14:25Z

I can confirm it runs faster than v2, tested on Python 3.7 and 3.8. I think this issue is solved. Thank you, this will help a lot with migrating to v3!

ewhauser · 2020-04-26T22:38:01Z

Just to add on here - when using async, the performance gains appear to be quite large due to the elimination of promises. This example highlights the performance gains pretty as it is returning a large of number of results (5x iterations).

Here is graphl-core/graphene 2:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      210    0.407    0.002   49.608    0.236 /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py:1675(_run_once)
   180304    0.208    0.000   46.979    0.000 /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/events.py:86(_run)
   180304    0.329    0.000   46.771    0.000 {method 'run' of 'Context' objects}
   180060    0.097    0.000   25.413    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:478(resolve)
315125/180065    0.564    0.000   25.316    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:189(_resolve_callback)
    90035    0.084    0.000   25.052    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:841(handle_future_result)
405130/180050    0.625    0.000   24.412    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:229(_fulfill)
180045/90030    0.107    0.000   24.067    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/async_.py:54(settle_promises)
180045/90030    0.219    0.000   24.012    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/async_.py:36(_async_settle_promise)
180050/90035    0.115    0.000   23.858    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/async_.py:127(queue_tick)
    90035    0.045    0.000   23.766    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/schedulers/immediate.py:9(call)
    90035    0.175    0.000   23.722    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/async_.py:119(drain_queues)
   180070    0.661    0.000   23.516    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/async_.py:71(drain_queue)
315105/180045    0.403    0.000   22.259    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:457(_settle_promises)
315105/180045    0.177    0.000   21.887    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:334(_settle_promise0)
315110/180050    0.473    0.000   21.781    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:306(_settle_promise)
135065/105040    0.162    0.000   21.180    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:345(_settle_promise_from_handler)
   135065    0.111    0.000   19.625    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/promise/promise.py:84(try_catch)
    90025    0.085    0.000   19.430    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/executor.py:531(<lambda>)
225065/90030    0.863    0.000   19.345    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/executor.py:500(complete_value)
    45010    0.061    0.000   17.934    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/executor.py:700(complete_object_value)
    45015    0.265    0.000   17.727    0.000 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/executor.py:230(execute_fields)
       10    0.000    0.000   17.025    1.703 
... user code ...

vs 3:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      175    0.049    0.000   15.466    0.088 /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py:1675(_run_once)
      270    0.000    0.000   14.621    0.054 /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/events.py:86(_run)
      270    0.001    0.000   14.621    0.054 {method 'run' of 'Context' objects}
... user code ...
 45015/10    0.141    0.000    2.268    0.227 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:361(execute_fields)
 90025/15    0.223    0.000    2.268    0.151 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:522(resolve_field)
       10    0.000    0.000    2.268    0.227 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:630(await_completed)
       10    0.000    0.000    2.268    0.227 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:615(await_result)
 135040/5    0.260    0.000    2.267    0.453 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:660(complete_value)
  45010/5    0.024    0.000    2.267    0.453 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:899(complete_object_value)
  45010/5    0.045    0.000    2.267    0.453 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:934(collect_and_execute_subfields)
105025/15    0.094    0.000    2.267    0.151 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:599(complete_value_catching_error)
        5    0.024    0.005    2.266    0.453 /Users/ewhauser/.venvs/drone_web/lib/python3.7/site-packages/graphql/execution/execute.py:744(complete_list_value)

It does appear that document validation is 2x slower in the newer version of core, but I'll open a separate issue. Nice work!

Cito · 2020-04-27T09:23:36Z

@ewhauser Thanks for the feedback. Yes, it would be good to open a new issue for performance improvements in validation. Maybe you can run a profiler and identify some bottlenecks already.

Cito added optimization Code optimizations and performance issues pending Waiting for other issues to move forward labels Sep 4, 2019

ktosiek mentioned this issue Sep 5, 2019

Use identity instead of hash for caching subfields in collect_subfields #56

Merged

Cito removed the pending Waiting for other issues to move forward label Sep 14, 2019

Cito self-assigned this Sep 14, 2019

Cito added this to the v3.1 milestone Sep 14, 2019

corydolphin mentioned this issue Jan 5, 2020

Use asyncio.iscoroutine instead of inspect.isawaitable #74

Closed

rafalp mentioned this issue Feb 13, 2020

Really slow for huge list of objects mirumee/ariadne#319

Closed

Cito added a commit that referenced this issue Mar 21, 2020

Improve performance using faster is_awaitable (#54)

5db7a7a

ktosiek closed this as completed Mar 22, 2020

ktosiek mentioned this issue Feb 10, 2021

Performance issues with large data sets graphql-python/graphene#268

Open

Cito mentioned this issue Oct 15, 2021

Improve performance of async execution #142

Open

rafalp mentioned this issue Oct 23, 2021

Graphql query resolution gets very slow with middlewares. mirumee/ariadne#708

Closed

bartenra mentioned this issue Mar 22, 2022

Use inspect.isawaitable instead of custom is_awaitable helper function #163

Closed

bellini666 mentioned this issue Oct 1, 2022

Cryptic error if permission directive returns non-bool blb-ventures/strawberry-django-plus#124

Closed

bellini666 mentioned this issue Feb 19, 2023

refactor: Permissions using Field Extensions strawberry-graphql/strawberry#2570

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance of isawaitable #54

Performance of isawaitable #54

ktosiek commented Sep 4, 2019

Cito commented Sep 4, 2019 •

edited

Loading

Cito commented Sep 14, 2019

qeternity commented Sep 16, 2019

Cito commented Sep 16, 2019 •

edited

Loading

Cito commented Sep 16, 2019 •

edited

Loading

qeternity commented Sep 17, 2019 •

edited

Loading

Hellzed commented Nov 20, 2019

ktosiek commented Nov 21, 2019 via email

ktosiek commented Nov 22, 2019

Hellzed commented Nov 22, 2019

ktosiek commented Nov 22, 2019

Hellzed commented Nov 22, 2019 •

edited

Loading

ewhauser commented Feb 16, 2020

qeternity commented Mar 17, 2020

Cito commented Mar 18, 2020 •

edited

Loading

Cito commented Mar 18, 2020

qeternity commented Mar 18, 2020

Cito commented Mar 18, 2020

qeternity commented Mar 19, 2020

Cito commented Mar 19, 2020

qeternity commented Mar 19, 2020

Cito commented Mar 19, 2020

Cito commented Mar 21, 2020

qeternity commented Mar 22, 2020

ktosiek commented Mar 22, 2020

ewhauser commented Apr 26, 2020 •

edited

Loading

Cito commented Apr 27, 2020

Performance of isawaitable #54

Performance of isawaitable #54

Comments

ktosiek commented Sep 4, 2019

Cito commented Sep 4, 2019 • edited Loading

Cito commented Sep 14, 2019

qeternity commented Sep 16, 2019

Cito commented Sep 16, 2019 • edited Loading

Cito commented Sep 16, 2019 • edited Loading

qeternity commented Sep 17, 2019 • edited Loading

Hellzed commented Nov 20, 2019

ktosiek commented Nov 21, 2019 via email

ktosiek commented Nov 22, 2019

Benchmark results:

Hellzed commented Nov 22, 2019

ktosiek commented Nov 22, 2019

Hellzed commented Nov 22, 2019 • edited Loading

ewhauser commented Feb 16, 2020

qeternity commented Mar 17, 2020

Cito commented Mar 18, 2020 • edited Loading

Cito commented Mar 18, 2020

qeternity commented Mar 18, 2020

Cito commented Mar 18, 2020

qeternity commented Mar 19, 2020

Cito commented Mar 19, 2020

qeternity commented Mar 19, 2020

Cito commented Mar 19, 2020

Cito commented Mar 21, 2020

qeternity commented Mar 22, 2020

ktosiek commented Mar 22, 2020

ewhauser commented Apr 26, 2020 • edited Loading

Cito commented Apr 27, 2020

Cito commented Sep 4, 2019 •

edited

Loading

Cito commented Sep 16, 2019 •

edited

Loading

Cito commented Sep 16, 2019 •

edited

Loading

qeternity commented Sep 17, 2019 •

edited

Loading

Hellzed commented Nov 22, 2019 •

edited

Loading

Cito commented Mar 18, 2020 •

edited

Loading

ewhauser commented Apr 26, 2020 •

edited

Loading