Performance of isawaitable #54
So far the focus has been on compatibility, correctness, and being up to date with GraphQL.js. The next step will be to improve performance. There is probably a lot of room for improvement. I'd like to keep the code simple and very close to the GraphQL.js implementation though, in order to make it easier to stay in sync with changes over there, and not lead us into a very performant but inscrutable and unmaintainable dead end. Maybe we can inline some of the isawaitable() calls or split some of the code paths for sync/async to avoid such checks. But before we start optimizing, we need good benchmark tools. Your script would be a starting point. My idea is to port the existing benchmarks from GraphQL.js and maybe add some more Python-specific ones if needed (see issue #3). I think it makes sense to work on that first, and to have at least some basic performance checks before changing the code. It's on my list; I just need some free time to work on it. Feel free to send contributions as PRs as well.
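One way to read "split some of the code paths for sync/async" is sketched below: decide once per set of resolvers whether any of them is a coroutine function, and take a synchronous path with no `isawaitable()` checks when none is. All names (`execute_fields_sync`, `execute_fields_async`, `choose_executor`) are hypothetical and not graphql-core API. The sketch also assumes that a plain (non-`async def`) resolver never returns an awaitable, which real code cannot take for granted.

```python
import asyncio
from inspect import isawaitable, iscoroutinefunction
from typing import Any, Callable, Dict

# Hypothetical resolver shape: takes the parent object, returns a field value
Resolver = Callable[[Any], Any]


def execute_fields_sync(resolvers: Dict[str, Resolver], source: Any) -> Dict[str, Any]:
    """Synchronous path: no isawaitable() check on any resolved value."""
    return {name: resolve(source) for name, resolve in resolvers.items()}


async def execute_fields_async(
    resolvers: Dict[str, Resolver], source: Any
) -> Dict[str, Any]:
    """Asynchronous path: check each value and await the awaitable ones concurrently."""
    names = list(resolvers)
    values = [resolvers[name](source) for name in names]

    async def settle(value: Any) -> Any:
        return await value if isawaitable(value) else value

    settled = await asyncio.gather(*(settle(value) for value in values))
    return dict(zip(names, settled))


def choose_executor(resolvers: Dict[str, Resolver]) -> Callable[..., Any]:
    """Pick a code path once per set of resolvers instead of per resolved value.

    Assumption: only `async def` resolvers ever return awaitables.
    """
    if any(iscoroutinefunction(resolve) for resolve in resolvers.values()):
        return execute_fields_async
    return execute_fields_sync
```

With only plain functions, `choose_executor` returns the synchronous variant and the result is a plain dict; a single `async def` resolver switches the whole object to the coroutine path, which the caller then has to await (for example with `asyncio.run(...)`).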
Update: We now have nice benchmarks and can continue working on performance.
Nice job so far on this project, thanks for all your efforts. I worry that keeping the implementation similar between two very different languages and interpreters is going to keep this project from being scalable. We're using this for a new project and are considering returning to DRF, because for relatively small calls we're struggling to get latencies below 300 ms (profiling shows nearly all of the time is spent inside graphql-core). PyPy reduces this by about half, but that means backporting a few repos to Python 3.6, which I did to test it out but am not keen to maintain, especially with 3.8 just around the corner.
If it really makes a big difference, we can certainly implement some performance-critical parts differently. But JS and Python are actually not that different, and as I wrote above, I believe there is still a lot of room for improvement while keeping the implementation similar. Anybody who wants to work on this: feel free to send PRs and suggestions.
While I said JS and Python are not that different, there is one difference that plays a role here, namely that in JS you can await any expression, but in Python you can await only awaitables. That's why we need so many isawaitable calls and nested async functions, which is not very efficient. Maybe we can find a better solution.
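The difference shows up in a few lines: in Python, `await` only accepts awaitables, so code that may receive either a plain value or an awaitable has to test with `inspect.isawaitable()` and, on the async branch, wrap the remaining work in a nested coroutine. `add_one` is a made-up illustration of that pattern, not code taken from graphql-core.

```python
import asyncio
from inspect import isawaitable
from typing import Any


async def await_plain_value() -> bool:
    """Return True if awaiting a plain int raises TypeError (JS would allow `await 1`)."""
    try:
        await 1  # type: ignore  # deliberately invalid: an int is not awaitable
    except TypeError:
        return True
    return False


def add_one(value: Any) -> Any:
    """Return value + 1, staying synchronous unless value is awaitable."""
    if isawaitable(value):
        # Async branch: a nested coroutine is needed just to be able to use `await`
        async def await_then_add() -> Any:
            return await value + 1

        return await_then_add()
    # Sync branch: no coroutine object is created
    return value + 1
```

Every value passing through such a helper pays for the `isawaitable()` call, and every awaitable one additionally pays for creating a nested coroutine, which is the overhead discussed in this thread.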
Thanks @Cito. Indeed, my point about Python vs. JS was that Python has been synchronous from day one with async bolted on, whereas JS has been the other way around. As @ktosiek has demonstrated very well, the isawaitable calls are very expensive and are only necessary because of the differences you mention.
@ktosiek Which versions of Python have you tried this on? I found huge improvements from Python 3.6 to Python 3.8 (it seems like 3.8 isawaitable() has almost 300% the performance of 3.6).
Thanks for the heads up, I'll try on 3.8 too. I've run my experiments on 3.6 and 3.7.
Are you sure your 3.6 was compiled with PGO? It made a huge difference for me when I installed 3.7 from pyenv (which had PGO disabled to get faster compilation times).
On Wed, 20 Nov 2019, 23:03, Loris Zinsou <[email protected]> wrote:
> @ktosiek <https://github.com/ktosiek> Which versions of Python have you
> tried this on? I found huge improvements from Python 3.6 to Python 3.8 (it
> seems like 3.8 isawaitable() has almost 300% the performance of 3.6).
@Hellzed wow, I've confirmed your tests. There's a big difference between 3.6 and 3.7, but 3.8 is even faster! I've looked at the benchmark results:
The benchmark:
$ python3 -VV
Python 3.6.9 (default, Nov 21 2019, 21:00:04)
[GCC 9.2.1 20191008]
$ python3 -mtimeit -s $'from inspect import isawaitable \nclass A: pass \na = A()' 'isawaitable(a)'
1000000 loops, best of 3: 1.07 usec per loop
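The same micro-benchmark can be driven from Python with the `timeit` module, which makes it easy to compare `inspect.isawaitable()` with a cheaper check on the same interpreter and build. The comparison function `coroutine_only` is a reference point chosen for this sketch only: it does not recognise futures or other objects with `__await__`, so it is not a drop-in replacement. Both timings include the overhead of the wrapping lambda, and absolute numbers will depend on the Python version and compile flags, as discussed above.

```python
import timeit
from inspect import isawaitable
from types import CoroutineType


class A:
    """A plain, non-awaitable object, as in the command-line benchmark."""


def time_check(check, value, number: int = 100_000) -> float:
    """Return the best per-call time in microseconds over 3 repeats."""
    timer = timeit.Timer(lambda: check(value))
    best = min(timer.repeat(repeat=3, number=number))
    return best / number * 1e6


def coroutine_only(value) -> bool:
    # Reference point only: ignores futures and other objects with __await__
    return type(value) is CoroutineType


if __name__ == "__main__":
    a = A()
    print(f"inspect.isawaitable: {time_check(isawaitable, a):.3f} usec per call")
    print(f"type(...) is CoroutineType: {time_check(coroutine_only, a):.3f} usec per call")
```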
@ktosiek I used stock Ubuntu 19.10 Python packages, so I'm not sure which compile flags were used. If you intend to make a GraphQL service built on graphql-core-3 faster, I would recommend targeting the parsing and validation steps, and making sure your clients always use variables for any changing value in the query. This means instead of using
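The reason to always use variables is that the query text then stays identical across requests, so the result of parsing and validation can be cached by that text. Below is a minimal, library-free sketch of the idea with `functools.lru_cache`; `parse_and_validate` and `handle_request` are hypothetical stand-ins (in graphql-core 3 the body would call `parse()` and `validate()`), and the "document" returned is only a placeholder.

```python
from functools import lru_cache
from typing import Any, Dict, Optional, Tuple


@lru_cache(maxsize=256)
def parse_and_validate(query: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for the expensive parse + validate steps.

    Cached by the exact query string; returns a placeholder "document".
    """
    return tuple(query.split())


def handle_request(query: str, variables: Optional[Dict[str, Any]] = None):
    document = parse_and_validate(query)  # cache hit whenever the text repeats
    return document, variables or {}  # execution would then use both


if __name__ == "__main__":
    # One query text, changing values passed as variables: 1 miss, then 2 hits
    q = "query User($id: ID!) { user(id: $id) { name } }"
    for user_id in ("1", "2", "3"):
        handle_request(q, {"id": user_id})
    print(parse_and_validate.cache_info())
```

If the changing values were inlined into the query text instead, every request would be a cache miss. Since clients control the query text, keeping the cache bounded (here via `maxsize`) also matters.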
@Hellzed this issue started as a result of digging into cases like graphql-python/graphene#268, with many objects returned by one API call. Caching the parsed query would help, but the bulk of the time is spent in execution.
@ktosiek I actually wondered about the use of How does Python 3.8 fare when running both Graphene v2 and Graphene v3 benchmarks? Does it just make both faster, or does it bring v3 closer to v2?
Caching the parsing and validation definitely helps, but the bulk of the time (especially for larger objects) is spent in execution. The best approach would be to introduce an AOT or JIT compiler, like https://github.com/zalando-incubator/graphql-jit, that generates code optimized for execution.
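To make the compiler idea concrete, here is a toy sketch of what "generating code for a query" can mean: for a fixed selection of fields it builds and `exec`s a specialised Python function once, so completing each object no longer goes through a generic per-field loop. `compile_selection` and `interpret_selection` are hypothetical and far simpler than graphql-jit (a JavaScript project): no resolvers, arguments, nesting, errors or async. The sketch assumes the field names come from an already validated query and emits them via `repr()` as string literals.

```python
from typing import Any, Callable, Dict, List


def interpret_selection(field_names: List[str], source: Dict[str, Any]) -> Dict[str, Any]:
    """Generic path: loop over the selection for every object."""
    return {name: source[name] for name in field_names}


def compile_selection(field_names: List[str]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Generate a function specialised for this selection (done once per query)."""
    body = ", ".join(f"{name!r}: source[{name!r}]" for name in field_names)
    code = f"def complete(source):\n    return {{{body}}}\n"
    namespace: Dict[str, Any] = {}
    exec(code, namespace)  # field names were emitted via repr(), i.e. as literals
    return namespace["complete"]


if __name__ == "__main__":
    fields = ["id", "name"]
    # Generates: def complete(source): return {'id': source['id'], 'name': source['name']}
    complete = compile_selection(fields)
    rows = [{"id": i, "name": f"item {i}", "extra": None} for i in range(3)]
    print([complete(row) for row in rows])
```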
I've just run this simple benchmark against the latest v2 and v3 branches, and there is still a performance regression of nearly 30% (18.5 s vs. 24.3 s).
@qeternity I can confirm the regression is currently still 30%, even with Python 3.8. I can also confirm that the main problem are still the When I replaced that Maybe we can live with such a "simplified" isawaitable method? It's currently unclear to me why and when it does not suffice to simply check for the
Note: We must keep in mind that the above benchmark only tests the path where all resolvers are synchronous. If we start to optimize, we must also measure with a benchmark that uses asynchronous code, so that we don't degrade performance in that case.
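A plausible "simplified" check, and one case where it falls short, can be sketched as follows. `simple_isawaitable` is a hypothetical name that only tests for coroutine objects and an `__await__` method. `inspect.isawaitable()` additionally accepts old-style generator-based coroutines (generators whose code object carries the `CO_ITERABLE_COROUTINE` flag, e.g. produced by a function decorated with `@types.coroutine`). These have no `__await__` method, so a shortcut like this would silently treat them as plain results.

```python
import inspect
import types
from typing import Any


def simple_isawaitable(value: Any) -> bool:
    """Hypothetical cheaper check: coroutine objects first, then __await__."""
    return type(value) is types.CoroutineType or hasattr(value, "__await__")


@types.coroutine
def legacy_coroutine():
    """Generator-based coroutine: awaitable, but without an __await__ method."""
    yield


async def native_coroutine():
    return 1
```

Whether that blind spot matters depends on whether such legacy coroutines can reach the executor at all; the asynchronous benchmark mentioned in the note would be the place to check the cost of any such change.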
Another idea: Maybe we should add an
There's also this issue, which I opened some time ago only for it to fall on deaf ears: syrusakbary/promise#82
@qeternity This issue tracker only covers GraphQL-core v3. And please understand that these are all projects that various people work on for free in their spare time. You can't expect people to work on issues; they may currently have other priorities in life. The only way to move forward is to contribute by sending PRs, becoming a maintainer of stalled projects, and motivating other people.
@Cito thanks, I absolutely don't expect anything. I have spent an incredible amount of time profiling and benchmarking this. With respect to the scope of the issue tracker, the OP benchmark uses Graphene, not graphql-core. I wanted to make sure we were all aware of a performance regression in libraries that Graphene depends on, so as not to chase ghosts in this issue tracker.
@qeternity Graphene is only a wrapper around the GraphQL-core types. The query execution and the heavy lifting happen in GraphQL-core, so that's in fact the right place to optimize execution performance. The promise library mentioned above is only used in GraphQL-core v2 (Graphene v2), which is considered deprecated; that's why you don't see much activity there any more.
@Cito yep, I'm aware. But if the delta between Graphene v2 and v3 is closing (which is ultimately what we're benchmarking here) due to regressions in Graphene v2 (vis-à-vis the promise regressions) instead of improvements in v3, I presumed that was relevant to our discussion. But your point is well taken.
@qeternity Ah, OK, now I understand your point. You were actually saying you observed a v2 regression in the same benchmark we are discussing here, and since we are comparing with v2 as our baseline, we should be aware of that. That's in fact relevant to our discussion. It's interesting that you observed an update of the promise library affecting this (synchronous) benchmark. I had a quick look: the main change in the promise library update is something related to caching, fixing an out-of-memory error. Maybe it's just slower because of the additional garbage collection. Benchmarking is not easy. This benchmark only measures the time in this one special case, not "the" performance in general.
I have now implemented the ideas mentioned above, which improves performance considerably. The benchmark with v3.1.0b2 now runs faster than with v2. Please let me know how it works for you and whether we can close this issue.
@Cito Just had a look over the commit... nice, looks great! I haven't had a chance to run it yet, but will sit down with it tonight. I had already put aside some weekend quarantine time to start implementing some more aggressive hot paths.
I can confirm it runs faster than v2, tested on Python 3.7 and 3.8. I think this issue is solved. Thank you, this will help a lot with migrating to v3!
Just to add on here: when using Here is graphql-core/graphene 2:
vs. 3:
It does appear that document validation is 2x slower in the newer version of graphql-core, but I'll open a separate issue. Nice work!
@ewhauser Thanks for the feedback. Yes, it would be good to open a new issue for performance improvements in validation. Maybe you can run a profiler and identify some bottlenecks already.
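For a first look at such bottlenecks, the standard library's `cProfile` and `pstats` modules are usually enough. The sketch profiles a hypothetical `validate_many` workload (a placeholder for whatever call is suspected, such as repeated document validation) and returns the entries with the highest cumulative time; `hypothetical_rule` and `profile_top` are likewise illustrative names.

```python
import cProfile
import io
import pstats


def hypothetical_rule(node: int) -> bool:
    """Placeholder for a per-node validation rule."""
    return node % 7 != 0


def validate_many(nodes: int = 50_000) -> int:
    """Placeholder workload: apply the rule to many nodes and count failures."""
    return sum(1 for node in range(nodes) if not hypothetical_rule(node))


def profile_top(func, limit: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_top(validate_many))
```

Sorting by cumulative time points at the expensive call trees; sorting by `"tottime"` instead highlights functions that are costly in themselves, such as a frequently called helper.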
I've run a very simplistic benchmark, just returning a long list of single-field objects.
It seems graphql-core-next is 2.5x slower than graphql-core: https://gist.github.com/ktosiek/849e8c7de8852c2df1df5af8ac193287
Looking at flamegraphs, I see isawaitable is used a lot, and it's a pretty slow function. Would it be possible to pass raw results around more? It seems resolve_field_value_or_error and complete_value_catching_error are the main offenders here.
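One way to "pass raw results around more" for a long list like the one in this benchmark is sketched below: keep the resolved items in a plain list, remember only the positions of the awaitable ones, and create a coroutine only when there is at least one. `complete_list` is a hypothetical helper, not code from graphql-core. It still calls `isawaitable()` once per item, but on the all-synchronous path it returns the raw list without wrapping any item, or the list itself, in an async function.

```python
import asyncio
from inspect import isawaitable
from typing import Any, Iterable, List


def complete_list(items: Iterable[Any]) -> Any:
    """Return a plain list if no item is awaitable, else a coroutine for the list."""
    results: List[Any] = list(items)
    awaitable_indices = [i for i, item in enumerate(results) if isawaitable(item)]
    if not awaitable_indices:
        return results  # fast path: raw values, no coroutine created

    async def gather_awaitables() -> List[Any]:
        # Await only the awaitable items, concurrently, and put them back in place
        values = await asyncio.gather(*(results[i] for i in awaitable_indices))
        for i, value in zip(awaitable_indices, values):
            results[i] = value
        return results

    return gather_awaitables()
```

The caller still has to check whether the result is awaitable, but that is one check per list instead of one nested coroutine per item.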