Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential complexity of unions generation #558

Open
LiLatee opened this issue Dec 12, 2023 · 16 comments
Open

Exponential complexity of unions generation #558

LiLatee opened this issue Dec 12, 2023 · 16 comments
Assignees

Comments

@LiLatee
Copy link

LiLatee commented Dec 12, 2023

Hi! Awesome package! Good job! 😊
We are switching quite complex project from artemis to ferry and we were not able to generate code using ferry. It was throwing out of memory error and then we found reuse_fragments option that helped (probably), but then the generation can't finish. It takes infinite time (after 9 hours of processing a single file I gave up).

We've made an investigation and it turned out that the complexity of unions in our queries is the problem.
We have a timeline query that can return 1 of 8 possible objects. We are not able to process that file, but if we remove some of the fragments from the query then we can do it and it doesn't matter which one we remove:

  • 4 fragments takes ~95ms
  • 5 fragments takes ~470ms
  • 6 fragments takes ~6667ms
  • 7 fragments takes ~ for sure more than 1 hour, I was bored to wait more,
  • 8 fragment - I was waiting 9 hours and it didn't finish 😢

I've prepared a repository with an example to reproduce the issue. The example is much simpler than our real project so it can generate even 10 fragments, but it takes about 2 minutes.

The time is checked in the following way:
image
It looks like calling https://github.com/gql-dart/gql/blob/1cabae81a96db119b00d1502043974d1f7b49f70/codegen/gql_code_builder/lib/data.dart#L128 function is responsible for such long times.

Is there any plan to improve that? In our case ferry seems to be not usable 😞

@knaeckeKami
Copy link
Collaborator

Hi!

Thanks for the repro, this makes it easier for me to figure out what is happening.

@knaeckeKami
Copy link
Collaborator

knaeckeKami commented Dec 12, 2023

Can you maybe also produce a repro that runs into this pathological case with reuse_fragments: false?

without reuse_fragments: false, the example builds fine and quickly for me.

So you definitely found a performance bug with reuse_fragments, but I would like to focus on the general bug that prevents you from building without it first.

@knaeckeKami
Copy link
Collaborator

I found an issue in gql_code_builder, and found a potential fix but it's not testes/verified yet.

you can test it by using

dependency_overrides:
   gql_code_builder:
      git:
           ref: possible_reuse_fragment_fix
           url:  https://github.com/gql-dart/gql.git
           path: codegen/gql_code_builder

@knaeckeKami knaeckeKami self-assigned this Dec 12, 2023
@LiLatee
Copy link
Author

LiLatee commented Dec 13, 2023

Awesome! It looks like you fixed the problem related to a long time of processing. Now file that couldn't process within 9+ hours has been finished in 1s. 🎉 So I was able to build some part of the queries we use in the project, but when I tried to build the all queries then after 30 minutes of building I got an Out of memory error 😞 The process bumped to 30GB of memory.

[INFO] 30m 42s elapsed, 1291/1296 actions completed.Exhausted heap space, trying to allocate 458256 bytes. [SEVERE] built_value_generator:built_value on lib/graphql/__generated__/serializers.gql.dart: Unknown error in BuiltValueGenerator for /project/lib/graphql/__generated__/serializers.gql.dart. Out of Memory

So it's probably another issue, and for now, I do not have any more specific information about it.

@LiLatee
Copy link
Author

LiLatee commented Dec 13, 2023

@knaeckeKami is that question still relevant?

Can you maybe also produce a repro that runs into this pathological case with reuse_fragments: false?

@knaeckeKami
Copy link
Collaborator

knaeckeKami commented Dec 13, 2023

@knaeckeKami is that question still relevant?

Yes! I would love the see a runnable reproduction of the issue where reuse-fragments is deactivated and the build takes a long time / runs out of memory. This would allow me to fix the root cause (or at least find out what causes this issue ;) ).

I heard reports of that from different users, but I could not reproduce it myself yet.

@knaeckeKami
Copy link
Collaborator

Actually, looking at the error message, it seems to fail to generate the serializers in the built_value step.

https://github.com/google/built_value.dart/blob/master/built_value_generator/lib/built_value_generator.dart#L56

That's surprising to me, generating the list of serializers is not particularly complex.
And potentially bad, since it might not be easy to fix from the ferry side.

If you cannot provide a reproducible example, it would also help if you cold fork built_value_generator and print more details (e.g. the stack trace when this exception happens·.

@LiLatee
Copy link
Author

LiLatee commented Dec 13, 2023

Yes! I would love the see a runnable reproduction of the issue where reuse-fragments is deactivated and the build takes a long time / runs out of memory. This would allow me to fix the root cause (or at least find out what causes this issue ;) ).

So the example from the repo I provided works fine with the disabled reuse_fragments option and it runs sooo long with enabled, but the solution you provided fixes the issue for the enabled option.

But in our project, we get Out of memory error in both cases. I am trying to create some reproducible example but I need more time. The only thing I noticed, for now, is that in one place I removed one fragment from the union and then it built properly.

If you cannot provide a reproducible example, it would also help if you cold fork built_value_generator and print more details (e.g. the stack trace when this exception happens·.

I will try to provide more details soon.

@LiLatee
Copy link
Author

LiLatee commented Dec 13, 2023

If you cannot provide a reproducible example, it would also help if you cold fork built_value_generator and print more details (e.g. the stack trace when this exception happens·.

Here you have a full stack trace of out-of-memory error with reuse_fragments DISABLED.

[SEVERE] built_value_generator:built_value on lib/graphql/__generated__/serializers.gql.dart:
Unknown error in BuiltValueGenerator for /project/lib/graphql/__generated__/serializers.gql.dart.
Out of Memory
#0      _StringBase.+ (dart:core-patch/string_patch.dart:277:43)
#1      Scanner.tokenize (package:analyzer/src/dart/scanner/scanner.dart:154:40)
#2      FileState._parse (package:analyzer/src/dart/analysis/file_state.dart:718:27)
#3      FileState.parse (package:analyzer/src/dart/analysis/file_state.dart:490:14)
#4      AnalysisDriver.parseFileSync (package:analyzer/src/dart/analysis/driver.dart:1016:33)
#5      AnalysisDriver.getParsedLibrary (package:analyzer/src/dart/analysis/driver.dart:819:24)
#6      AnalysisDriver.getParsedLibraryByUri (package:analyzer/src/dart/analysis/driver.dart:835:19)
#7      _$SerializerSourceClass.parsedLibrary (package:built_value_generator/src/serializer_source_class.g.dart:51:33)
#8      SerializerSourceClass.fields (package:built_value_generator/src/serializer_source_class.dart:167:31)
#9      SerializerSourceClass.generateBuilderFactoryAdders (package:built_value_generator/src/serializer_source_class.dart:287:12)
#10     SerializerSourceLibrary._generateSerializersTopLevelFields.<anonymous closure>.<anonymous closure> (package:built_value_generator/src/serializer_source_library.dart:203:35)
#11     new _GrowableList._ofEfficientLengthIterable (dart:core-patch/growable_array.dart:189:27)
#12     new List.of (dart:core-patch/array_patch.dart:47:28)
#13     Iterable.toList (dart:core/iterable.dart:495:7)
#14     SerializerSourceLibrary._generateSerializersTopLevelFields.<anonymous closure> (package:built_value_generator/src/serializer_source_library.dart:205:20)
#15     Iterable.join (dart:core/iterable.dart:445:19)
#16     SerializerSourceLibrary._generateSerializersTopLevelFields (package:built_value_generator/src/serializer_source_library.dart:209:8)
#17     SerializerSourceLibrary.generateCode (package:built_value_generator/src/serializer_source_library.dart:181:12)
#18     BuiltValueGenerator.generate (package:built_value_generator/built_value_generator.dart:46:48)
#19     _rootRunUnary (dart:async/zone.dart:1407:47)
#20     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#21     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#22     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#23     _rootRunUnary (dart:async/zone.dart:1407:47)
#24     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#25     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#26     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#27     _rootRunUnary (dart:async/zone.dart:1407:47)
#28     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#29     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#30     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#31     _rootRunUnary (dart:async/zone.dart:1407:47)
#32     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#33     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#34     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#35     _rootRunUnary (dart:async/zone.dart:1407:47)
#36     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#37     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#38     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#39     _rootRunUnary (dart:async/zone.dart:1407:47)
#40     _FutureListener.handleValue (dart:async/future_impl.dart:156:18)
#41     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:840:45)
#42     Future._propagateToListeners (dart:async/future_impl.dart:869:13)
#43     Future._asyncCompleteWithValue.<anonymous closure> (dart:async/future_impl.dart:715:7)
#44     _rootRun (dart:async/zone.dart:1399:13)
#45     _CustomZone.runGuarded (dart:async/zone.dart:1209:7)
#46     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:1249:23)
#47     _startMicrotaskLoop (dart:async/schedule_microtask.dart:49:5)
#48     _Timer._runTimers (dart:isolate-patch/timer_impl.dart:405:11)
#49     _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:189:12)

I will try to create some reproducible example in the next few days.

@knaeckeKami
Copy link
Collaborator

Thanks.

Maybe the aggressive memoization of built_value_generator + keeping all the SerializerSourceLibrary objects in memory is the issue.

I removed a few suspicious @memoized fields in built_value_generator in my fork https://github.com/knaeckeKami/built_value.dart/tree/relax_memoization .

you can try it out using dependency overrides for built_value_generator

  built_value_generator:
    git:
      url: https://github.com/knaeckeKami/built_value.dart.git
      ref: relax_memoization
      path: built_value_generator

This will probably increase the code generation time significantly, but might be a workaround until a fix in built_value is found.

@LiLatee
Copy link
Author

LiLatee commented Dec 14, 2023

This will probably increase the code generation time significantly, but might be a workaround until a fix in built_value is found.

Yep, that worked, thanks 🎉 , but the code has been generating about 2 hours. Let's leave it for now. I will back to the topic in a few days with a reproducible example (I hope so).

@knaeckeKami
Copy link
Collaborator

Glad to hear that! There are definitely ways to fix this without disabling memoization completely and re-calculating the fields on every access, built_value_generator just needs to be a bit smarter with keeping references to objects with all these calculated fields.

This should improve the built time again. I opened an issue in built_value.

@LiLatee
Copy link
Author

LiLatee commented Dec 20, 2023

Hi @knaeckeKami!
I think we found the main problem in our case. I removed the commented part of the fragment (that is used in maaaaaany queries/fragments in our schema) and now code generation takes ~6min. Instead of 2.7mln lines of code, we have just 11k. It also solved that problem #560
image (4)

I don't know why but, bcs of that code reusing fragments doesn't work. So almost every query has to generate new fragments.

BUT there is still some issue. Now we have an issue with the deserialization of fragments used in unions
image
image

Do you have any idea how to fix that? 🤔 If you need a repo with an example to reproduce then let me know 🙏

BTW we use that branch with fix.

dependency_overrides:
   gql_code_builder:
      git:
           ref: possible_reuse_fragment_fix
           url:  https://github.com/gql-dart/gql.git
           path: codegen/gql_code_builder

@knaeckeKami
Copy link
Collaborator

knaeckeKami commented Dec 20, 2023

cool!

Is this a new problem? The reuse_fragment flag is still experimental, and the author actually opened a PR with some fixes which i did not get to merge yet / i also asked them to verify that the bug with the exponential codegen time on unions is fixed.

It's possible that this PR: https://github.com/gql-dart/gql/pull/417/files fixes the deserialization issue, but it's possible that the exponential codegen time is still there (that was just a one line change though and should be easy to integrate)

If you disable the reuse_fragment flag, I guess the codegen time and size is still too high?

@LiLatee
Copy link
Author

LiLatee commented Dec 20, 2023

If you disable the reuse_fragment flag, I guess the codegen time and size is still too high?

yep

It's possible that this PR: gql-dart/gql#417 (files) fixes the deserialization issue, but it's possible that the exponential codegen time is still there (that was just a one line change though and should be easy to integrate)

I am going to check it ❤️

@LiLatee
Copy link
Author

LiLatee commented Dec 21, 2023

Checked. It looks like it solves the aforementioned problem, but I am getting some errors related to incorrect return type. I am going to investigate it and create some example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants