
[Bug]: For Arctic keyed table, out-of-order TransactionIds lead to inconsistent data #479

Closed
1 task done
wangtaohz opened this issue Oct 17, 2022 · 0 comments · Fixed by #564 or #654
Labels
module:ams-optimizer AMS optimizer module type:bug Something isn't working
Milestone

Comments

@wangtaohz
Contributor

wangtaohz commented Oct 17, 2022

What happened?

An Arctic keyed table's TransactionIds may be out of order if Spark and Flink write into the change table at the same time.

Affects Versions

master/0.3.1

What engines are you seeing the problem on?

Core, AMS

How to reproduce

1. Spark writes into change, begins transaction1, gets tid = 1
2. Flink writes into change, begins transaction2, gets tid = 2
3. Flink commits transaction2 with tid = 2
4. Minor Optimize transfers change files to the base store with tid = 2
5. Spark commits transaction1 with tid = 1
6. Optimize/MOR will ignore change files with tid = 1, leading to data loss
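The steps above can be sketched as a toy scan. All names here (`changed_files_to_read`, `base_max_tid`) are hypothetical, not Arctic's actual API; the point is only that a scan which filters change files by "tid greater than the highest tid already optimized" silently drops a late-arriving commit with a smaller tid.

```python
# Hypothetical sketch of why an out-of-order TransactionId loses data:
# the change-store scan keeps only files whose tid is greater than the
# highest tid already merged into the base store.

def changed_files_to_read(change_files, base_max_tid):
    """Return change files assumed not yet reflected in the base store.

    change_files: list of (tid, path) tuples in commit order.
    base_max_tid: highest tid the optimizer has already merged into base.
    """
    return [(tid, path) for tid, path in change_files if tid > base_max_tid]

# Timeline from the reproduction steps:
# Flink commits tid=2, Minor Optimize merges it, so base_max_tid becomes 2.
base_max_tid = 2
# Spark then commits its earlier transaction with tid=1.
change_files = [(2, "change-2.parquet"), (1, "change-1.parquet")]

remaining = changed_files_to_read(change_files, base_max_tid)
# change-1.parquet is filtered out even though it was never optimized:
print(remaining)  # → []
```

With in-order commits the filter is correct; the bug only appears when a transaction started earlier (smaller tid) commits after a later one has already been optimized.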

Relevant log output

No response

Anything else

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@wangtaohz wangtaohz added the type:bug Something isn't working label Oct 17, 2022
@wangtaohz wangtaohz added this to the Release 0.4.0 milestone Oct 17, 2022
@baiyangtx baiyangtx added the module:ams-optimizer AMS optimizer module label Oct 18, 2022
wangtaohz added a commit to wangtaohz/amoro that referenced this issue Nov 7, 2022
wangtaohz added a commit to wangtaohz/amoro that referenced this issue Nov 8, 2022
@wangtaohz wangtaohz linked a pull request Nov 9, 2022 that will close this issue
3 tasks
zhoujinsong pushed a commit that referenced this issue Nov 13, 2022
* fix #479 add property base.table.max-transaction

* refactor change scan

* file cache add snapshot sequence

* remove record all partitions

* fix ams unit test
expire change snapshot with max-txId

* refactor change table scan and fix spark test

* refactor BaseChangeTableIncrementalScan with entriesTable

* refactor overwrite/partitionRewrite API to set transactionId/legacyTransactionId

* revert expire change table snapshot with transactionId

* mor to support legacy table

* refactor overwrite remove set legacy transaction id

* fix derby file_info_cache mapper

* fix spark insert overwrite unit test

* fix hive insert overwrite unit test

* rename some api and remove useless import

* add truncate sql

Co-authored-by: wangtao <[email protected]>
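The commit messages above ("file cache add snapshot sequence", "refactor change table scan") suggest the fix orders change files by a commit-time snapshot sequence instead of the client-assigned TransactionId. A rough sketch of that distinction, with entirely hypothetical names (`commit`, `change_log`), assuming sequences are assigned monotonically at commit time:

```python
# Hypothetical sketch contrasting client-assigned tids with commit-time
# snapshot sequences. A sequence number is assigned when the commit lands,
# so sequences stay monotonic even when tids arrive out of order.

import itertools

_sequence = itertools.count(1)

def commit(change_log, tid, path):
    """Record a committed change file with a commit-ordered sequence."""
    change_log.append({"sequence": next(_sequence), "tid": tid, "path": path})

change_log = []
commit(change_log, tid=2, path="change-2.parquet")  # Flink commits first
commit(change_log, tid=1, path="change-1.parquet")  # Spark commits later

# Filtering on sequence (not tid) keeps the late-arriving Spark file:
optimized_up_to_sequence = 1  # Minor Optimize merged only the first commit
pending = [f["path"] for f in change_log
           if f["sequence"] > optimized_up_to_sequence]
print(pending)  # → ['change-1.parquet']
```

Because the sequence reflects commit order rather than transaction start order, the file committed last is never mistaken for one that was already optimized.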
zhoujinsong pushed a commit that referenced this issue May 31, 2023