
perf, memory: Improve performance and memory use for large datasets #5927

Open: wants to merge 7 commits into base: main
Conversation

mleibman-db

@mleibman-db mleibman-db commented Feb 21, 2025

This PR moves all duplicate row instance methods to row proto, drastically reducing memory use for large datasets.
Fixes issue #5926.

For a 50,000×5 table, this reduces table memory use from 136 MB to 4.8 MB (28x).
The initialization time (accessRows()) is also reduced from 132 ms to 18 ms (7x).

Before / After / Deltas: memory profiler screenshots (images not included).
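For readers unfamiliar with the technique, here is a minimal sketch of the prototype approach (simplified, with hypothetical names; not the actual TanStack Table implementation). Shared methods live on one prototype object created once per table, so tens of thousands of rows each hold only their own data fields instead of their own copies of every method:

```typescript
type RowData = Record<string, unknown>;

interface Row {
  original: RowData;
  _valuesCache: RowData;
  getValue(columnId: string): unknown;
}

// Created once per table, not once per row: every row shares these methods.
const rowProto = {
  getValue(this: Row, columnId: string): unknown {
    if (!(columnId in this._valuesCache)) {
      this._valuesCache[columnId] = this.original[columnId];
    }
    return this._valuesCache[columnId];
  },
};

function createRow(original: RowData): Row {
  // Object.create links the row to the shared prototype; only the
  // per-instance data fields are assigned on the row itself.
  const row = Object.create(rowProto) as Row;
  row.original = original;
  row._valuesCache = {};
  return row;
}

const a = createRow({ name: "Ada" });
const b = createRow({ name: "Grace" });
console.log(a.getValue("name")); // "Ada"
console.log(a.getValue === b.getValue); // true: one shared function object
```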

@mleibman-db mleibman-db changed the title perf, memory: Improve performance and memory use for large datasets (#5926) perf, memory: Improve performance and memory use for large datasets Feb 21, 2025
@mleibman-db mleibman-db changed the title perf, memory: Improve performance and memory use for large datasets perf, memory: Improve performance and memory use for large datasets (WIP) Feb 21, 2025
@KevinVandy
Member

This idea is awesome, and it's something that I've been thinking about for a long time.

However, it's a bigger change, and I would personally like to see this PR targeted at the alpha branch so we can test the pattern there and take the idea even further.

V9 probably won't be stable for a few more months, but I hope that you still would be able to help implement this there.

We should follow this same pattern not only for rows, but also for cells, headers, and header groups. The table object is the only one where I don't see this as necessary, since only one table is created.

What do you think?

@mleibman-db
Author

mleibman-db commented Feb 21, 2025

@KevinVandy Thank you for the quick response!

Is there any way we can apply this fix to V8 (pretty please)?

This is a bit of a pressing issue for us. We use the table in Notebooks in our product (Databricks), where it's not unusual to have dozens of tables on a page with up to 10k rows each, which results in the page taking up multiple gigabytes of memory, with over 70% of it coming from the table. We have some ideas for workarounds in our code, but they are somewhat brittle/hacky, and I would much prefer to contribute to improving the table itself.

We could increase the scope of this change and apply the pattern more broadly (I saw a lot of areas where we could improve scalability wrt efficiency), but my goal here was to make a minimally invasive fix w/o changing the API that gets us most of the way there. As is, this PR reduces table memory use by ~28x, which is more than enough to address our current needs.

In terms of applying the same pattern to headers and header groups: yes, that would have a similar impact on tables with a very large number of columns, but that is a lot less common than tables with lots of rows. We're talking about tens of thousands of columns here. It does happen, but it's something that can easily be addressed in a separate PR.

Applying this to cells is not as effective, however, since cells are created on demand as they are rendered. Unlike rows, they don't use up memory unless you render (or, with virtualization, scroll through) tens of thousands of cells.

@mleibman-db mleibman-db changed the title perf, memory: Improve performance and memory use for large datasets (WIP) perf, memory: Improve performance and memory use for large datasets Feb 21, 2025
@KevinVandy
Member

I'm definitely still open to merging this to v8; I'll just need to do some extensive regression testing.


nx-cloud bot commented Feb 21, 2025

View your CI Pipeline Execution ↗ for commit a964f41.

Command | Status | Duration | Result
nx affected --targets=test:format,test:sherif,t... | ✅ Succeeded | 1m 51s | View ↗
nx run-many --targets=build --exclude=examples/** | ✅ Succeeded | 34s | View ↗

☁️ Nx Cloud last updated this comment at 2025-02-21 20:03:55 UTC


pkg-pr-new bot commented Feb 21, 2025

Open in Stackblitz


@tanstack/angular-table

npm i https://pkg.pr.new/@tanstack/angular-table@5927

@tanstack/lit-table

npm i https://pkg.pr.new/@tanstack/lit-table@5927

@tanstack/match-sorter-utils

npm i https://pkg.pr.new/@tanstack/match-sorter-utils@5927

@tanstack/qwik-table

npm i https://pkg.pr.new/@tanstack/qwik-table@5927

@tanstack/react-table

npm i https://pkg.pr.new/@tanstack/react-table@5927

@tanstack/react-table-devtools

npm i https://pkg.pr.new/@tanstack/react-table-devtools@5927

@tanstack/solid-table

npm i https://pkg.pr.new/@tanstack/solid-table@5927

@tanstack/svelte-table

npm i https://pkg.pr.new/@tanstack/svelte-table@5927

@tanstack/table-core

npm i https://pkg.pr.new/@tanstack/table-core@5927

@tanstack/vue-table

npm i https://pkg.pr.new/@tanstack/vue-table@5927

commit: a964f41

@KevinVandy
Member

@mleibman-db You can install npm i https://pkg.pr.new/@tanstack/react-table@5927 right now to try out the preview NPM version in your code.

For the alpha branch, it would need a redo instead of merging up. So your help would be appreciated there if you have time.

And yes, I forgot to include column objects in my original feedback. Those would be the second most important. It's interesting that cells don't have this problem as much, but that makes sense.

@mleibman-db
Author

@KevinVandy I don't do much OSS development, so I'm going to need some help / hand-holding here :) Do I need to do a separate PR to apply the changes to the alpha branch? Is that instead of this one, or in addition to? Not sure what I need to do here.

Re: doing the same thing for column objects. As I mentioned, in most cases it wouldn't be as impactful, since tables with tens of thousands of columns are much rarer than tables with lots of rows, but I'm happy to make that change as well. I'd probably do that in a separate PR, though, to limit the scope.

@KevinVandy
Member

The scope of this PR against the main v8 branch is fine.

However, the alpha v9 branch has been heavily refactored with new approaches to assigning APIs to these objects. In the v9 alpha branch, I'd hope to find an approach that follows this new strategy for everything as much as possible.

@mleibman-db
Author

Ok, so IIUIC, I'll leave this PR as-is to proceed with code review, testing, and inclusion in v8, and will look at v9 to see how things are different there and what I need to do to re-apply them there.

@KevinVandy
Member

Ok, so IIUIC, I'll leave this PR as-is to proceed with code review, testing, and inclusion in v8, and will look at v9 to see how things are different there and what I need to do to re-apply them there.

That would be awesome. I realize the follow-up for the v9 alpha work is a big extra ask, but hopefully a fun and interesting way for you to help us out. It will need a slightly different approach.

One of the main goals of v9 is to strip the bundle sizes (and memory usage) of table instances down to just the features that apps are actually using. This PR is very much on theme for that.

@mleibman-db
Author

Will do!

original: undefined as TData,
subRows: [],
_valuesCache: {},
_uniqueValuesCache: {},
Member

Just wondering if we might get in trouble by storing these values on the proto? The wisdom I heard last time I worked on stuff using the prototypes was to only store functions on the prototype

Author

With the exception of subRows, they are all unused (never referenced on the proto) and are only here for TypeScript type checking.

The advice you heard is most likely referring to potential confusion if one ends up modifying a property on the proto instead of an instance, which results in the prop changing in all instances. This is not happening here for subRows since it is never modified directly and is only reassigned, which sets the value on the instance.
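A small sketch of the distinction being made (hypothetical shape, not the actual row object): mutating a prototype-held array leaks into every instance, while reassignment creates an own property on just that instance and leaves the prototype untouched.

```typescript
// Hypothetical minimal row shape: subRows lives on the shared prototype.
const proto = { subRows: [] as number[] };
const row1 = Object.create(proto);
const row2 = Object.create(proto);

row1.subRows.push(1); // mutation: changes the shared array on the proto
console.log(row2.subRows); // [1]  <- the change is visible on every row

row1.subRows = [2]; // reassignment: own property on row1 only
console.log(row2.subRows); // still [1], the shared proto array
console.log(Object.prototype.hasOwnProperty.call(row1, "subRows")); // true
console.log(Object.prototype.hasOwnProperty.call(row2, "subRows")); // false
```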

@@ -411,6 +403,23 @@ export const ColumnFiltering: TableFeature = {

return table._getFilteredRowModel()
}

Object.assign(getRowProto(table), {
Member

At one point, we had removed most usages of Object.assign in favor of direct assignment as a performance improvement at scale. Wonder if that's still applicable to consider here.

Author

It wouldn't be an issue here since it's only called once per table anyway. Your question would apply more to createRow() in row.ts, since we call it once per row there, but AFAIK there are no known performance issues around Object.assign(). There were some many years ago, when it was newly introduced and browser support was fresh (plus there were polyfills), but that hasn't been the case in quite some time.

Member

@KevinVandy beat me to it. I like the idea, but I'm not a big fan of typing the prototype as CoreRow, which is not strictly accurate (and requires us to create these dummy values to keep TypeScript happy).

@mleibman-db did you try making the createRow function into a constructor function, adding the methods directly to the prototype? I haven't tried it myself but intuitively it feels like it should work. Would need to always call createRow with the new keyword I think.

Author
The reason will be displayed to describe this comment to others. Learn more.

1. Typing the row proto as CoreRow is actually very useful, since it provides type safety and makes sure the methods only access props defined there. The use of default unused values doesn't strike me as concerning, but we could try to replace them with purely TypeScript-level type annotations, though IMHO that would be more hacky.

2. I'm not sure I understand what you're proposing. Could you elaborate?

Member

since it provides type safety

It's the wrong type though, isn't it? The prototype shouldn't have the instance properties on it.

Could you elaborate?

I am imagining something approximately like the below. I haven't tried but think it should work, happy to be corrected. The naming would be a bit weird though. createRow should probably become just Row, but that would be a breaking change - not sure what to do about that.

const createRow = <TData>(
  this: CoreRow<TData>,
  table: Table<TData>,
  id: string,
  original: TData,
  rowIndex: number,
  depth: number,
  subRows?: Row<TData>[],
  parentId?: string
) => {
  this.id = id
  this.original = original
  // etc.
}

createRow.prototype.getValue = (columnId: string) => {
  // ...
  return this._valuesCache[columnId] as any
}

elsewhere:

const row = new createRow(...)

Member

Anywhere where we are thinking that an alternative would be cleaner, but it's a breaking change, can be reserved for a v9 pr. So far this PR looks mostly good. We don't have to assign dummy vars to the prototype just to satisfy TypeScript. A cast could be acceptable there.

If the Object.assign only gets called once, that is negligible and something we don't need to worry about. Direct assignment was a performance improvement in this pr that sped up rendering when creating 10k+ rows. This PR is solving the memory side of that same issue. In conclusion, I'm not worried about this after you explained more.

Member

@tombuntus tombuntus Feb 21, 2025

so you can't just create the proto once at the module level

But I think you can merge the feature.createRow prototypes into the prototype of the object returned by the core createRow function at runtime, when new createRow() is called, in the same loop where we currently call feature.createRow in the core createRow() function body. I haven't tested this, though. In this case, the prototype's methods would be created at module level on each of the features' createRow functions.

vastly preferring classes

Personally I am not opposed to using a class if it makes typing easier.

Member

(and just for anyone reading this ... the code snippet in this comment should be using function createRow() {}, not an arrow function!)
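For context, a minimal sketch of what the function-declaration version might look like (a hypothetical simplification of the snippet above, not the actual implementation; the cast at the call site is one way to satisfy TypeScript, which doesn't allow `new` on a plain function type):

```typescript
interface CoreRow {
  id: string;
  original: Record<string, unknown>;
  _valuesCache: Record<string, unknown>;
  getValue(columnId: string): unknown;
}

// A function declaration (not an arrow function), so `this` and `new`
// behave as intended.
function createRow(this: CoreRow, id: string, original: Record<string, unknown>) {
  this.id = id;
  this.original = original;
  this._valuesCache = {};
}

// Regular function expression here too, so `this` is the row instance.
createRow.prototype.getValue = function (this: CoreRow, columnId: string) {
  if (!(columnId in this._valuesCache)) {
    this._valuesCache[columnId] = this.original[columnId];
  }
  return this._valuesCache[columnId];
};

// TypeScript won't `new` a plain function type, so assert a constructor type.
const RowCtor = createRow as unknown as new (
  id: string,
  original: Record<string, unknown>
) => CoreRow;

const row = new RowCtor("0", { name: "Ada" });
console.log(row.getValue("name")); // "Ada"
```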

Member

But happy to change if you feel strongly about it

I was actually agreeing with you that since it's not called many times, it wouldn't be likely to cause issues. I was just trying to explain the likely cause of perf issues: not Object.assign() itself, but rather the fact that it is often called like this:

Object.assign(
  targetObject, // <-- existing object
  {
    // new source object which will be garbage collected eventually
  },
)

If it's used this way in a loop with many thousands of iterations, you can run into perf issues due to garbage collection.
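A sketch of the allocation difference being described (hypothetical shapes, not the library's code): both loops produce identical results, but the Object.assign version allocates one throwaway source object per iteration for the garbage collector to clean up.

```typescript
interface Point { x: number; y: number }

function buildWithAssign(n: number): Point[] {
  const out: Point[] = [];
  for (let i = 0; i < n; i++) {
    const p = {} as Point;
    // Allocates a temporary source object every iteration, then discards it.
    Object.assign(p, { x: i, y: i * 2 });
    out.push(p);
  }
  return out;
}

function buildDirect(n: number): Point[] {
  const out: Point[] = [];
  for (let i = 0; i < n; i++) {
    const p = {} as Point;
    // Same result, but no temporary object to garbage-collect.
    p.x = i;
    p.y = i * 2;
    out.push(p);
  }
  return out;
}

// Both produce [{x:0,y:0},{x:1,y:2},{x:2,y:4}] for n = 3.
console.log(buildDirect(3));
```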

Author

We don't have to assign dummy vars to the prototype just to satisfy TypeScript. A cast could be acceptable there.

Done.
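For illustration, a minimal sketch of the cast approach (hypothetical and simplified, not the actual change): only the shared methods are placed on the prototype, and a type assertion gives it the full row shape without materializing dummy instance fields.

```typescript
interface CoreRow {
  id: string;
  getValue(columnId: string): unknown;
}

// No dummy `id` value on the proto; the cast asserts the CoreRow shape.
const rowProto = {
  getValue(this: CoreRow, columnId: string): unknown {
    return columnId === "id" ? this.id : undefined;
  },
} as CoreRow;

const row = Object.create(rowProto) as CoreRow;
row.id = "r1"; // instance data lives on the row itself
console.log(row.getValue("id")); // "r1"
```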

@@ -362,14 +362,6 @@ export const ColumnFiltering: TableFeature = {
}
},

createRow: <TData extends RowData>(
Member

In the core createRow function, we still call these feature.createRow functions if they exist, passing them the row and table instance. That should prevent breaking changes for existing custom features, but we may want to recommend custom features to take the same approach (i.e. extend the prototype). @KevinVandy what do you think about this?

I haven't thought all the details through, but something like this: retain a createRow function in each feature, and in the core createRow function both call each feature.createRow with the row and table instances (to prevent breaking changes for existing custom features) and merge its prototype onto the core createRow prototype.

That way we could also retain the createRow functions in the core features, (just move the methods onto the prototype), and wouldn't need the getRowProto and Object.assign() approach I think.
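A rough sketch of the merging idea (names like rowProtoMethods and buildRowProto are assumptions for illustration, not the actual TanStack Table API): each feature exposes a bag of row methods, and table setup merges them all onto one per-table row prototype.

```typescript
type FeatureRowMethods = { [name: string]: (...args: unknown[]) => unknown };

interface TableFeature {
  rowProtoMethods?: FeatureRowMethods;
}

// One prototype per table, built once from all enabled features.
function buildRowProto(features: TableFeature[]): object {
  const proto = {};
  for (const feature of features) {
    if (feature.rowProtoMethods) {
      // Merging happens once per table, not once per row.
      Object.assign(proto, feature.rowProtoMethods);
    }
  }
  return proto;
}

const filtering: TableFeature = {
  rowProtoMethods: {
    getIsFiltered() { return false; },
  },
};

const proto = buildRowProto([filtering]);
const row = Object.create(proto) as { getIsFiltered(): boolean };
console.log(row.getIsFiltered()); // false
```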

Author

+1 to generally recommending that people use the same approach for implementing custom features. I considered making things more explicit by adding methods like initRowProto() to the TableFeature interface, but decided against it for simplicity's sake; plus, this is more of an internal implementation detail than a public API.

Member

This kind of pattern will be useful to think about in the alpha branch though

3 participants