Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: HybridCache: span-based fetch #112866

Open
mgravell opened this issue Feb 24, 2025 · 15 comments
Open

[API Proposal]: HybridCache: span-based fetch #112866

mgravell opened this issue Feb 24, 2025 · 15 comments
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-Extensions-Caching blocking Marks issues that we want to fast track in order to unblock other important work untriaged New issue has not been triaged by the area owner

Comments

@mgravell
Copy link
Member

mgravell commented Feb 24, 2025

Background and motivation

This is follow-on work related to DefaultInterpolatedStringHandler and MemoryCache; the core idea here is that HybridCache should be able to satisfy local "hit" requests synchronously without the code of a string key parameter, instead leaving the key unmaterialized. This is achieved by taking the existing 2 HybridCache.GetOrCreateAsync methods that take a string key parameter, and add an overload that has ref DefaultInterpolatedStringHandler key instead. Since Roslyn prefers this signature, existing HybridCache code of the form:

var value = hybridCache.GetOrCreateAsync($"/some/{key}/path/{id}", ...);

will automatically update (at next recompile) to the new usage, with zero code changes required. We will also ensure documentation is clear about preferring this API.

Since this is a public abstraction with 1st and 3rd party implementations, and DefaultInterpolatedStringHandler is a dangerous type, we do not blindly let implementations deal with this; the two new methods are non-virtual, with a new virtual method that takes the key as ReadOnlySpan<char> key. That way, implementations are not exposed to the fiddly DefaultInterpolatedStringHandler API and we do not risk leaks / double-pool-restore.

API compatibility:

This should probably target net10+ only, since it requires related features. If the MemoryCache API gets backported to net9 (unlikely), in theory we could use the DefaultInterpolatedStringHandler via unsafe-accessor, but: it seems preferable to consider this as net10+.

API Proposal

namespace Microsoft.Extensions.Caching.Hybrid;

public abstract partial class HybridCache
{
+    protected virtual ValueTask<T> GetOrCreateAsync<TState, T>(ReadOnlySpan<char> key, TState state, Func<TState, CancellationToken, ValueTask<T>> factory,
+        HybridCacheEntryOptions? options = null, IEnumerable<string>? tags = null, CancellationToken cancellationToken = default)
+        => GetOrCreateAsync(key.ToString(), state, factory, options, tags, cancellationToken);

+   public ValueTask<T> GetOrCreateAsync<T>(ref DefaultInterpolatedStringHandler key, Func<CancellationToken, ValueTask<T>> factory,
+     HybridCacheEntryOptions? options = null, IEnumerable<string>? tags = null, CancellationToken cancellationToken = default)
+ {
+     try
+     {
+         return GetOrCreateAsync(key.Text, factory, WrappedCallbackCache<T>.Instance, options, tags, cancellationToken);
+     }
+     finally
+     {
+         key.Clear();
+     }
+ }

+    public ValueTask<T> GetOrCreateAsync<TState, T>(ref DefaultInterpolatedStringHandler key, TState state, Func<TState, CancellationToken, ValueTask<T>> factory,
+        HybridCacheEntryOptions? options = null, IEnumerable<string>? tags = null, CancellationToken cancellationToken = default)
+    {
+        try
+        {
+            return GetOrCreateAsync(key.Text, state, factory, options, tags, cancellationToken);
+        }
+        finally
+        {
+            key.Clear();
+        }
+    }
}

Notes:

  • implementations shown here to illustrate that we're not breaking abstract rules
  • the finally without await does not in this instance represent premature cleanup; the span cannot traverse the async/await boundary, so we know it is safe to call Clear() in all exit scenarios

The bottom two methods are the public API surface, using try/finally to ensure correct handling of the key. The first two method is the protected virtual method that specific implementations may choose to override to exploit allocation-free L1 lookups. If they do not choose to do so, we simply allocate the string as we would have originally. This ensures the default behaviour is valid on existing implementations.

A note is added to clarify the slightly unusual usage of a span on a *Async method:

<remarks>If the specific implementation cannot get the value synchronously, that implementation must copy the <paramref name="key"/> contents (using a leased buffer, string, or similar) before the asynchronous transition.</remarks>

This atypical usage is because of the high probability of local synchronous "hit" results, hence ValueTask<T> as the return value.

API Usage

No change at the caller:

var value = hybridCache.GetOrCreateAsync($"/some/{key}/path/{id}", ...);
@mgravell mgravell added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 24, 2025
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 24, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 24, 2025
@mgravell mgravell added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 24, 2025
@mgravell
Copy link
Member Author

/cc @jodydonetti

@vcsjones vcsjones added area-Extensions-Caching and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Feb 24, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-extensions-caching
See info in area-owners.md if you want to be subscribed.

@julealgon
Copy link

@mgravell is the idea here that whenever there is a cache hit, it doesn't end up allocating the string for the key when using interpolation handlers? So, it only allocates if it is generating the actual entry (first time or after expiration, etc)?

If that's the case, wouldn't it still be a better idea to create a dedicated struct (say, a record struct, which gets equality implemented by default) to store the key "variables" instead of having them be part of a string which uses a lot more memory?

Using your example:

var value = hybridCache.GetOrCreateAsync(new MyCustomKeyStruct(key, id), ...);

Or is there a major downside to doing something like that?

Sorry if this is a dumb question, I'm just not as familiar with interpolation handlers yet and was curious.

@mgravell
Copy link
Member Author

mgravell commented Feb 24, 2025

@julealgon the idea of HybridCache here is to radically simplify dealing with IMemoyCache and IDistributedCache; in both cases we need the ability to talk their native language - IMemoryCache talks object, and IDistributedCache talks string. Fortunately, most IMemoryCache is actually MemoryCache, which we can type-test against to access the alt-lookup API (new in net10 for MemoryCache - it specializes string keys internally already, for the string rehashing dictionary feature, and also for Marvin on down-level). This means with this one simple change, we get zero change to the perceived calling code, and end-to-end alloc-free L1 hits.

Now contrast MyCustomKeyStruct: this would need to be per consumer, since different call paths have different key parameters, they'd need to implement all the washing logic (or we rely on them using record struct, not a reasonable assumption), they have to pay boxing for IMemoryCache, we need a formatter for IDistributedCache, etc. We can't have a per-key TKey without RCache or similar, and that would also force either a HybridCache<TKey> or internal per-call type lookup.

Basically: the string-like API is a very practical and efficient way of avoiding a wide raft of problems; this last change allows us to avoid the actual string in the intended most common code path.

instead of having them be part of a string which uses a lot more memory?

Don't forget:

  • in the miss case, we need the string anyway, for IDistributedCache, and an object for IMemoryCache which might as well be the string key that we'd need in some other cases
  • in the hit case: we don't generate a string, so "a lot more" becomes zero

There is also the "no IDistributedCache" scenario; we'd still need to box the struct, which isn't free, but is probably cheaper than the string, I'll grant - but we'd also lose the dictionary rehashing feature; this is specific to TKey===string.

Also, to underline the point: HybridCache is already locked and loaded - we can't change it radically without making it something else. This is an optimization of an existing API, not a complete overhaul. HybridCache currently takes string: we want to make that cheaper.

@julealgon
Copy link

@mgravell

IMemoryCache talks object, and IDistributedCache talks string.

As someone who is currently heavily using only IMemoryCache in our codebase (we have plans for a distributed cache but its not yet in effect), I completely forgot IDistributedCache was string due to the serialization aspect.

Now contrast MyCustomKeyStruct: this would need to be per consumer, since different call paths have different key parameters, they'd need to implement all the washing logic (or we rely on them using record struct, not a reasonable assumption),

Yeah, we've been creating dedicated key record struct's for each scenario. I guess the benefits only exist now when we are still using IMemoryCache in NET472 land, but like you alluded to most of that would go away with this allocation-free path.

... they have to pay boxing for IMemoryCache

I feel dumb for not realizing this. Of course... because it's a value type and the key is not generic. 🤦‍♂️

Basically: the string-like API is a very practical and efficient way of avoiding a wide raft of problems; this last change allows us to avoid the actual string in the intended most common code path.

Noted. Thanks!

There is also the "no IDistributedCache" scenario; we'd still need to box the struct, which isn't free, but is probably cheaper than the string, I'll grant - but we'd also lose the dictionary rehashing feature; this is specific to TKey===string.

Interesting. So, at the end of the day, string-based keys will probably be the recommended way to handle cache keys of all types I assume.

I believe in our current scenario (pure IMemoryCache + NET472) it is probably quite wasteful, but with these improvements in NET9/10 most of the downsides would vanish.

Also, to underline the point: HybridCache is already locked and loaded - we can't change it radically without making it something else. This is an optimization of an existing API, not a complete overhaul. HybridCache currently takes string: we want to make that cheaper.

👍

@jodydonetti
Copy link
Contributor

As someone who is currently heavily using only IMemoryCache in our codebase (we have plans for a distributed cache but its not yet in effect), I completely forgot IDistributedCache was string due to the serialization aspect.

Not really, it's string because it's the type of the key in IDistributedCache, see here.

@jodydonetti
Copy link
Contributor

jodydonetti commented Feb 24, 2025

Hi @mgravell

Background and motivation

the core idea here is that HybridCache should be able to satisfy local "hit" requests synchronously without the code of a string key parameter, instead leaving the key unmaterialized.

Dumb question (I'm still thinking about it), but in the case of L1+L2 this will allocate a string every time L2 is needed, correct?

Not that it is worse than now, mind you, just pointing it out for me to get the full picture.

@julealgon
Copy link

As someone who is currently heavily using only IMemoryCache in our codebase (we have plans for a distributed cache but its not yet in effect), I completely forgot IDistributedCache was string due to the serialization aspect.

Not really, it's string because it's the type of the key in IDistributedCache, see here.

@jodydonetti you might've misunderstood what I meant there. I was precisely talking about IDistributedCache, the same one you linked.

I was just making a point that we are not yet using IDistributedCache in our application, only IMemoryCache, and I had forgotten that IDistributedCache's key was a string.

I was not talking about HybridCache's interface there.

@jodydonetti
Copy link
Contributor

you might've misunderstood what I meant there. I was precisely talking about IDistributedCache, the same one you linked.
I was just making a point that we are not yet using IDistributedCache in our application, only IMemoryCache, and I had forgotten that IDistributedCache's key was a string.

I was not talking about HybridCache's interface there.

Yes I got that, I referenced one of IDistributedCache methods just to point out that the key there was string, since you mentioned "I completely forgot IDistributedCache was string due to the serialization aspect" and the serialization part was not the reason strings were involved with IDistributedCache.

When Marc mentioned "IMemoryCache talks object, and IDistributedCache talks string" I think he was referring to the keys being object and string, not the values (which are object and byte[]).

Anyway it seems like I got that wrong, my bad.

@GerardSmit
Copy link

Alternatively, instead of DefaultInterpolatedStringHandler the key could a generic type that implements ISpanFormattable and allows ref struct:

public abstract partial class HybridCache
{
+    public abstract ValueTask<T> GetOrCreateAsync<TKey, T>(
+        TKey key, Func<CancellationToken, ValueTask<T>> factory, HybridCacheEntryOptions? options = null,
+        IEnumerable<string>? tags = null, CancellationToken cancellationToken = default)
+        where TKey : ISpanFormattable, allows ref struct;
}

In this case you could make a custom type like MyCustomKeyStruct that implements ISpanFormattable:

var value = hybridCache.GetOrCreateAsync(new MyCustomKeyStruct(key, id), ...);

However, DefaultInterpolatedStringHandler does not implement ISpanFormattable, so a custom extension has to be made:

Example extension
public static class DefaultHybridCacheExtensions
{
  // Extension for DefaultInterpolatedStringHandler
  public static ValueTask<T> GetOrCreateAsync<T>(this HybridCache cache,
      ref DefaultInterpolatedStringHandler key,
      Func<CancellationToken, ValueTask<T>> factory,
      HybridCacheEntryOptions? options = null, IEnumerable<string>? tags = null,
      CancellationToken cancellationToken = default)
  {
      var wrapper = new DefaultInterpolatedStringHandlerWrapper(key);

      return cache.GetOrCreateAsync(wrapper, factory, options, tags, cancellationToken);
  }

  // Wraps DefaultInterpolatedStringHandler to ISpanFormattable
  private ref struct DefaultInterpolatedStringHandlerWrapper(DefaultInterpolatedStringHandler handler) : ISpanFormattable
  {
      private DefaultInterpolatedStringHandler _handler = handler;

      public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider)
      {
          // Workaround: Text is internal. PR #112171
          var span = GetText(ref _handler);

          if (span.Length > destination.Length)
          {
              charsWritten = 0;
              return false;
          }

          span.CopyTo(destination);
          charsWritten = span.Length;
          return true;

          [UnsafeAccessor(UnsafeAccessorKind.Method, Name = "get_Text")]
          static extern ReadOnlySpan<char> GetText(ref DefaultInterpolatedStringHandler handler);
      }

      public string ToString(string? format, IFormatProvider? formatProvider) => _handler.ToString();

      public override string ToString() => _handler.ToString();
  }
}

Down-side: the key has to be copied from DefaultInterpolatedStringHandler to a temporary buffer (with TryFormat).

@mgravell
Copy link
Member Author

the key could a generic type that implements ISpanFormattable and allows ref struct

Maybe that makes sense for an additional/extension convenience API, but we could only use it to generate a string-like key - we couldn't use that as the key (can't store an allows ref-struct, and even if we remove that: boxing, etc) - and I think it complicates the key-generation mechanism - it becomes an interpolated string with extra steps

@jodydonetti
Copy link
Contributor

jodydonetti commented Feb 24, 2025

Since Roslyn prefers this signature, existing HybridCache code of the form:

var value = hybridCache.GetOrCreateAsync($"/some/{key}/path/{id}", ...);
will automatically update (at next recompile) to the new usage, with zero code changes required.

This is smart, and I like the automatic lifting.

Right now I'm thinking about the possible intersections of the different overloads, including the ones with the stateful factory, which when used is typically done to pass along the key to the factory itself.
In that case that would not work I suppose, so I'm wondering if the overload resolution can have doubts or throw errors on that, but I can't be sure right now without trying it.

We will also ensure documentation is clear about preferring this API.

Yes, this is very important to avoid surprises.

Since this is a public abstraction with 1st and 3rd party implementations, and DefaultInterpolatedStringHandler is a dangerous type, we do not blindly let implementations deal with this; the two new methods are non-virtual, with a new virtual method that takes the key as ReadOnlySpan<char> key. That way, implementations are not exposed to the fiddly DefaultInterpolatedStringHandler API and we do not risk leaks / double-pool-restore.

👍

One question: there's no plan for new overloads of Set/Remove/etc, right?
I suppose the hypothetical usage frequency is so low that it's not deemed worthy for the upgrade, correct? (btw, it makes sense)

API compatibility:

This should probably target net10+ only, since it requires related features. If the MemoryCache API gets backported to net9 (unlikely), in theory we could use the DefaultInterpolatedStringHandler via unsafe-accessor, but: it seems preferable to consider this as net10+.

There's a lot going on already, and 10 will be LTS, so I would stick to that only. My 2 cents.

A note is added to clarify the slightly unusual usage of a span on a *Async method:

If the specific implementation cannot get the value synchronously, that implementation must copy the contents (using a leased buffer, string, or similar) before the asynchronous transition.
This atypical usage is because of the high probability of local synchronous "hit" results, hence ValueTask<T> as the return value.

Good call, preferential treatment for the happy path.

API Usage

No change at the caller:

var value = hybridCache.GetOrCreateAsync($"/some/{key}/path/{id}", ...);

Chef's kiss.

Closing remarks: this is all really nifty stuff, and it can make a difference with really high perf scenarios.

I'f I'm being honest, the thing I'm worried about is how all of this will play out with the rest of the codebase, overloads and scenarios.

For example logging/OTEL: if they are enabled, and the key is (presumably) written, would it still be possible to zero-allocate?

And let's say it's not possible: we may think "no big deal, we would have had the string anyway" right? Eh, no sure about that, since if we don't keep around the string key for the lifetime of the entire method (and sub-methods), at every log/trace point we would need to allocate a new string, every time, or store that at first use and pass that around.

I hope the problem is that I just don't have imagined every little piece in a clear way right now, but just wanted to share these second-order thinking with you and the team.

@mgravell
Copy link
Member Author

Re logging/OTEL, I can think of some ways of doing that without allocating, but I suspect you'd usually not want to not log extensive data in the "hit" path - so maybe a more pragmatic approach here is to ensure you have a string if full debug logging is enabled? My intention here (for the default implementation) is to use a single code path between the two versions via

ValueTask<T> NamingIsHardAsync(ReadOnlySpan<char> keySpan, string? keyString, ...)

Using (key.AsSpan(), key) and (key, null) respectively - allowing the implementation to upgrade keyString ??= keySpan.ToString() as necessary - whether for L2, L1 "write", etc.

One question: there's no plan for new overloads of Set/Remove/etc, right?

That is correct for now. If there's a real need for them: then maybe! Perhaps questionable for Add, for the simple reason that you're always going to need to materialize a string for add, to store the key - unless perhaps it already exists and you're throwing! But remove, upsert, etc... not necessarily a problem.

@mgravell
Copy link
Member Author

which when used is typically done to pass along the key to the factory itself.

Being genuinely serious: I'm totally fine with that being an upsell "here's a scenario where FusionCache offers a more tailored API for these nuanced scenarios" - I don't think HybridCache is ever going to tick every box for every niche.

@jodydonetti
Copy link
Contributor

which when used is typically done to pass along the key to the factory itself.

Being genuinely serious: I'm totally fine with that being an upsell "here's a scenario where FusionCache offers a more tailored API for these nuanced scenarios" - I don't think HybridCache is ever going to tick every box for every niche.

Really appreciate the transparency... even though even FusionCache is not there yet 😅 (see here).

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-Extensions-Caching blocking Marks issues that we want to fast track in order to unblock other important work untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

5 participants