Add `HasId` method to `ValueStore` for debug checks #4866

clavin · 2025-01-29T18:58:41Z

Small addition to ValueStore to check if an ID can be safely used to access values from the store. This change iterates on the DCHECK bounds check that currently exists in ValueStore::Get.

When working with multiple ValueStores, IDs can become mixed up and misused to access the wrong ValueStore. This always indicates a bug in the code, so this method is added for the convenience of debug checks to identify bugs.

In many situations there is only one ValueStore instance available with a unique ID type, so it's guaranteed that those IDs came from ValueStore::Add; thus, this check is usually unnecessary. To deter overly-cautious code, this method is documented as a debug tool.

I plan to use this method in a follow-up PR to harden Dump functions against out-of-bounds IDs.

jonmeow · 2025-01-29T22:39:24Z

toolchain/base/value_store.h

+  // Checks whether the ID has a value in this store.
+  //
+  // This is usually unnecessary to check, but can add safety to debug code.
+  // Note that the ID may originate from a different store with a different
+  // value.
+  auto HasId(IdT id) const -> bool {
+    return id.index >= 0 && static_cast<size_t>(id.index) < size();
+  }


What's it look like if you just check against size() in debug code?

@jonmeow I was writing size() checks in Dump functions, but found it to be a little noisy when working with ValueStore. Here's an example of my first sketch:

toolchain/check/dump.cpp:48-55:

+ if (static_cast<size_t>(loc_id.import_ir_inst_id().index) >= + context.sem_ir().import_ir_insts().size()) { + llvm::errs() << "LocId(<out of bounds import instruction>)"; + return; + } auto import_ir_id = context.sem_ir() .import_ir_insts() .Get(loc_id.import_ir_inst_id()) .ir_id; + if (static_cast<size_t>(import_ir_id.index) >= + context.sem_ir().import_irs().size()) { + llvm::errs() << "LocId(<out of bounds import file>)"; + return; + } const auto* import_file = context.sem_ir().import_irs().Get(import_ir_id).sem_ir; llvm::errs() << "LocId(import from \"" << FormatEscaped(import_file->filename()) << "\")";

For Lex & Parse, each having only one small Dump function, checking against size() is relatively quiet and works well.

Looking now at toolchain/sem_ir/dump.cpp, I realize that many ValueStore types there are wrappers. 😅 So HasId won't be as widely available as I originally thought.

One alternative would be a template function like MemUsage::Collect for types that have an array_ref() or size() member. What do you think?

I'm not sure whether this is what you're thinking about with the Collect example, but how about:

Add support to ErrorOr to support printing if T is printable.

Add a SafeGet helper somewhere like toolchain/base/dump.h (i.e., separating the dump-specific functions from regular code) that looks something like template<ValueStoreT, IdT> auto SafeGet(ValueStoreT value_store, IdT id) -> ErrorOr<ValueStoreT::ConstRefType> [if any value store-ish things are missing size, I think it'd be fair to add)

Note, I'm suggesting this to allow changes like:

static auto DumpNoNewline(const File& file, InstId inst_id) -> void { llvm::errs() << inst_id; if (inst_id.has_value()) { - llvm::errs() << ": " << file.insts().Get(inst_id); + llvm::errs() << ": " << SafeGet(file.insts(), inst_id); } }

The example you're looking at with LocId is a little more complex and would need to examine the ErrorOr, but for the SemIR case it should be easier to replace.

jonmeow · 2025-01-30T18:06:59Z

toolchain/base/value_store.h

@@ -68,13 +68,13 @@ class ValueStore

  // Returns a mutable value for an ID.
  auto Get(IdT id) -> RefType {
-    CARBON_DCHECK(id.index >= 0, "{0}", id);
+    CARBON_DCHECK(HasId(id), "{0}", id);


Tiny sidenote, SmallVector has a assert(idx < size());, which already covers what's changing here.

jonmeow · 2025-01-30T18:29:05Z

toolchain/base/value_store.h

+  // Checks whether the ID has a value in this store.
+  //
+  // This is usually unnecessary to check, but can add safety to debug code.
+  // Note that the ID may originate from a different store with a different
+  // value.
+  auto HasId(IdT id) const -> bool {
+    return id.index >= 0 && static_cast<size_t>(id.index) < size();
+  }


I'm not sure whether this is what you're thinking about with the Collect example, but how about:

Add support to ErrorOr to support printing if T is printable.

Add a SafeGet helper somewhere like toolchain/base/dump.h (i.e., separating the dump-specific functions from regular code) that looks something like template<ValueStoreT, IdT> auto SafeGet(ValueStoreT value_store, IdT id) -> ErrorOr<ValueStoreT::ConstRefType> [if any value store-ish things are missing size, I think it'd be fair to add)

Note, I'm suggesting this to allow changes like:

static auto DumpNoNewline(const File& file, InstId inst_id) -> void { llvm::errs() << inst_id; if (inst_id.has_value()) { - llvm::errs() << ": " << file.insts().Get(inst_id); + llvm::errs() << ": " << SafeGet(file.insts(), inst_id); } }

The example you're looking at with LocId is a little more complex and would need to examine the ErrorOr, but for the SemIR case it should be easier to replace.

### Context & Motivation The error handling utilities in `//base/error.h` are very useful for writing code with strong safety guarantees. While hardening the `Dump` debug utilities (from review in #4866), I encountered a rough edge with references and pointers. After a [brief Discord discussion in #contributing-help](https://discord.com/channels/655572317891461132/1052653651895779359/1334675462877610038), it was suggested that adding support for references to `ErrorOr` would be a good candidate to move forward. Using a reference type with the `ErrorOr` class (e.g. `ErrorOr<Node&>`) produces two errors: <ol> <li><strong><code>variant can not have a reference type as an alternative</code></strong> <ul><li>From private field: <code>std::variant<Error, T> val_;</code></li></ul> </li> <li><strong><code>'operator->' declared as a pointer to a reference</code></strong> <ul><li>From member function: <code>auto operator->() -> T*</code></li></ul> </li> </ol> ### Changes To support reference types, both errors are resolved: 1. `std::reference_wrapper` is conditionally used for storage when `T` is a reference type 2. type trait aliases like `using ValueT = std::remove_reference_t<T>` are used to produce compatible types for methods like `auto operator->() -> ValueT*`

Add HasId method to ValueStore for debug checks

05e3199

github-actions bot added the toolchain label Jan 29, 2025

github-actions bot requested a review from jonmeow January 29, 2025 18:58

jonmeow reviewed Jan 29, 2025

View reviewed changes

jonmeow reviewed Jan 30, 2025

View reviewed changes

clavin mentioned this pull request Feb 4, 2025

Support references in ErrorOr #4889

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `HasId` method to `ValueStore` for debug checks #4866

Add `HasId` method to `ValueStore` for debug checks #4866

clavin commented Jan 29, 2025

jonmeow Jan 29, 2025

clavin Jan 30, 2025

clavin Jan 30, 2025

jonmeow Jan 30, 2025 •

edited

Loading

jonmeow Jan 30, 2025

jonmeow Jan 30, 2025 •

edited

Loading

Add HasId method to ValueStore for debug checks #4866

Are you sure you want to change the base?

Add HasId method to ValueStore for debug checks #4866

Conversation

clavin commented Jan 29, 2025

jonmeow Jan 29, 2025

Choose a reason for hiding this comment

clavin Jan 30, 2025

Choose a reason for hiding this comment

clavin Jan 30, 2025

Choose a reason for hiding this comment

jonmeow Jan 30, 2025 • edited Loading

Choose a reason for hiding this comment

jonmeow Jan 30, 2025

Choose a reason for hiding this comment

jonmeow Jan 30, 2025 • edited Loading

Choose a reason for hiding this comment

Add `HasId` method to `ValueStore` for debug checks #4866

Add `HasId` method to `ValueStore` for debug checks #4866

jonmeow Jan 30, 2025 •

edited

Loading

jonmeow Jan 30, 2025 •

edited

Loading