Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HasId method to ValueStore for debug checks #4866

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

clavin
Copy link
Contributor

@clavin clavin commented Jan 29, 2025

Small addition to ValueStore to check if an ID can be safely used to access values from the store. This change iterates on the DCHECK bounds check that currently exists in ValueStore::Get.

When working with multiple ValueStores, IDs can become mixed up and misused to access the wrong ValueStore. This always indicates a bug in the code, so this method is added for the convenience of debug checks to identify bugs.

In many situations there is only one ValueStore instance available with a unique ID type, so it's guaranteed that those IDs came from ValueStore::Add; thus, this check is usually unnecessary. To deter overly-cautious code, this method is documented as a debug tool.

I plan to use this method in a follow-up PR to harden Dump functions against out-of-bounds IDs.

Comment on lines +100 to +107
// Checks whether the ID has a value in this store.
//
// This is usually unnecessary to check, but can add safety to debug code.
// Note that the ID may originate from a different store with a different
// value.
auto HasId(IdT id) const -> bool {
return id.index >= 0 && static_cast<size_t>(id.index) < size();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's it look like if you just check against size() in debug code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonmeow I was writing size() checks in Dump functions, but found it to be a little noisy when working with ValueStore. Here's an example of my first sketch:

toolchain/check/dump.cpp:48-55:

+   if (static_cast<size_t>(loc_id.import_ir_inst_id().index) >=
+        context.sem_ir().import_ir_insts().size()) {
+      llvm::errs() << "LocId(<out of bounds import instruction>)";
+      return;
+    }
     auto import_ir_id = context.sem_ir()
                             .import_ir_insts()
                             .Get(loc_id.import_ir_inst_id())
                             .ir_id;
 
+    if (static_cast<size_t>(import_ir_id.index) >=
+        context.sem_ir().import_irs().size()) {
+      llvm::errs() << "LocId(<out of bounds import file>)";
+      return;
+    }
     const auto* import_file =
         context.sem_ir().import_irs().Get(import_ir_id).sem_ir;
     llvm::errs() << "LocId(import from \""
                  << FormatEscaped(import_file->filename()) << "\")";

For Lex & Parse, each having only one small Dump function, checking against size() is relatively quiet and works well.

Looking now at toolchain/sem_ir/dump.cpp, I realize that many ValueStore types there are wrappers. 😅 So HasId won't be as widely available as I originally thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative would be a template function like MemUsage::Collect for types that have an array_ref() or size() member. What do you think?

Copy link
Contributor

@jonmeow jonmeow Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this is what you're thinking about with the Collect example, but how about:

  • Add support to ErrorOr to support printing if T is printable.
  • Add a SafeGet helper somewhere like toolchain/base/dump.h (i.e., separating the dump-specific functions from regular code) that looks something like template<ValueStoreT, IdT> auto SafeGet(ValueStoreT value_store, IdT id) -> ErrorOr<ValueStoreT::ConstRefType> [if any value store-ish things are missing size, I think it'd be fair to add)

Note, I'm suggesting this to allow changes like:

static auto DumpNoNewline(const File& file, InstId inst_id) -> void {
  llvm::errs() << inst_id;
  if (inst_id.has_value()) {
-     llvm::errs() << ": " << file.insts().Get(inst_id);
+     llvm::errs() << ": " << SafeGet(file.insts(), inst_id);
  }
}

The example you're looking at with LocId is a little more complex and would need to examine the ErrorOr, but for the SemIR case it should be easier to replace.

@@ -68,13 +68,13 @@ class ValueStore

// Returns a mutable value for an ID.
auto Get(IdT id) -> RefType {
CARBON_DCHECK(id.index >= 0, "{0}", id);
CARBON_DCHECK(HasId(id), "{0}", id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny sidenote, SmallVector has a assert(idx < size());, which already covers what's changing here.

Comment on lines +100 to +107
// Checks whether the ID has a value in this store.
//
// This is usually unnecessary to check, but can add safety to debug code.
// Note that the ID may originate from a different store with a different
// value.
auto HasId(IdT id) const -> bool {
return id.index >= 0 && static_cast<size_t>(id.index) < size();
}
Copy link
Contributor

@jonmeow jonmeow Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether this is what you're thinking about with the Collect example, but how about:

  • Add support to ErrorOr to support printing if T is printable.
  • Add a SafeGet helper somewhere like toolchain/base/dump.h (i.e., separating the dump-specific functions from regular code) that looks something like template<ValueStoreT, IdT> auto SafeGet(ValueStoreT value_store, IdT id) -> ErrorOr<ValueStoreT::ConstRefType> [if any value store-ish things are missing size, I think it'd be fair to add)

Note, I'm suggesting this to allow changes like:

static auto DumpNoNewline(const File& file, InstId inst_id) -> void {
  llvm::errs() << inst_id;
  if (inst_id.has_value()) {
-     llvm::errs() << ": " << file.insts().Get(inst_id);
+     llvm::errs() << ": " << SafeGet(file.insts(), inst_id);
  }
}

The example you're looking at with LocId is a little more complex and would need to examine the ErrorOr, but for the SemIR case it should be easier to replace.

github-merge-queue bot pushed a commit that referenced this pull request Feb 4, 2025
### Context & Motivation

The error handling utilities in `//base/error.h` are very useful for
writing code with strong safety guarantees. While hardening the `Dump`
debug utilities (from review in #4866), I encountered a rough edge with
references and pointers. After a [brief Discord discussion in
#contributing-help](https://discord.com/channels/655572317891461132/1052653651895779359/1334675462877610038),
it was suggested that adding support for references to `ErrorOr` would
be a good candidate to move forward.

Using a reference type with the `ErrorOr` class (e.g. `ErrorOr<Node&>`)
produces two errors:

<ol>
<li><strong><code>variant can not have a reference type as an
alternative</code></strong>
<ul><li>From private field: <code>std::variant&lt;Error, T&gt;
val_;</code></li></ul>
</li>
<li><strong><code>'operator-&gt;' declared as a pointer to a
reference</code></strong>
<ul><li>From member function: <code>auto operator-&gt;() -&gt;
T*</code></li></ul>
</li>
</ol>

### Changes

To support reference types, both errors are resolved:

1. `std::reference_wrapper` is conditionally used for storage when `T`
is a reference type
2. type trait aliases like `using ValueT = std::remove_reference_t<T>`
are used to produce compatible types for methods like `auto operator->()
-> ValueT*`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants