no_std for common, pylib, codegen#7056
Conversation
📝 WalkthroughWalkthroughAdds optional no_std support across multiple crates by introducing std feature flags, applying crate-level no_std gating, switching std→core imports, and providing cfg-based alternative implementations for panic handling, static storage, and thread-local behavior. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@crates/codegen/Cargo.toml`:
- Around line 11-13: The crate's Cargo.toml currently toggles a local "std"
feature but doesn't propagate it to dependencies, so dependencies like thiserror
and itertools will still enable std; either (A) update workspace dependency
entries for thiserror and itertools to set default-features = false (and add the
needed features like itertools' use_alloc) so they truly work in no_std, or (B)
add feature forwarding in this crate's Cargo.toml by making the std feature
enable the dependencies' std/use_std/use_alloc equivalents (e.g., map this
crate's "std" feature to "thiserror/std" and the appropriate itertools feature)
so disabling std here also disables std in those deps; apply this change to the
Cargo.toml features and/or workspace dependency declarations and ensure feature
names match the dependency crates (thiserror, itertools).
In `@crates/codegen/src/symboltable.rs`:
- Around line 325-341: The PopGuard using a raw *mut pointer is unsound because
calling f(self) re-borrows self and invalidates that pointer; replace the
raw-pointer Drop guard in with_append with a safe length/index-based approach:
record the previous length (or index) of self.v, push x.into(), call f(self) to
get the result, then after f returns pop the element by checking/restoring based
on the recorded length (or pop unconditionally if you know you always pushed
one). Remove the PopGuard type and raw pointer usage (references: with_append,
PopGuard, self.v, f(self)).
In `@crates/common/src/refcount.rs`:
- Around line 38-40: The refcount overflow check currently uses panic! which
weakens safety; change the overflow handling in the refcount increment paths
(the branch checking if old_size & Self::MASK == Self::MASK in the method(s)
that perform increments—e.g., the function implementing inc and inc_by) to call
std::process::abort() when the standard library is available and only use panic!
as a fallback in no_std builds; implement this via cfg(feature =
"std")/cfg(not(feature = "std")) or equivalent cfg!(std) guards so the overflow
always aborts in std builds and apply the exact same change to the inc_by
overflow check as well.
- Gate OS-dependent modules behind `#[cfg(feature = "std")]` - Replace `std::f64` with `core::f64` in float_ops - Replace `std::process::abort` with panic in refcount - Remove `thread_local` from levenshtein (stack buffer) - Split static_cell into threading/non_threading/no_std
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/common/src/str.rs`:
- Line 536: The left-edge initialization in the Levenshtein DP loop is off by
one: update the assignment that sets result for the current row (using variables
result, b_index, and constant MOVE_COST inside the DP routine) so that it uses
(b_index + 1) multiplied by MOVE_COST instead of b_index multiplied by
MOVE_COST, ensuring the row deletion cost corresponds to row b_index+1.
🧹 Nitpick comments (2)
crates/common/src/static_cell.rs (1)
123-126: Impreciseunsafesafety justification —no_stddoes not preclude threading.The comment "without std there can be no threads" is not accurate. Many
no_stdenvironments (RTOS, multi-core bare-metal) support threading. The actual safety invariant here is that thethreadingfeature is disabled, which is a user contract that concurrent access won't occur. Consider revising the comment to reflect the real guarantee:Suggested comment fix
- // unsync::OnceCell is !Sync, but without std there can be no threads. - struct SyncOnceCell<T>(OnceCell<T>); - // SAFETY: Without std, threading is impossible. - unsafe impl<T> Sync for SyncOnceCell<T> {} + // unsync::OnceCell is !Sync, but this path is only active when the + // `threading` feature is disabled, meaning the caller guarantees + // no concurrent access. + struct SyncOnceCell<T>(OnceCell<T>); + // SAFETY: The `threading` feature is off, so the user has opted out of + // multi-threaded use. No concurrent access to the inner cell can occur. + unsafe impl<T> Sync for SyncOnceCell<T> {}crates/common/Cargo.toml (1)
12-13: Thestdfeature doesn't forward to dependency feature gates, which prevents trueno_stdbuilds.If you plan to support
no_std, dependencies likeonce_cellanditertoolsneed conditional feature forwarding. The currentstd = []is inert—disabling it doesn't disablestdin transitive dependencies.Consider extending the feature once
no_stdis fully supported:std = ["once_cell/std", "itertools/use_std"](Note:
getrandomis already explicitly enabled withstdin workspace.dependencies.malachite-biginthas astdfeature but doesn't fully supportno_stdat the crate level, so forwarding it won't be sufficient for trueno_stdbuilds.)If this is an incomplete incremental step, a tracking issue would help clarify the scope.
|
|
||
| let mut result = 0usize; | ||
| for (b_index, b_code) in b_bytes[b_begin..(b_begin + b_end)].iter().enumerate() { | ||
| result = b_index * MOVE_COST; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read the levenshtein_distance function around line 536
cat -n crates/common/src/str.rs | sed -n '500,560p'Repository: RustPython/RustPython
Length of output: 2483
🏁 Script executed:
#!/bin/bash
# Search for test cases related to levenshtein_distance
rg -n 'levenshtein_distance|#\[test\]' crates/common/src/str.rs -A 5 -B 1Repository: RustPython/RustPython
Length of output: 474
🏁 Script executed:
#!/bin/bash
# Look for any other occurrences of the pattern to understand the refactoring
git log --oneline -n 20 -- crates/common/src/str.rsRepository: RustPython/RustPython
Length of output: 93
🏁 Script executed:
#!/bin/bash
# Search for any test cases that verify levenshtein distance behavior
find . -name "*.rs" -type f -exec grep -l "levenshtein" {} \;Repository: RustPython/RustPython
Length of output: 121
🏁 Script executed:
#!/bin/bash
# Look for test cases in the file itself
grep -n "test" crates/common/src/str.rs | head -20Repository: RustPython/RustPython
Length of output: 143
🏁 Script executed:
#!/bin/bash
# Try to find CPython's implementation online or in comments
rg -n "CPython|suggestions\.c|UTF8_Edit" crates/ -A 3 -B 1Repository: RustPython/RustPython
Length of output: 50380
🏁 Script executed:
#!/bin/bash
# Check if there's a git history we can inspect for this function
cd crates/common && git log --oneline -n 30 -- src/str.rs 2>/dev/null || echo "Git history not available"Repository: RustPython/RustPython
Length of output: 93
🏁 Script executed:
#!/bin/bash
# Check the suggestion.rs file which might have the levenshtein implementation
cat -n crates/vm/src/suggestion.rs | head -100Repository: RustPython/RustPython
Length of output: 4334
🏁 Script executed:
#!/bin/bash
# Search for any tests that use levenshtein_distance or suggestion functionality
find . -name "*.rs" -type f | xargs grep -l "suggestion\|levenshtein" | head -10Repository: RustPython/RustPython
Length of output: 324
🏁 Script executed:
#!/bin/bash
# Look at the suggestion.rs file to see if it uses the levenshtein_distance function
rg -n "levenshtein_distance" crates/vm/src/suggestion.rs -B 3 -A 3Repository: RustPython/RustPython
Length of output: 542
🏁 Script executed:
#!/bin/bash
# Search for any tests that call levenshtein_distance or validate its output
find . -name "*.rs" -type f | xargs grep -l "test.*levenshtein\|levenshtein.*test" 2>/dev/nullRepository: RustPython/RustPython
Length of output: 49
🏁 Script executed:
#!/bin/bash
# Look for any Python tests that might indirectly test levenshtein via suggestions
find . -name "test*.py" -o -name "*test.py" | xargs grep -l "AttributeError\|NameError\|suggestion" 2>/dev/null | head -5Repository: RustPython/RustPython
Length of output: 214
🏁 Script executed:
#!/bin/bash
# Search for CPython reference or comments about the levenshtein implementation
rg -n "CPython|edit.*cost|levenshtein" crates/common/src/str.rs -B 2 -A 2Repository: RustPython/RustPython
Length of output: 329
🌐 Web query:
CPython _Py_UTF8_Edit_Cost implementation Levenshtein
💡 Result:
In CPython, _Py_UTF8_Edit_Cost() is implemented in Python/suggestions.c and is a bounded, weighted Levenshtein-distance routine over the strings’ UTF-8 byte sequences (it calls PyUnicode_AsUTF8AndSize() for both inputs). [1]
Key details of the implementation:
-
Operation costs
- Insert/delete (“move”):
MOVE_COST = 2[1] - Substitution:
0if equal;CASE_COST = 1for ASCII case-only differences; otherwiseMOVE_COST = 2[1] - Case-only detection is optimized via
LEAST_FIVE_BITS(c) == (c & 31)plus ASCII folding to confirm a case flip. [1]
- Insert/delete (“move”):
-
Algorithm / optimizations
- Trims common prefix and suffix before DP. [1]
- Uses a single-row DP (in-place update) rather than a full matrix. [1]
- Has a max_cost threshold: if the minimum value in a DP row exceeds
max_cost, it bails out early and returnsmax_cost + 1. [1] - Hard limits: if either (trimmed) byte length exceeds
MAX_STRING_SIZE(40), returnsmax_cost + 1(treated as “too expensive”). [1]
-
max_cost == -1behavior- Interpreted as “no practical bound”: it sets
max_cost = MOVE_COST * max(len(a), len(b))and then computes the distance. [1]
- Interpreted as “no practical bound”: it sets
The function is declared as an internal API in Include/internal/pycore_pyerrors.h. [2]
Sources: Python/suggestions.c (Android git mirror of CPython) [1]; pycore_pyerrors.h declaration [2].
Line 536: Off-by-one bug in Levenshtein DP initialization.
The variable result represents the left-edge cost (deletion cost) for the current DP row. For row b_index+1, this should be (b_index + 1) * MOVE_COST, not b_index * MOVE_COST. The current code computes all edit distances with the wrong row-edge values, producing incorrect results. This matches CPython's _Py_UTF8_Edit_Cost implementation, which correctly uses (b_index + 1) * MOVE_COST.
Change line 536 from:
result = b_index * MOVE_COST;to:
result = (b_index + 1) * MOVE_COST;🤖 Prompt for AI Agents
In `@crates/common/src/str.rs` at line 536, The left-edge initialization in the
Levenshtein DP loop is off by one: update the assignment that sets result for
the current row (using variables result, b_index, and constant MOVE_COST inside
the DP routine) so that it uses (b_index + 1) multiplied by MOVE_COST instead of
b_index multiplied by MOVE_COST, ensuring the row deletion cost corresponds to
row b_index+1.
Summary by CodeRabbit
New Features
Bug Fixes
Other