Update hashlib from v3.14.3 and align _hashlib to CPython #7120
youknowone merged 2 commits into RustPython:main from
Caution: Review failed. The pull request is closed.

📝 Walkthrough

Replaces emitted parameter-name strings with mangled variants, updates many hash wrappers to propagate errors and accept the VM, refactors

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant PyCode as Python caller
    participant HashLib as _hashlib (Rust)
    participant VM as VirtualMachine
    participant Crypto as Crypto crates (digest/hmac/pbkdf2)
    PyCode->>HashLib: call hashlib_new / hmac_digest / pbkdf2_hmac (args)
    HashLib->>VM: resolve_data(args) / resolve_digestmod(args, vm)
    HashLib->>HashLib: select algorithm from HASH_ALGORITHMS
    alt Construct hasher
        HashLib->>HashLib: instantiate PyHasher / PyHasherXof
    else HMAC/PBKDF2
        HashLib->>Crypto: call hmac/pbkdf2 with resolved digest name
        Crypto-->>HashLib: bytes / error
    end
    HashLib->>VM: convert result to PyObject (into_pyobject(vm)) or return Err
    VM-->>PyCode: PyObject / PyError
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/stdlib/src/hashlib.rs (1)
554-557: ⚠️ Potential issue | 🔴 Critical

`compare_digest` uses non-constant-time `==`, defeating its purpose. This function exists specifically to prevent timing side-channel attacks when comparing HMACs/digests. The current `Vec<u8>` equality check short-circuits on the first differing byte, leaking information about which prefix matches. A constant-time comparison is required.

Proposed fix
```diff
+// Constant-time comparison to prevent timing side-channel attacks.
+fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
+    if a.len() != b.len() {
+        return false;
+    }
+    a.iter()
+        .zip(b.iter())
+        .fold(0u8, |acc, (x, y)| acc | (x ^ y))
+        == 0
+}
+
 let a_hash = a.borrow_bytes().to_vec();
 let b_hash = b.borrow_bytes().to_vec();
-Ok((a_hash == b_hash).to_pyobject(vm))
+Ok(constant_time_eq(&a_hash, &b_hash).to_pyobject(vm))
```
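As a standalone illustration of the XOR-accumulate technique the fix relies on (this is a sketch, not the PR's actual code; `constant_time_eq` here is free-standing rather than embedded in `compare_digest`):

```rust
// Sketch of a constant-time byte comparison: XOR each byte pair and OR the
// results into an accumulator. The accumulator is 0 only if every pair
// matched, and the loop runs to completion regardless of where a mismatch
// occurs, so timing does not reveal the matching prefix length.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert!(constant_time_eq(b"secret-digest", b"secret-digest"));
    // Differs only in the last byte: still rejected, still full-length scan.
    assert!(!constant_time_eq(b"secret-digest", b"secret-digesu"));
    // Length mismatch is allowed to return early; length is not secret here.
    assert!(!constant_time_eq(b"short", b"longer-input"));
    println!("ok");
}
```

In production code a vetted implementation such as the `subtle` crate's `ConstantTimeEq` is usually preferable to a hand-rolled loop, since the compiler cannot optimize its comparison into a short-circuiting one.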
🤖 Fix all issues with AI agents
In `@crates/stdlib/Cargo.toml`:
- Around line 70-71: The Cargo.toml currently lists digest = "0.10.3" which
conflicts with pbkdf2 = { version = "0.12", features = ["hmac"] } because pbkdf2
(with hmac feature) requires digest ^0.10.7; update the digest dependency to at
least "0.10.7" (e.g., digest = "0.10.7" or a compatible caret range) so that
pbkdf2 0.12 and hmac 0.12 can resolve, then run cargo update / cargo build to
verify the dependency graph for no further conflicts.
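Under the assumption described above, the relevant `crates/stdlib/Cargo.toml` entries would end up roughly like this (a hypothetical excerpt; surrounding entries and exact version ranges are illustrative):

```toml
# digest must be >= 0.10.7 so that pbkdf2 0.12 (with the "hmac" feature)
# and hmac 0.12 can resolve against a single digest version.
digest = "0.10.7"
hmac = "0.12"
pbkdf2 = { version = "0.12", features = ["hmac"] }
```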
In `@crates/stdlib/src/hashlib.rs`:
- Around line 625-630: The code casts a validated i64 `len` to `usize` for
`dklen` without an upper bound, which can trigger huge allocations; modify the
Some(obj) branch (where `let len: i64 = obj.try_into_value(vm)?;`) to enforce a
safe maximum before casting: compute a max_dklen = (2u64.pow(32) - 1) *
digest_size (i.e. (2^32-1) * digest_size) using the digest output size from the
relevant digest object, check that `len >= 1 && (len as u64) <= max_dklen`,
return vm.new_value_error(...) if out of range, and only then convert with a
checked conversion (e.g., usize::try_from(len) or len as usize after the bound
check) so dklen cannot request excessive memory.
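The bound check the prompt describes can be sketched as a free-standing function (the name `check_dklen`, its signature, and the error strings are assumptions for illustration; the real code would go through `vm.new_value_error` / `vm.new_overflow_error`):

```rust
// Sketch of the dklen validation described above: reject non-positive
// lengths, cap at the PKCS#5 maximum of (2^32 - 1) * digest_size, and only
// then convert to usize with a checked conversion.
fn check_dklen(len: i64, digest_size: u64) -> Result<usize, String> {
    if len < 1 {
        return Err("key length must be greater than 0.".to_owned());
    }
    // PKCS#5 / RFC 8018 caps the derived key length at (2^32 - 1) * hLen.
    let max_dklen = (u32::MAX as u64) * digest_size;
    if len as u64 > max_dklen {
        return Err("key length is too great.".to_owned());
    }
    usize::try_from(len).map_err(|_| "key length is too great.".to_owned())
}

fn main() {
    // A typical request: 32-byte key from a 32-byte digest (e.g., SHA-256).
    assert_eq!(check_dklen(32, 32), Ok(32));
    // Zero and negative lengths are rejected before any cast happens.
    assert!(check_dklen(0, 32).is_err());
    // A huge value (2^50) exceeds the PKCS#5 cap and is rejected instead of
    // triggering a multi-petabyte allocation.
    assert!(check_dklen(1 << 50, 32).is_err());
    println!("ok");
}
```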
🧹 Nitpick comments (3)
crates/stdlib/src/hashlib.rs (3)
376-421: Large repetitive match arms in `hashlib_new`; consider a helper to reduce duplication.

The same `PyHasher::new(name, HashWrapper::new::<T>(data)).into_pyobject(vm)` pattern is repeated for every fixed-output algorithm. Extracting a small generic helper would collapse ~30 lines into one call per arm. This applies equally to `hmac_digest` (lines 596-608) and `pbkdf2_hmac` (lines 646-658).
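The suggested generic helper might look roughly like the sketch below; the trait and the toy `XorHash` type are stand-ins for RustPython's real `PyHasher`/`HashWrapper` machinery, not the actual code:

```rust
// A minimal trait standing in for the common hasher interface.
trait Hasher {
    fn update(&mut self, data: &[u8]);
    fn hex(&self) -> String;
}

// Toy stand-in for one concrete algorithm (XOR of all input bytes).
#[derive(Default)]
struct XorHash(u8);

impl Hasher for XorHash {
    fn update(&mut self, data: &[u8]) {
        for b in data {
            self.0 ^= b;
        }
    }
    fn hex(&self) -> String {
        format!("{:02x}", self.0)
    }
}

// The generic helper: one function body replaces the repeated match-arm
// pattern, so each match arm shrinks to a single turbofished call.
fn new_hasher<T: Hasher + Default>(data: &[u8]) -> T {
    let mut h = T::default();
    h.update(data);
    h
}

fn main() {
    let h = new_hasher::<XorHash>(b"abc");
    // 0x61 ^ 0x62 ^ 0x63 == 0x60
    assert_eq!(h.hex(), "60");
    println!("ok");
}
```

With a helper of this shape, each arm of the algorithm `match` reduces to selecting the concrete type parameter, which is the duplication-collapsing effect the comment describes.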
580-609: HMAC doesn't support Blake2 or Shake, but `HASH_ALGORITHMS` includes them.

`hmac_digest` falls through to the `_` arm for `blake2b`, `blake2s`, `shake_128`, and `shake_256`, returning "unsupported hash type". If this is intentional (matching CPython behaviour), it's fine, but it may surprise callers who see those names in `openssl_md_meth_names`. Worth a brief comment documenting the intentional exclusion, or extending support if HMAC-Blake2 is desired.
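A documented-exclusion dispatch of the kind the comment asks for could be sketched like this (the function and the supported-name list are illustrative, not the PR's actual `hmac_digest` arms):

```rust
// Sketch: make the fall-through arm's exclusions explicit and documented,
// instead of a bare `_` that silently rejects names callers can see in the
// advertised algorithm list.
fn hmac_supported(name: &str) -> Result<(), String> {
    match name {
        // Fixed-output digests that HMAC composes with.
        "md5" | "sha1" | "sha224" | "sha256" | "sha384" | "sha512" => Ok(()),
        // Intentionally excluded (matches CPython): Blake2 has its own
        // built-in keyed mode, and SHAKE is an XOF, so neither is routed
        // through HMAC even though both appear in the algorithm list.
        other => Err(format!("unsupported hash type {other}")),
    }
}

fn main() {
    assert!(hmac_supported("sha256").is_ok());
    assert!(hmac_supported("blake2b").is_err());
    assert!(hmac_supported("shake_128").is_err());
    println!("ok");
}
```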
571-578: `hmac_new` is a stub that always raises; consider documenting intent.

If this is a deliberate placeholder awaiting a full HMAC object implementation, a brief `// TODO` code comment would help future contributors understand it's intentional rather than an oversight.
crates/stdlib/src/hashlib.rs
```rust
Some(obj) => {
    let len: i64 = obj.try_into_value(vm)?;
    if len < 1 {
        return Err(vm.new_value_error("key length must be greater than 0.".to_owned()));
    }
    len as usize
```
Unchecked i64 → usize cast for `dklen` can allocate excessive memory.

After verifying `len >= 1`, line 630 casts directly with `len as usize`. While an i64 that fits in usize won't wrap, a very large value (e.g., 2^50) would attempt a multi-petabyte allocation. CPython caps `dklen` at (2^32 - 1) * digest_size per the PKCS#5 spec. Consider adding a reasonable upper bound.
Proposed fix

```diff
 Some(obj) => {
     let len: i64 = obj.try_into_value(vm)?;
     if len < 1 {
         return Err(vm.new_value_error("key length must be greater than 0.".to_owned()));
     }
-    len as usize
+    usize::try_from(len).map_err(|_| {
+        vm.new_overflow_error("key length is too great.".to_owned())
+    })?
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
Some(obj) => {
    let len: i64 = obj.try_into_value(vm)?;
    if len < 1 {
        return Err(vm.new_value_error("key length must be greater than 0.".to_owned()));
    }
    usize::try_from(len).map_err(|_| {
        vm.new_overflow_error("key length is too great.".to_owned())
    })?
}
```
📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] lib: cpython/Lib/hashlib.py dependencies:
dependent tests: (15 tests)
[ ] lib: cpython/Lib/hmac.py dependencies:
dependent tests: (4 tests)
[x] lib: cpython/Lib/inspect.py dependencies:
dependent tests: (44 tests)
[x] test: cpython/Lib/test/test_keywordonlyarg.py dependencies: dependent tests: (no tests depend on keywordonlyarg)
[x] test: cpython/Lib/test/test_positional_only_arg.py (TODO: 4) dependencies: dependent tests: (no tests depend on positional_only_arg)
[ ] lib: cpython/Lib/smtplib.py dependencies:
dependent tests: (2 tests)
Summary by CodeRabbit
New Features
Bug Fixes
Chores
`hmac` and `pbkdf2`.