Conversation
📝 WalkthroughWalkthroughAdds Windows console I/O: introduces DEFAULT_BUFFER_SIZE, a new winconsoleio/WindowsConsoleIO surface with console detection and UTF-8 console defaults, a PyO3 Changes
Sequence Diagram(s)sequenceDiagram
actor Python
participant io_open as io.open()
participant console_check as ConsoleDetection
participant winconsoleio as WindowsConsoleIO
participant fileio as FileIO
participant py_obj as PythonFileObject
Python->>io_open: open(path, mode)
io_open->>console_check: is_console(fileno)?
alt Console detected
console_check-->>io_open: true
io_open->>winconsoleio: create WindowsConsoleIO
winconsoleio->>winconsoleio: set UTF-8 default, use DEFAULT_BUFFER_SIZE
winconsoleio-->>py_obj: return console-backed file object
else Regular file
console_check-->>io_open: false
io_open->>fileio: create FileIO (uses DEFAULT_BUFFER_SIZE)
fileio-->>py_obj: return file object
end
sequenceDiagram
actor Python
participant write_input as _testconsole::write_input
participant fileno as file.fileno()
participant handle as Win32HANDLE
participant conv as UTF16Conversion
participant winapi as WriteConsoleInputW
participant error as OS_Error
Python->>write_input: write_input(file, bytes)
write_input->>fileno: call fileno()
fileno-->>handle: get HANDLE
write_input->>handle: validate HANDLE != INVALID_HANDLE_VALUE
alt Valid handle
write_input->>conv: bytes -> UTF-16LE Vec<u16>
conv->>conv: build INPUT_RECORD KEY_EVENT array
conv-->>winapi: call WriteConsoleInputW (loop until all written)
winapi-->>Python: success
else Invalid handle
handle-->>error: INVALID_HANDLE_VALUE
error-->>Python: raise exception
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8b37b4a to
5e4c233
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/io.rs`:
- Line 32: DEFAULT_BUFFER_SIZE was changed to 128 KiB and is now exported as
io.DEFAULT_BUFFER_SIZE and used for _blksize; confirm whether this divergence
from CPython's 8 KiB is intentional. Either revert DEFAULT_BUFFER_SIZE to 8 *
1024 (update the constant DEFAULT_BUFFER_SIZE) to preserve CPython
compatibility, or keep 128 * 1024 but add a clear comment and public
documentation where io.DEFAULT_BUFFER_SIZE and _blksize are defined/exported
(referencing the DEFAULT_BUFFER_SIZE constant and the _blksize export)
explaining the rationale for the larger buffer size.
- Around line 5991-6029: The mode-parsing loop currently accepts '+'/'a'/'b'/'x'
but silently ignores them; change the match in the parsing loop (the code
handling mode_str, rwa, readable, writable, console_type) so that encountering
unsupported flags ('+', 'a', 'x' — and any other non-rc/w characters) returns an
explicit vm.new_value_error (include the offending flag and mode_str in the
message) instead of falling through; leave handling for 'r' and 'w' as-is and
ensure the final rwa check remains to error when neither read nor write is
selected.
- Around line 6593-6611: The WideCharToMultiByte calls in read() (and the
similar call in readall()) currently ignore a 0 return and silently drop data;
update both functions to check if WideCharToMultiByte returns 0 and, on failure,
retrieve the Windows error code (GetLastError) and return an appropriate VM I/O
error instead of continuing — use the same error-building mechanism used
elsewhere in this module (e.g., the VM context I/O error helper) so callers get
a clear failure instead of truncated data. Ensure you reference the
WideCharToMultiByte result variable (u8n) and bail out with a descriptive error
containing the GetLastError value when u8n == 0 in both read() and readall().
- Around line 6646-6673: The initial MultiByteToWideChar call in write()
currently treats wlen == 0 as Ok(0); change this to propagate the Windows error
the same way the second call does: when the first MultiByteToWideChar returns 0,
return Err(std::io::Error::last_os_error().into_pyexception(vm)) so callers
receive the OS error instead of a successful 0 length; update the branch
handling the first wlen check around the MultiByteToWideChar invocation
accordingly (refer to MultiByteToWideChar and the surrounding write() logic).
| let mode_str: &str = args | ||
| .mode | ||
| .as_ref() | ||
| .map(|s: &PyUtf8StrRef| s.as_str()) | ||
| .unwrap_or("r"); | ||
|
|
||
| let mut rwa = false; | ||
| let mut readable = false; | ||
| let mut writable = false; | ||
| let mut console_type = '\0'; | ||
| for c in mode_str.bytes() { | ||
| match c { | ||
| b'+' | b'a' | b'b' | b'x' => {} | ||
| b'r' => { | ||
| if rwa { | ||
| return Err( | ||
| vm.new_value_error("Must have exactly one of read or write mode") | ||
| ); | ||
| } | ||
| rwa = true; | ||
| readable = true; | ||
| } | ||
| b'w' => { | ||
| if rwa { | ||
| return Err( | ||
| vm.new_value_error("Must have exactly one of read or write mode") | ||
| ); | ||
| } | ||
| rwa = true; | ||
| writable = true; | ||
| } | ||
| _ => { | ||
| return Err(vm.new_value_error(format!("invalid mode: {mode_str}"))); | ||
| } | ||
| } | ||
| } | ||
| if !rwa { | ||
| return Err(vm.new_value_error("Must have exactly one of read or write mode")); | ||
| } |
There was a problem hiding this comment.
Reject unsupported console mode flags instead of silently ignoring them.
+, a, and x are accepted but ignored, which can lead to misleading downstream errors. Consider explicitly rejecting unsupported flags (or implement the semantics).
🛠️ Suggested fix
- for c in mode_str.bytes() {
- match c {
- b'+' | b'a' | b'b' | b'x' => {}
+ for c in mode_str.bytes() {
+ match c {
+ b'b' => {}
+ b'+' | b'a' | b'x' => {
+ return Err(vm.new_value_error(format!(
+ "unsupported console mode: {mode_str}"
+ )));
+ }
b'r' => {
if rwa {
return Err(
vm.new_value_error("Must have exactly one of read or write mode")
);
}
rwa = true;
readable = true;
}
b'w' => {
if rwa {
return Err(
vm.new_value_error("Must have exactly one of read or write mode")
);
}
rwa = true;
writable = true;
}
_ => {
return Err(vm.new_value_error(format!("invalid mode: {mode_str}")));
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let mode_str: &str = args | |
| .mode | |
| .as_ref() | |
| .map(|s: &PyUtf8StrRef| s.as_str()) | |
| .unwrap_or("r"); | |
| let mut rwa = false; | |
| let mut readable = false; | |
| let mut writable = false; | |
| let mut console_type = '\0'; | |
| for c in mode_str.bytes() { | |
| match c { | |
| b'+' | b'a' | b'b' | b'x' => {} | |
| b'r' => { | |
| if rwa { | |
| return Err( | |
| vm.new_value_error("Must have exactly one of read or write mode") | |
| ); | |
| } | |
| rwa = true; | |
| readable = true; | |
| } | |
| b'w' => { | |
| if rwa { | |
| return Err( | |
| vm.new_value_error("Must have exactly one of read or write mode") | |
| ); | |
| } | |
| rwa = true; | |
| writable = true; | |
| } | |
| _ => { | |
| return Err(vm.new_value_error(format!("invalid mode: {mode_str}"))); | |
| } | |
| } | |
| } | |
| if !rwa { | |
| return Err(vm.new_value_error("Must have exactly one of read or write mode")); | |
| } | |
| let mode_str: &str = args | |
| .mode | |
| .as_ref() | |
| .map(|s: &PyUtf8StrRef| s.as_str()) | |
| .unwrap_or("r"); | |
| let mut rwa = false; | |
| let mut readable = false; | |
| let mut writable = false; | |
| let mut console_type = '\0'; | |
| for c in mode_str.bytes() { | |
| match c { | |
| b'b' => {} | |
| b'+' | b'a' | b'x' => { | |
| return Err(vm.new_value_error(format!( | |
| "unsupported console mode: {mode_str}" | |
| ))); | |
| } | |
| b'r' => { | |
| if rwa { | |
| return Err( | |
| vm.new_value_error("Must have exactly one of read or write mode") | |
| ); | |
| } | |
| rwa = true; | |
| readable = true; | |
| } | |
| b'w' => { | |
| if rwa { | |
| return Err( | |
| vm.new_value_error("Must have exactly one of read or write mode") | |
| ); | |
| } | |
| rwa = true; | |
| writable = true; | |
| } | |
| _ => { | |
| return Err(vm.new_value_error(format!("invalid mode: {mode_str}"))); | |
| } | |
| } | |
| } | |
| if !rwa { | |
| return Err(vm.new_value_error("Must have exactly one of read or write mode")); | |
| } |
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 5991 - 6029, The mode-parsing loop
currently accepts '+'/'a'/'b'/'x' but silently ignores them; change the match in
the parsing loop (the code handling mode_str, rwa, readable, writable,
console_type) so that encountering unsupported flags ('+', 'a', 'x' — and any
other non-rc/w characters) returns an explicit vm.new_value_error (include the
offending flag and mode_str in the message) instead of falling through; leave
handling for 'r' and 'w' as-is and ensure the final rwa check remains to error
when neither read nor write is selected.
crates/vm/src/stdlib/io.rs
Outdated
| let wlen = unsafe { | ||
| MultiByteToWideChar( | ||
| CP_UTF8, | ||
| 0, | ||
| data.as_ptr(), | ||
| len as i32, | ||
| core::ptr::null_mut(), | ||
| 0, | ||
| ) | ||
| }; | ||
| if wlen == 0 { | ||
| return Ok(0); | ||
| } | ||
|
|
||
| let mut wbuf = vec![0u16; wlen as usize]; | ||
| let wlen = unsafe { | ||
| MultiByteToWideChar( | ||
| CP_UTF8, | ||
| 0, | ||
| data.as_ptr(), | ||
| len as i32, | ||
| wbuf.as_mut_ptr(), | ||
| wlen, | ||
| ) | ||
| }; | ||
| if wlen == 0 { | ||
| return Err(std::io::Error::last_os_error().into_pyexception(vm)); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
CPython _WindowsConsoleIO MultiByteToWideChar error handling
💡 Result:
In CPython’s _io._WindowsConsoleIO (Windows console stdout/stderr), the write path converts the UTF‑8 bytes you pass into UTF‑16 using:
MultiByteToWideChar(CP_UTF8, 0, ...)(flags 0, i.e. notMB_ERR_INVALID_CHARS), thenWriteConsoleW(...). (chromium.googlesource.com)
That choice drives the “error handling” you see:
- With
dwFlags=0, Windows’ UTF‑8 decoder generally does not fail on ill‑formed input; on Windows Vista+ it will replace illegal sequences with U+FFFD rather than returning an error. (learn.microsoft.com) - If you want strict failure on invalid UTF‑8, you must pass
MB_ERR_INVALID_CHARSand then handleMultiByteToWideCharreturning 0 (typically withGetLastError()==ERROR_NO_UNICODE_TRANSLATION). (learn.microsoft.com)
This is consistent with how CPython handles strict code-page decoding elsewhere: it uses MB_ERR_INVALID_CHARS, but only reliably raises when the decode is “final” (so the decoder knows it’s not waiting for more bytes). (bugs.python.org)
Citations:
- 1: https://chromium.googlesource.com/external/github.com/python/cpython/%2B/8435a2278f964f48d36edbc5092be5ebecfcb120/Modules/_io/winconsoleio.c
- 2: https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar?utm_source=openai
- 3: https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar?utm_source=openai
- 4: https://bugs.python.org/issue45120?utm_source=openai
🌐 Web query:
Windows MultiByteToWideChar return 0 error handling best practices
💡 Result:
Best practices when MultiByteToWideChar() returns 0
-
Treat return value
0as failure and callGetLastError()immediately- On failure,
MultiByteToWideChar()returns0; details are inGetLastError()(common codes:ERROR_INSUFFICIENT_BUFFER,ERROR_INVALID_FLAGS,ERROR_INVALID_PARAMETER,ERROR_NO_UNICODE_TRANSLATION). [1]
- On failure,
-
Use the “two-pass” pattern (query required size, then convert)
- First call with
cchWideChar = 0to get the required UTF‑16 length (in wide chars, including NUL if it’s part of the input), allocate that manywchar_t, then call again to perform the conversion. [1]
- First call with
-
Use strict validation for UTF-8 (and many other multibyte code pages):
MB_ERR_INVALID_CHARS- For
CP_UTF8, Microsoft restrictsdwFlagsto0orMB_ERR_INVALID_CHARS; other flags can yieldERROR_INVALID_FLAGS. [1] - With
MB_ERR_INVALID_CHARS, invalid input makes the call fail (typicallyERROR_NO_UNICODE_TRANSLATION), which is usually preferable to silently accepting/rewriting malformed data. [1][2]
- For
-
Be explicit about input length / NUL-termination
cbMultiByte = 0always fails. [1]- If you want a NUL-terminated UTF‑16 output, pass
cbMultiByte = -1(input must be NUL-terminated) or ensure your explicit byte count includes the terminator; otherwise the output may be not NUL-terminated. [1]
-
Expect “best-effort” decoding differences if you don’t use
MB_ERR_INVALID_CHARS- Starting with Windows Vista, illegal sequences may be replaced (e.g., with U+FFFD for UTF‑8) when
MB_ERR_INVALID_CHARSis not set; older systems had different behavior. [1]
- Starting with Windows Vista, illegal sequences may be replaced (e.g., with U+FFFD for UTF‑8) when
Minimal error-handling template (C/C++)
int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
src, src_len, NULL, 0);
if (needed == 0) {
DWORD err = GetLastError(); // e.g. ERROR_NO_UNICODE_TRANSLATION, etc.
// handle/log err
}
wchar_t *dst = (wchar_t*)malloc((size_t)needed * sizeof(wchar_t));
int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
src, src_len, dst, needed);
if (written == 0) {
DWORD err = GetLastError();
// handle/log err
}Sources: Microsoft documentation for return values/flags/error codes and UTF‑8 flag restrictions [1]; discussion of preferring MB_ERR_INVALID_CHARS for strict UTF‑8 validation/security [2].
[1] (learn.microsoft.com)
[2] (stackoverflow.com)
Citations:
- 1: https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar?utm_source=openai
- 2: https://stackoverflow.com/questions/22824537/should-the-mb-err-invalid-chars-flag-be-used-in-utf-8-conversions-with-multibyte?utm_source=openai
🏁 Script executed:
cd crates/vm/src/stdlib && head -n 6700 io.rs | tail -n 100Repository: RustPython/RustPython
Length of output: 3286
🏁 Script executed:
cd crates/vm/src/stdlib && sed -n '6620,6680p' io.rsRepository: RustPython/RustPython
Length of output: 1951
Propagate MultiByteToWideChar conversion failures in write().
When MultiByteToWideChar returns 0 on the first call, it indicates a Windows error (system-level failure) that must be propagated rather than silently converted to Ok(0). This approach:
- Violates Windows API error-handling best practices (0 indicates failure; call
GetLastError()) - Is semantically incorrect (success should not be reported for a failed conversion)
- Creates asymmetric error handling (first call returns
Ok(0), second call returnsErr) - Hides real errors from callers, preventing proper error diagnosis
Suggested fix
- if wlen == 0 {
- return Ok(0);
- }
+ if wlen == 0 {
+ return Err(std::io::Error::last_os_error().into_pyexception(vm));
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let wlen = unsafe { | |
| MultiByteToWideChar( | |
| CP_UTF8, | |
| 0, | |
| data.as_ptr(), | |
| len as i32, | |
| core::ptr::null_mut(), | |
| 0, | |
| ) | |
| }; | |
| if wlen == 0 { | |
| return Ok(0); | |
| } | |
| let mut wbuf = vec![0u16; wlen as usize]; | |
| let wlen = unsafe { | |
| MultiByteToWideChar( | |
| CP_UTF8, | |
| 0, | |
| data.as_ptr(), | |
| len as i32, | |
| wbuf.as_mut_ptr(), | |
| wlen, | |
| ) | |
| }; | |
| if wlen == 0 { | |
| return Err(std::io::Error::last_os_error().into_pyexception(vm)); | |
| } | |
| let wlen = unsafe { | |
| MultiByteToWideChar( | |
| CP_UTF8, | |
| 0, | |
| data.as_ptr(), | |
| len as i32, | |
| core::ptr::null_mut(), | |
| 0, | |
| ) | |
| }; | |
| if wlen == 0 { | |
| return Err(std::io::Error::last_os_error().into_pyexception(vm)); | |
| } | |
| let mut wbuf = vec![0u16; wlen as usize]; | |
| let wlen = unsafe { | |
| MultiByteToWideChar( | |
| CP_UTF8, | |
| 0, | |
| data.as_ptr(), | |
| len as i32, | |
| wbuf.as_mut_ptr(), | |
| wlen, | |
| ) | |
| }; | |
| if wlen == 0 { | |
| return Err(std::io::Error::last_os_error().into_pyexception(vm)); | |
| } |
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 6646 - 6673, The initial
MultiByteToWideChar call in write() currently treats wlen == 0 as Ok(0); change
this to propagate the Windows error the same way the second call does: when the
first MultiByteToWideChar returns 0, return
Err(std::io::Error::last_os_error().into_pyexception(vm)) so callers receive the
OS error instead of a successful 0 length; update the branch handling the first
wlen check around the MultiByteToWideChar invocation accordingly (refer to
MultiByteToWideChar and the surrounding write() logic).
|
reviews need to be checked |
Add _WindowsConsoleIO class to _io module with ReadConsoleW/WriteConsoleW support. Add _testconsole stdlib module with write_input for console input simulation. Integrate console detection in io_open to use _WindowsConsoleIO for console handles.
5e4c233 to
fa3f421
Compare
fa3f421 to
8552e42
Compare
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] lib: cpython/Lib/io.py dependencies:
dependent tests: (89 tests)
[x] test: cpython/Lib/test/test_winconsoleio.py dependencies: dependent tests: (no tests depend on winconsoleio) Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/io.rs`:
- Around line 6111-6114: The code currently forces fd-backed consoles to never
close the fd by unconditionally calling zelf.closefd.store(false); instead honor
the closefd parameter passed to the console initializer (or constructor) so
io.open(closefd=...) semantics are respected: locate the initializer that sets
zelf.closefd (the branch that handles "opened by fd") and change it to store the
provided closefd value (or leave the existing value) rather than overriding it;
ensure any callers that construct from an fd accept and forward a closefd
boolean so explicit user requests to close the fd are honored.
| } else { | ||
| // When opened by fd, never close the fd (user owns it) | ||
| zelf.closefd.store(false); | ||
| } |
There was a problem hiding this comment.
Respect closefd for fd‑backed consoles.
The initializer forces closefd to false when constructed from an fd, which diverges from io.open(closefd=...) semantics and can leak handles when users pass a console fd explicitly. Consider honoring the passed closefd (or document the deviation if intentional).
🔧 Suggested fix
- // When opened by fd, never close the fd (user owns it)
- zelf.closefd.store(false);
+ // When opened by fd, honor closefd to match io.open semantics
+ zelf.closefd.store(args.closefd);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } else { | |
| // When opened by fd, never close the fd (user owns it) | |
| zelf.closefd.store(false); | |
| } | |
| } else { | |
| // When opened by fd, honor closefd to match io.open semantics | |
| zelf.closefd.store(args.closefd); | |
| } |
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 6111 - 6114, The code currently
forces fd-backed consoles to never close the fd by unconditionally calling
zelf.closefd.store(false); instead honor the closefd parameter passed to the
console initializer (or constructor) so io.open(closefd=...) semantics are
respected: locate the initializer that sets zelf.closefd (the branch that
handles "opened by fd") and change it to store the provided closefd value (or
leave the existing value) rather than overriding it; ensure any callers that
construct from an fd accept and forward a closefd boolean so explicit user
requests to close the fd are honored.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/io.rs`:
- Around line 6298-6307: In copy_from_buf the implementation uses a 0 byte as a
sentinel (buf[0] != 0) which incorrectly drops valid NUL bytes and can stall
data; change the buffering to track an explicit length or index instead of
sentinel bytes (e.g., maintain a separate usize field like "len" or use a small
VecDeque) and update the logic in copy_from_buf and any producers/consumers that
mutate the SMALLBUF so they push/pop based on length rather than shifting until
a 0; locate and modify the copy_from_buf function and any associated SMALLBUF
management code to read from buf[0..len], shift or pop properly, and decrement
len (or use VecDeque::pop_front) so NUL bytes are preserved and buffer state is
explicit.
| fn copy_from_buf(buf: &mut [u8; SMALLBUF], dest: &mut [u8]) -> usize { | ||
| let mut n = 0; | ||
| while buf[0] != 0 && n < dest.len() { | ||
| dest[n] = buf[0]; | ||
| n += 1; | ||
| for i in 1..SMALLBUF { | ||
| buf[i - 1] = buf[i]; | ||
| } | ||
| buf[SMALLBUF - 1] = 0; | ||
| } |
There was a problem hiding this comment.
Avoid using 0 as a sentinel in the small buffer.
copy_from_buf treats 0 as “no data”, which drops valid UTF‑8 \0 bytes and can permanently stall buffered data. Track an explicit length (or use a small VecDeque) instead of sentinel bytes.
🤖 Prompt for AI Agents
In `@crates/vm/src/stdlib/io.rs` around lines 6298 - 6307, In copy_from_buf the
implementation uses a 0 byte as a sentinel (buf[0] != 0) which incorrectly drops
valid NUL bytes and can stall data; change the buffering to track an explicit
length or index instead of sentinel bytes (e.g., maintain a separate usize field
like "len" or use a small VecDeque) and update the logic in copy_from_buf and
any producers/consumers that mutate the SMALLBUF so they push/pop based on
length rather than shifting until a 0; locate and modify the copy_from_buf
function and any associated SMALLBUF management code to read from buf[0..len],
shift or pop properly, and decrement len (or use VecDeque::pop_front) so NUL
bytes are preserved and buffer state is explicit.
Summary by CodeRabbit
New Features
Chores