Use non-BOM encodings by filmor · Pull Request #2370 · pythonnet/pythonnet

filmor · 2024-05-04T14:54:56Z

Use non-BOM encodings for both C#->Python and Python->C#, as the byte order is always the native one and a BOM is never used.

Fixes #2369.

The documentation of `PyUnicode_DecodeUTF16`, which we use, states that not passing `byteorder` or passing a `*byteorder` of 0 causes the first two bytes, if they are the BOM (U+FEFF, zero-width no-break space), to be interpreted and skipped. That is incorrect when we convert a string that is known to have no BOM, which is the case for all strings coming from C#.
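A minimal sketch of the effect, using Python's built-in codecs rather than the C API directly. It assumes that the generic `"utf-16"` codec behaves like a `PyUnicode_DecodeUTF16` call with BOM detection enabled, and that `"utf-16-le"` / `"utf-16-be"` correspond to passing an explicit byte order. The name `NATIVE_UTF16` is made up for this illustration.

```python
import sys

# Pick the BOM-free codec that matches this machine's byte order
# (hypothetical helper name, not part of pythonnet).
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

# A string whose first character really is U+FEFF.
text = "\ufeffabc"

# Native byte order, no BOM prepended: the layout described for C# strings.
raw = text.encode(NATIVE_UTF16)

# BOM detection enabled: the leading U+FEFF is taken for a BOM and dropped.
auto = raw.decode("utf-16")

# Byte order given explicitly: the leading U+FEFF is kept as data.
explicit = raw.decode(NATIVE_UTF16)

print(repr(auto))
print(repr(explicit))
```

With detection enabled the decoded string is one character shorter than the original, which is the data loss this change is meant to avoid.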

lostmsu · 2024-05-06T06:48:13Z

@filmor can you ELI5? For someone not familiar with intricacies of BOM, but aware of byte order issues.

My biggest question is whether this change has any potential to introduce bugs in handling strings that actually have a BOM. E.g. imagine a scenario where someone serialized and persisted something with a BOM using 3.0.3; after this change, if they read it back in 3.0.4, the BOM will be in their string data.

filmor · 2024-05-06T09:11:46Z

It's the other way round. Strings that are being passed between Python and .NET are UTF-16 in the respective native byte order (usually LE), without a BOM. The functions that we were using for the conversions (in particular `PyUnicode_DecodeUTF16` and the default encoding objects from `Encoding`) try to be "smart" and will interpret a leading FE FF or FF FE as the byte order mark, removing it from the converted string. By passing the correct endianness explicitly, this behaviour is disabled.
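The round trip described here can be sketched in pure Python. This is an illustration under assumptions, not pythonnet's actual code: `to_native_utf16` and `from_native_utf16` are hypothetical helpers, and Python's codecs stand in for the C API and the .NET `Encoding` objects.

```python
import sys

# BOM-free codec in the machine's native byte order (hypothetical name).
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def to_native_utf16(text: str) -> bytes:
    """Encode without a BOM, in native byte order."""
    return text.encode(NATIVE_UTF16)


def from_native_utf16(raw: bytes) -> str:
    """Decode with a fixed byte order, so no BOM detection takes place."""
    return raw.decode(NATIVE_UTF16)


# With an explicit byte order every string survives the round trip,
# including ones that start with U+FEFF or U+FFFE.
samples = ["plain", "\ufeffleading U+FEFF", "\ufffeA"]
for sample in samples:
    assert from_native_utf16(to_native_utf16(sample)) == sample

# For comparison, the generic codec adds a 2-byte BOM when encoding ...
with_bom = "abc".encode("utf-16")
without_bom = to_native_utf16("abc")

# ... and when decoding it treats a leading U+FFFE as a BOM of the opposite
# byte order, so the rest of the data is read byte-swapped.
swapped = to_native_utf16("\ufffeA").decode("utf-16")
print(repr(swapped))
```

The last case shows why both FE FF and FF FE matter: a misdetected mark does not just drop one character, it can also flip the byte order for everything that follows.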

filmor force-pushed the fix-bom-strings branch 2 times, most recently from 49db3bf to 07f65c7 Compare May 5, 2024 18:41

filmor added 2 commits May 5, 2024 20:42

Use non-BOM encodings

4c46c6d

filmor force-pushed the fix-bom-strings branch from 07f65c7 to dc6f5ef Compare May 5, 2024 18:42

filmor marked this pull request as ready for review May 5, 2024 18:42

filmor requested a review from lostmsu May 5, 2024 18:44

filmor self-assigned this May 7, 2024

lostmsu merged commit 195cde6 into pythonnet:master May 10, 2024

filmor deleted the fix-bom-strings branch May 10, 2024 19:55

dependabot bot mentioned this pull request Jan 16, 2026

Bump pythonnet from 3.0.3 to 3.0.5 mattemangia/Deep3DStudio#99

Closed

lostmsu merged 2 commits into pythonnet:master from filmor:fix-bom-strings
