Description
When processing short inputs (e.g., strings smaller than 16 bytes), calling our fast functions is wasted effort. Now that @pauldreik has done the hard work of moving our scalar (i.e., naive) implementations into header files, it becomes possible to add short-string optimizations: when the input is sufficiently small, prefer the naive (but easy to inline) routine.
This can be an actual issue. Recently @anonrig tried to enable simdutf in ada, within Node, and he reported a negative effect on performance. Though I have not examined the use case, the likely cause in my view is that he is replacing a trivial function (small, pure C) with a non-trivial function call into simdutf.
You can find concrete examples in Node...
(The latter example is from @ChALkeR whereas I think that the first one is from me.)
My expectation is not that the actual implementations in simdutf are inefficient for short strings, but rather that replacing a less efficient but inlinable function with a non-inlinable function that is faster (as long as it has enough work to do) can be a slight net negative on short inputs.
With the current code base, it should now be relatively easy to add, directly in the simdutf header file, something like this ...
if(the input is short) {
// call the simple scalar function
} else {
// dispatch into the simdutf lib for the optimized function
}
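To make the idea above concrete, here is a minimal, self-contained sketch of the pattern. It does not use the simdutf API: `scalar_is_ascii`, `dispatched_is_ascii`, `is_ascii` and `kShortInputThreshold` are hypothetical names, the out-of-line function merely stands in for a call into the compiled library, and the threshold of 16 bytes is only the example figure mentioned earlier, not a measured value.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: 16 bytes is just the example size from the issue text.
constexpr std::size_t kShortInputThreshold = 16;

// Keep the stand-in for the library call out of line, as a real
// call into a compiled library would be.
#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical header-only scalar routine: tiny and easy to inline.
inline bool scalar_is_ascii(const char* buf, std::size_t len) noexcept {
  std::uint8_t acc = 0;
  for (std::size_t i = 0; i < len; i++) {
    acc |= static_cast<std::uint8_t>(buf[i]);
  }
  return acc < 0x80;
}

// Hypothetical stand-in for the optimized, non-inlinable routine.
// It checks 8 bytes at a time, then finishes the tail byte by byte.
SKETCH_NOINLINE bool dispatched_is_ascii(const char* buf,
                                         std::size_t len) noexcept {
  std::uint64_t acc = 0;
  std::size_t i = 0;
  for (; i + 8 <= len; i += 8) {
    std::uint64_t word;
    std::memcpy(&word, buf + i, sizeof(word));
    acc |= word;
  }
  std::uint8_t tail = 0;
  for (; i < len; i++) {
    tail |= static_cast<std::uint8_t>(buf[i]);
  }
  return (acc & 0x8080808080808080ULL) == 0 && tail < 0x80;
}

// Public entry point living in the header: short inputs stay on the
// inlinable scalar path, everything else pays for the function call.
inline bool is_ascii(const char* buf, std::size_t len) noexcept {
  if (len < kShortInputThreshold) {
    return scalar_is_ascii(buf, len);
  }
  return dispatched_is_ascii(buf, len);
}
```

Because the length check and the scalar loop are both visible to the caller's compiler, a call with a short input can be inlined entirely, and the function-call overhead is only paid when there is enough work to amortize it.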
Obviously, this entails adding relevant benchmarks, which we do not have right now.
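As a rough starting point for such benchmarks, a timing helper along the following lines could compare a scalar routine against a dispatched one at several small input sizes. This is only a sketch: `ns_per_call` is a hypothetical helper and not part of the existing simdutf benchmarks, and a proper benchmark would rely on a dedicated framework with repeated measurements.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: average nanoseconds per call of `fn` on an
// ASCII input of `len` bytes. `fn` takes (const char*, std::size_t)
// and returns something convertible to bool.
template <typename Fn>
double ns_per_call(Fn&& fn, std::size_t len,
                   std::size_t iterations = 1000000) {
  const std::string input(len, 'a');
  // Reading the pointer through a volatile on every iteration keeps
  // the compiler from hoisting the call out of the loop.
  const char* volatile data = input.data();
  std::size_t sink = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iterations; i++) {
    const char* p = data;
    sink += fn(p, len) ? 1 : 0;
  }
  const auto stop = std::chrono::steady_clock::now();
  // Store the accumulated result so the calls remain observable.
  volatile std::size_t keep = sink;
  (void)keep;
  const double total_ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  return total_ns / static_cast<double>(iterations);
}
```

Calling this for lengths such as 1, 4, 8, 16, 32 and 64 with both variants would give a first indication of where the crossover point lies on a given machine.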