Conversation
One of the things we do to speed up compilation of pipelines is use a dynamically typed implementation of the vector sorting algorithm. Instead of having to compile a monomorphized implementation of sorting for every type in the pipeline, we have a single implementation that can be parameterized at runtime by a comparison function and the length of the value (the latter is used to swap values in the process of sorting). Unsurprisingly, this implementation is less efficient, especially when sorting simple types like (u64, u64). @gz and I have measured the overhead, and it seems to range between 4x (when sorting simple types like u64) and 1.5x for a 10-tuple (u64,...,u64). The main source of the overhead is that the dynamically typed implementation cannot rely on compiler optimizations to efficiently copy values (especially small values) and must use std::ptr::copy_nonoverlapping instead.

Sorting is used in two places in the DBSP code:

1. In the MergeBatcher, when building sorted batches out of unsorted vectors. This one is only instantiated for weighted tuples.
2. In `trait Vector`, which is instantiated for almost all types, but is only used in a few random places and is not performance critical.

This PR switches the former to using the statically typed sort_unstable_by from the standard library. It appears that the impact on compilation time is <=5% (I've only done very limited testing though). The impact on executable size is even smaller. As an additional benefit, we now take advantage of unstable sorting (I did not manage to quickly implement unstable sorting in DBSP and gave up on it back in the day, so we were using stable sorting everywhere).

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
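To make the tradeoff concrete, here is a minimal, hypothetical sketch (not the actual DBSP code; `sort_erased` and its signature are made up for illustration) of the two approaches: a single type-erased sort driven by a runtime comparator and element size, moving bytes with std::ptr::copy_nonoverlapping, versus the monomorphized sort_unstable_by call that this PR switches the MergeBatcher to.

```rust
use std::cmp::Ordering;
use std::ptr;

/// Hypothetical sketch of a type-erased sort: one non-generic function,
/// parameterized at runtime by the element size and a comparator over raw
/// pointers. Element moves go through `ptr::copy_nonoverlapping`, which the
/// compiler cannot specialize for small types.
unsafe fn sort_erased(
    data: *mut u8,
    len: usize,
    elem_size: usize,
    cmp: &dyn Fn(*const u8, *const u8) -> Ordering,
) {
    // Insertion sort for brevity; a real implementation would do better.
    let mut tmp = vec![0u8; elem_size];
    for i in 1..len {
        let mut j = i;
        while j > 0 {
            let prev = data.add((j - 1) * elem_size);
            let cur = data.add(j * elem_size);
            if cmp(prev as *const u8, cur as *const u8) != Ordering::Greater {
                break;
            }
            // Swap elements j-1 and j byte-by-byte through a temporary buffer.
            ptr::copy_nonoverlapping(cur as *const u8, tmp.as_mut_ptr(), elem_size);
            ptr::copy_nonoverlapping(prev as *const u8, cur, elem_size);
            ptr::copy_nonoverlapping(tmp.as_ptr(), prev, elem_size);
            j -= 1;
        }
    }
}

fn main() {
    let mut v: Vec<(u64, u64)> = vec![(3, 30), (1, 10), (2, 20)];

    // Comparator over untyped pointers, supplied at runtime.
    let cmp = |a: *const u8, b: *const u8| unsafe {
        (*(a as *const (u64, u64))).cmp(&*(b as *const (u64, u64)))
    };

    // Type-erased path.
    unsafe {
        sort_erased(
            v.as_mut_ptr() as *mut u8,
            v.len(),
            std::mem::size_of::<(u64, u64)>(),
            &cmp,
        );
    }
    assert_eq!(v, vec![(1, 10), (2, 20), (3, 30)]);

    // Monomorphized path: the standard library generates a sort
    // specialized for the element type, including its copies.
    v.sort_unstable_by(|a, b| a.cmp(b));
}
```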
Pull request overview
This PR optimizes sorting performance in DBSP's batch merger by replacing a custom dynamically-typed sorting implementation with Rust's standard library sort_unstable_by. The change addresses a performance overhead (ranging from 1.5x to 4x) introduced by the previous implementation while keeping compilation time impact minimal (≤5%).
Changes:
- Replaced custom `unstable_sort_by` function calls with `std::slice::sort_unstable_by` in consolidation functions
- Removed import of `unstable_sort_by` from utils module
- Maintained identical comparison logic for all sorting operations
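As a rough illustration of the call-site change, here is a hypothetical `consolidate` function (the real DBSP consolidation functions have different signatures and additional logic, e.g. dropping entries whose weights cancel; only the sort call is the point here):

```rust
// Hypothetical consolidation step over (key, weight) pairs: sort by key, then
// merge adjacent entries with equal keys by adding their weights.
fn consolidate<K: Ord, W: std::ops::AddAssign + Copy>(pairs: &mut Vec<(K, W)>) {
    // Previously (conceptually): a custom `unstable_sort_by` helper from the
    // utils module, which turned out to alias the stable sort.
    // Now: the standard library's monomorphized unstable sort.
    pairs.sort_unstable_by(|(k1, _), (k2, _)| k1.cmp(k2));

    // Combine runs of equal keys (dedup_by passes the later element first).
    pairs.dedup_by(|(k_later, w_later), (k_kept, w_kept)| {
        if k_kept == k_later {
            *w_kept += *w_later;
            true // drop the later duplicate after folding its weight in
        } else {
            false
        }
    });
}

fn main() {
    let mut pairs = vec![("b", 1i64), ("a", 2), ("b", 3)];
    consolidate(&mut pairs);
    assert_eq!(pairs, vec![("a", 2), ("b", 4)]);
}
```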
will unstable sort impact determinism?
// This line right here is literally the hottest code within the entirety of the
// program. It makes up 90% of the work done while joining or merging anything
I've never seen that line come up in a profile :-)
yeah, this comment is not correct. I'll remove it.
you can replace it with "should be, but it isn't for some unknown reason"
mihaibudiu left a comment
Actually I notice that all of them were already unstable
It shouldn't. In fact, the function we used here was supposed to implement unstable sort, but it ended up being an alias to the stable sort function.
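For reference, a minimal stand-alone illustration (made-up data, not DBSP code) of what stability affects: sort_by keeps elements that compare equal in their original relative order, while sort_unstable_by is free to reorder them. For plain data types where elements that compare equal are identical, the sorted output is the same either way.

```rust
fn main() {
    // Two entries share the key 1 but carry different payloads.
    let mut stable = vec![(1, "first"), (2, "x"), (1, "second")];
    let mut unstable = stable.clone();

    // Stable: (1, "first") is guaranteed to stay ahead of (1, "second").
    stable.sort_by(|a, b| a.0.cmp(&b.0));

    // Unstable: the relative order of the two key-1 entries is unspecified.
    unstable.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    println!("stable:   {stable:?}");
    println!("unstable: {unstable:?}");
}
```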