Memoize file match patterns#1412
Memoize file match patterns#1412bsideup merged 2 commits intodocker-java:masterfrom swiedenfeld:compile_less_patterns
Conversation
|
|
||
| private static final String PATTERN_CHARS_TO_ESCAPE = "\\.[]{}()*+-?^$|"; | ||
|
|
||
| private static final Map<String, Pattern> PATTERNS = new HashMap<>(); |
There was a problem hiding this comment.
Since it is static, I'd suggest using ConcurrentHashMap here
There was a problem hiding this comment.
Good, I changed it to java.util.concurrent.ConcurrentHashMap
|
@swiedenfeld I was about to merge it but then realized... if the app is long running, such cache may eventually explode and cause an OOM. Since the class is in |
|
@bsideup Good point. I was unaware it could be running long as I focused on its use in build step. There, the number of ignore patterns must be bounded by the lines in For the TTL, do you suggest a reasonable time interval until cache entries expire? |
|
@swiedenfeld there are a few usages of docker-java that run for hours, days or even months :D Since it parses the user input, there is no guarantee that the set will be final, despite the relationship to "a few lines in For TTL, I'd suggest something like 1 hour. Alternatively, IIRC Guava can limit by the cache size, which may be another option (to limit it to 25Mb, or 10k patterns, or similar) |
|
@swiedenfeld ping :) |
|
pong :) |
|
@swiedenfeld FYI I just changed some defaults and will merge as soon as CI is green :) |
I ran a profiler to analyse performance of ScannedResult # addFilesInDirectory in the scenario described in #1409. All ignore strings must be matched on each of the +60k files, and a not neglible amount of time is spent compiling the Strings from ignores into patterns over and over again. It seems sensible to cache already compiled patterns into a map for quick resolution.
Unfortunately, it does not improve the situation in #1409 too much. The majority of processing time is spent walking the directory tree, which is possibly bound by I/O.