Globbing #297

isomarcte · 2023-01-04T14:35:10Z

https://github.com/typelevel/case-insensitive/blob/main/core/src/main/scala/org/typelevel/ci/package.scala#L34

The Unicode standard provides several ways to do case folding (the operation which yields a caseless string), with different trade-offs between space usage and strictness. In general, we would like the default behavior to be a full case-folded string using Canonical Equivalence between characters. This is modeled as CanonicalFullCaseFoldedString in the WIP PR #232.

In 1.x.x of case-insensitive we have a globbing matcher (linked above). The current implementation is based on the 1.x.x default case-folded string, which I think (though I am not 100% sure) is the same as a simple canonical case-folded string.

The distinction between "simple" and "full" is that a simple case fold never changes the number of char values needed to represent the string, while a full case fold may. For example, a full fold maps "ß" to "ss", but a simple fold leaves it as a single char.
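To make the length difference concrete, here is a minimal sketch. `FoldSketch`, `simpleFold`, and `fullFold` are hypothetical names for illustration, not the case-insensitive API, and the mapping table is a two-entry, hand-picked excerpt of the full ("F") entries in Unicode's CaseFolding.txt rather than a complete implementation. The simple fold is only approximated here with the JVM's per-char case mappings.

```scala
object FoldSketch {
  // Hypothetical excerpt of full ("F") case-fold mappings; real tables are much larger.
  private val fullSpecial: Map[Char, String] = Map(
    '\u00DF' -> "ss", // LATIN SMALL LETTER SHARP S
    '\uFB01' -> "fi"  // LATIN SMALL LIGATURE FI
  )

  // Approximation of a simple fold: one char in, one char out, so length is preserved.
  def simpleFold(s: String): String =
    s.map(c => Character.toLowerCase(Character.toUpperCase(c)))

  // Approximation of a full fold: a single char may expand into several chars.
  def fullFold(s: String): String =
    s.flatMap(c => fullSpecial.getOrElse(c, simpleFold(c.toString)))
}
```

Under these assumptions, `FoldSketch.simpleFold("Stra\u00DFe")` keeps 6 chars, while `FoldSketch.fullFold("Stra\u00DFe")` yields `"strasse"` with 7.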

In 2.x.x we'd like all the default code paths to use full case-folded operations, as they are the most correct (incorrect folding can introduce security issues and runtime failures when implementing certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case-folded string, because the same glyph may be represented by either N or N+M characters (where M is usually 1, 2, or 3). See combining sequences.
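As a rough illustration of the concern (not the library's matcher), the sketch below uses a hypothetical minimal glob where `?` matches exactly one char and `*` matches any run of chars. The same pattern gives different answers depending on whether the input was simple folded, full folded, or canonically decomposed. `GlobSketch` and its members are assumed names, and the folded inputs are written out by hand.

```scala
import java.text.Normalizer

object GlobSketch {
  // Minimal glob over already-folded strings, compared char by char.
  def matches(pattern: String, input: String): Boolean = {
    def go(p: Int, i: Int): Boolean =
      if (p == pattern.length) i == input.length
      else pattern.charAt(p) match {
        case '*' => go(p + 1, i) || (i < input.length && go(p, i + 1))
        case '?' => i < input.length && go(p + 1, i + 1)
        case c   => i < input.length && input.charAt(i) == c && go(p + 1, i + 1)
      }
    go(0, 0)
  }

  // Hand-written folds of "Straße": the simple fold keeps ß, the full fold expands it.
  val simpleFolded = "stra\u00DFe"
  val fullFolded   = "strasse"

  // The same glyph "é" as one precomposed char, and as "e" plus a combining accent.
  val composed   = "caf\u00E9"
  val decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD)

  val simpleHit   = matches("stra?e", simpleFolded) // '?' consumes the single ß
  val fullHit     = matches("stra?e", fullFolded)   // '?' consumes only one of the two s chars
  val composedHit = matches("caf?", composed)
  val decompHit   = matches("caf?", decomposed)     // '?' consumes 'e', the accent is left over
}
```

With this char-level matcher, `simpleHit` and `composedHit` are true while `fullHit` and `decompHit` are false, even though a user would consider each pair the same text. A matcher for full case-folded strings would presumably need `?` to operate on something larger than a single char.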

I will follow up with some more concrete examples shortly.

If we can't adapt this for a full case-folded string, we will need to either deprecate it or, if we want to avoid a bincompat break, leave it as a simple case-folded implementation.


isomarcte mentioned this issue Jan 4, 2023

Implement (Almost) All The Unicode Caseless Matching Systems #232

Draft
