The Unicode standard provides quite a few different ways to do case folding (the operation that yields a caseless string), with different trade-offs in space usage and strictness. In general, we would like the default behavior to be a full case folded string using canonical equivalence between characters. This is modeled as `CanonicalFullCaseFoldedString` in the WIP PR #232.
In 1.x.x of `case-insensitive` we have a globbing matcher. The current implementation is based on the 1.x.x default case folded string, which I think (though I am not 100% sure) is the same as a simple canonical case folded string.
The distinction between "simple" and "full" here is that a simple case fold never changes the number of `char` values needed to represent the string, while a full case fold may change it. For example, `ß` stays a single `char` under a simple fold but becomes `ss` under a full fold.
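To make the length difference concrete, here is a minimal sketch in Scala. `FoldDemo`, `simpleFold`, and `fullFold` are hypothetical names, not part of `case-insensitive`. `simpleFold` applies `Character.toLowerCase(Character.toUpperCase(c))` to each `Char`, a common JVM approximation of a simple fold (I'm assuming, not asserting, that this is close to the 1.x behavior). `fullFold` layers on only a tiny hand-picked subset of the full (status `F`) mappings from Unicode's `CaseFolding.txt`; a real implementation would need the whole table.

```scala
object FoldDemo {
  // Hypothetical helper: approximates a *simple* case fold, one Char at a time.
  // The output always has the same number of Char values as the input.
  def simpleFold(s: String): String =
    s.map(c => Character.toLowerCase(Character.toUpperCase(c)))

  // A tiny, hand-picked subset of the full (status F) mappings in CaseFolding.txt.
  // Assumption for illustration only; the real table is much larger.
  private val fullOnly: Map[Char, String] = Map(
    '\u00DF' -> "ss", // LATIN SMALL LETTER SHARP S
    '\uFB01' -> "fi"  // LATIN SMALL LIGATURE FI
  )

  // Hypothetical helper: a *full* fold may expand one Char into several.
  def fullFold(s: String): String =
    s.flatMap(c => fullOnly.getOrElse(c, simpleFold(c.toString)))
}
```

With this sketch, `FoldDemo.simpleFold("Straße")` yields `"straße"` (6 `Char`s), while `FoldDemo.fullFold("Straße")` yields `"strasse"` (7 `Char`s).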
In 2.x.x we'd like all the default code paths to use full case folded operations, as they are the most correct (incorrect folding can introduce security issues and runtime failures for certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case folded string, because the same glyph may be represented by either N or N+M `char` values (where M is usually 1, 2, or 3). See combining sequences.
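A minimal sketch of why `?` is the problem. `GlobDemo`, `glob`, and `nfd` are hypothetical names, and `glob` is a plain dynamic-programming glob over `Char`s (`*` matches any run, `?` exactly one `Char`), not the 1.x matcher itself. `nfd` wraps `java.text.Normalizer` to produce a canonically decomposed string. Both a full case fold (`ß` → `ss`) and canonical decomposition (`é` → `e` + U+0301) change how many `Char`s a glyph occupies, so a pattern that matches one form can stop matching the other.

```scala
import java.text.Normalizer

object GlobDemo {
  // Hypothetical minimal glob: '*' matches any run of Chars, '?' matches exactly one Char.
  def glob(pattern: String, input: String): Boolean = {
    // dp(i)(j) is true when pattern.take(i) matches input.take(j).
    val dp = Array.ofDim[Boolean](pattern.length + 1, input.length + 1)
    dp(0)(0) = true
    // A leading run of '*' can match the empty input.
    for (i <- 1 to pattern.length if pattern(i - 1) == '*') dp(i)(0) = dp(i - 1)(0)
    for (i <- 1 to pattern.length; j <- 1 to input.length) {
      dp(i)(j) = pattern(i - 1) match {
        case '*' => dp(i - 1)(j) || dp(i)(j - 1) // match empty, or consume one more Char
        case '?' => dp(i - 1)(j - 1)             // consume exactly one Char
        case c   => c == input(j - 1) && dp(i - 1)(j - 1)
      }
    }
    dp(pattern.length)(input.length)
  }

  // Canonical decomposition (NFD), e.g. "\u00E9" becomes "e\u0301".
  def nfd(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFD)
}
```

Here `glob("stra?e", "straße")` is true, but `glob("stra?e", "strasse")` (the full fold of the same word) is false. Likewise `glob("caf?", "caf\u00E9")` is true, while `glob("caf?", nfd("caf\u00E9"))` is false, even though the two inputs are canonically equivalent.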
I will follow up with some more concrete examples shortly.
If we can't adapt this for a full case folded string, we will need to either deprecate it or leave it as a simple case folded implementation, if we want to avoid a bincompat break.
https://github.com/typelevel/case-insensitive/blob/main/core/src/main/scala/org/typelevel/ci/package.scala#L34