IDN Support suggestion #15

wadih · 2022-09-09T16:20:09Z

It seems that no IDN can get past the regexp in regDomain.class.php:

if (!preg_match("/^([a-z0-9])(([a-z0-9-])*([a-z0-9]))*$/", $domPart)) return FALSE;

If anybody's interested, the library could be extended to support UTF-8 letter characters by adding the \p{L} character class alongside a-z0-9 and appending the /u modifier:

if (!preg_match("/^([a-z0-9\p{L}])(([a-z0-9-\p{L}])*([a-z0-9\p{L}]))*$/u", $domPart)) return FALSE;

I tested it as follows and it worked:

echo (new regDomain())->getRegisteredDomain("example.мон", false);

The output for that IDN (.xn--l1acc) is:

example.мон

I could submit a pull request if nobody sees an issue.

wadih · 2022-09-09T19:21:42Z

I found an issue with the regexp above: it doesn't appear to match Indian TLDs like the following, although it has worked for about 99% of the rest.

ಭಾರತ     xn--2scrj9c
ଭାରତ     xn--3hcrj9c
ভাৰত     xn--45br5cyl
भारतम्    xn--h2breg3eve

I tried varying the regexp but haven't found one that works; using \p{Devanagari} didn't help either.

Maybe the check will have to go through the ASCII form of the domain instead, to avoid these script-specific challenges.
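
A likely cause of the mismatch: \p{L} only covers letters, while the four labels above also contain combining marks (vowel signs, and a virama at the end of भारतम्), which belong to the \p{M} category. The sketch below is one possible variant that also allows \p{M} after the first character. The function name isValidUnicodeLabel is hypothetical and not part of regDomain.class.php, and the pattern does not try to enforce full IDNA rules (non-ASCII digits, for example, are not covered).

```php
<?php
// Hypothetical helper for illustration; not part of regDomain.class.php.
function isValidUnicodeLabel(string $label): bool
{
    // \p{L} = letters, \p{M} = combining marks (vowel signs, virama).
    // The label must start with a letter or digit. Marks and hyphens may
    // follow, and the label may end in a mark but not in a hyphen.
    $pattern = '/^[a-z0-9\p{L}]([a-z0-9\p{L}\p{M}-]*[a-z0-9\p{L}\p{M}])?$/u';
    return preg_match($pattern, $label) === 1;
}

// Try the TLDs from this thread (the source file must be saved as UTF-8).
foreach (['ಭಾರತ', 'ଭାରତ', 'ভাৰত', 'भारतम्', 'мон'] as $tld) {
    echo $tld, ': ', isValidUnicodeLabel($tld) ? 'match' : 'no match', PHP_EOL;
}
```

Note that \p{Devanagari} is a script property, so on its own it could not cover the Kannada, Odia and Bengali labels in the list.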

usrflo · 2023-02-05T20:08:59Z

@wadih, thanks for your feedback and research: in 9fccafa I simply disabled the domain label length validation and the regexp to prevent false analysis.

I'm inclined to change the processing so that a second suffix tree is encoded to ACE and all checks, including length validation and character validation, are done in ASCII. Backward compatibility should be kept.
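
As a rough sketch of what validating in ASCII could look like (an assumption about the direction described above, not the library's actual implementation): convert the domain to ACE first, then apply the original ASCII-only pattern and a length check per label. The function names isValidAceLabel and isValidDomainViaAce are hypothetical. idn_to_ascii() is provided by PHP's intl extension; without it, this sketch just treats the input as if it were already ASCII.

```php
<?php
// Hypothetical helpers for illustration; not part of regDomain.class.php.
function isValidAceLabel(string $label): bool
{
    // A DNS label is between 1 and 63 octets long.
    if ($label === '' || strlen($label) > 63) {
        return false;
    }
    // Same shape as the original ASCII check: alphanumeric at both ends,
    // hyphens only in between.
    return preg_match('/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/', $label) === 1;
}

function isValidDomainViaAce(string $domain): bool
{
    // Convert to ACE when ext-intl is loaded; otherwise pass the input through.
    $ace = function_exists('idn_to_ascii')
        ? idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46)
        : $domain;
    if ($ace === false) {
        return false;
    }
    foreach (explode('.', strtolower($ace)) as $label) {
        if (!isValidAceLabel($label)) {
            return false;
        }
    }
    return true;
}

var_dump(isValidAceLabel('xn--h2breg3eve'));
var_dump(isValidDomainViaAce('example.xn--l1acc'));
```

Working on the ACE form keeps the character check script-independent, since every label is reduced to a-z, 0-9 and hyphens before it is validated.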
