[Solved] How can I identify Unicode in a string?

I have a problem with user-input strings that in some rare cases contain foreign characters as Unicode, e.g.:

const jsonString = 'Jes\u00fas Roberto';
// Jesús Roberto

I can convert them with the native2ascii NPM package; however, I first need to identify whether there is Unicode in the string.
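For reference, this is what that example string actually contains at runtime (assuming it is a plain JavaScript string literal, so the engine has already decoded the \u00fa escape into the character ú):

const jsonString = 'Jes\u00fas Roberto';

console.log(jsonString);                 // "Jesús Roberto"
console.log(jsonString.charCodeAt(3));   // 250 (0x00FA), the code unit for ú
console.log(jsonString.includes('\\u')); // false, there is no literal "\u" in the string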

I've also searched Stack Overflow, but the solutions there don't work (they return false for a string with Unicode characters): javascript - How to find whether a particular string has unicode characters (esp. Double Byte characters) - Stack Overflow

Solution 1:

function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}

Solution 2:

const regex = /[^\u0000-\u00ff]/; // Small performance gain from pre-compiling the regex
function containsNonLatinCodepoints(s) {
    return regex.test(s);
}

Solution 3:

function isDoubleByte(str) {
    for (let i = 0, n = str.length; i < n; i++) {
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}

All of them return false. I'm running this on the backend (not sure if that makes a difference).
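One thing I noticed: ú is U+00FA, which still falls inside the \u0000-\u00ff range that the snippets above allow, so they only catch characters beyond Latin-1. A minimal sketch of a check against the narrower ASCII range instead, which does flag ú (just an illustration, not necessarily what native2ascii expects):

const hasNonAscii = (s) => /[^\u0000-\u007f]/.test(s); // anything above U+007F counts

console.log(hasNonAscii('Jes\u00fas Roberto')); // true, ú is U+00FA
console.log(hasNonAscii('plain ASCII text'));   // false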

Thanks for your help!

FYI - I just run the string through the conversion function in any case. It doesn't do anything if the string doesn't contain any Unicode, and it converts it properly if it does. So that's the way to do it.
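As a rough sketch of what that boils down to (a hand-rolled stand-in, not the native2ascii package's actual API, and assuming the conversion goes from native characters to \uXXXX escapes like the Java native2ascii tool does): escape every character above U+007F and leave plain ASCII untouched, so it is safe to run on every input.

function toAsciiEscapes(s) {
    // Replace each character above U+007F with its \uXXXX escape.
    return s.replace(/[^\u0000-\u007f]/g, (ch) =>
        '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
    );
}

console.log(toAsciiEscapes('Jesús Roberto')); // "Jes\u00fas Roberto"
console.log(toAsciiEscapes('plain ASCII'));   // "plain ASCII", unchanged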
