Wrong character encoding from HTTP.get()

I’m calling a REST API that returns results in German, i.e. the result includes umlauts like ä, ö, ü. If I try the API call in my browser, I get a correct UTF-8-encoded result. However, if I use Meteor’s HTTP.get(), all special characters are returned as the Unicode replacement character U+FFFD, displayed as �.

I’ve tried to set

HTTP.call('GET', url, {
  npmRequestOptions: {
    encoding: 'utf8'
  }
});

but this did not change anything.

I’ve also tried the aldeed:http package, which allows you to set an encoding directly, but this failed as well.

On Stack Overflow I saw a question from a Turkish developer who had the same problem but could not get a satisfying answer. So I am asking myself: how can HTTP.get() be used beyond simple ASCII characters?

Are you making the call on the client or the server? On the server, utf-8 encoding is set by default.
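A minimal sketch of what that default means in practice (the URL is just a placeholder):

// With no extra options, the server-side HTTP.call returns the body as a
// string that has already been decoded as utf-8.
var response = HTTP.call('GET', 'http://example.com/api');
console.log(typeof response.content); // "string", decoded as utf-8 by default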

Can you post the actual code, or a minimum reproduction?

I would try to specify which encoding I want returned, with something like this:

headers: {'Content-Type': 'application/json; charset=UTF-8'}

I’m running this on the server side.

My code is quite simple:

var url = 'http://opengtindb.org/?ean=9783785541876&cmd=query&queryid=300000000';
var resultFromOpenGTINdb = HTTP.call('GET', url);

This is the basic version without the additional options I’ve tried later (npmRequestOptions; using aldeed:http), which also did not change anything.

The aldeed:http version was:
var resultFromOpenGTINdb = HTTP.call('GET', url, {
  encoding: 'utf8', responseType: 'string'
});

The other version encapsulated the encoding parameter in the npmRequestOptions object instead, as shown in my original post. Yet, since the underlying npm request package uses 'utf8' by default anyway, this did not change anything, as expected. But still, the results are not correct.

You can try the URL in a web-browser to see the difference / the correct results.

BTW: I’ve experienced the same problem with another database backend from another service provider (isbndb.com).

Well, you are telling the http package that the returned content is in utf-8, which is not true.
Why don’t you tell the other server that you WANT the response in utf-8?
Or check the response headers and read the encoding type there?
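For example (a minimal sketch; url stands for the endpoint in question), the response headers are available on the result object that HTTP.call returns on the server:

// Inspect what the server actually claims about the encoding
var response = HTTP.call('GET', url);
// e.g. "application/json; charset=UTF-8", or just "text/plain" with no charset
console.log(response.headers['content-type']);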

Hm. I had the same problem with XML based responses that clearly stated UTF-8 as its encoding. But maybe this was a “lie”. Trying to figure out how to set the headers now.

I tried this now:

HTTP.call('GET', url, {
  headers: {
    'Content-Type': 'application/json; charset=UTF-8'
  }
});

But still, the characters are not in the correct format. If I call the URL in a browser, I cannot see any encoding in the header. Is there any way to detect it?

Open the Network tab in the inspector.
Select “All” next to the filter.
Select the file; in the info pane to the right of it, select the “Headers” tab.

I already tried this, but the server does not send any info on the character encoding.

Yet I found out, using a text editor, that it is something like ISO-8859-1. So the question now is how to convert character encodings properly in JavaScript. I will now try to integrate the npm package node-iconv and hope that will solve the encoding problem.

I tried node-iconv now, but this package is not able to do the conversion either. And I think I also found the reason why: HTTP.get() returns characters it does not understand as U+FFFD replacement characters, so there’s no chance for any post-processor to fix this.

Hence, the question still is: how can I tell HTTP.get() to leave the character set as it is returned by the server, which clearly does not include U+FFFD characters?

Ok, I now managed to convert the response successfully, but I’m not sure if this is really elegant. Anyway, I’ll leave it here to share it with others running into problems with incorrect character encodings as well.

1. Install the aldeed:http package, which lets you specify encoding and responseType as options.

2. Also require the npm package ‘iconv-lite’ for character conversion. In package.js, this can be done using Npm.depends({"iconv-lite": "0.4.11"});

3. Use this call to retrieve the content:

var response = HTTP.call('GET', url, {
  encoding: null,         // get content as binary data
  responseType: 'buffer'  // get it as a buffer
});

If you set encoding to null, the GET call returns the result as pure binary data. Setting responseType to 'buffer' returns the result as a Node.js Buffer object.

4. With these preparations, you can decode the character set this way:

var iconv = Npm.require('iconv-lite');
var result = iconv.decode(response.content, 'iso-8859-1');

Of course, the encoding must match whatever the server actually uses. The supported encodings can be found here: https://github.com/ashtuchkin/iconv-lite. A combined sketch of steps 1 to 4 follows below.
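Putting steps 1 to 4 together (a combined sketch; url is the endpoint from the original post, and the iso-8859-1 value is specific to this particular server):

// in package.js
Npm.depends({ "iconv-lite": "0.4.11" });

// server-side code, using the aldeed:http package
var iconv = Npm.require('iconv-lite');

var response = HTTP.call('GET', url, {
  encoding: null,         // return the body as raw binary data
  responseType: 'buffer'  // ...wrapped in a Node.js Buffer
});

// decode the raw bytes with the encoding the server actually uses
var result = iconv.decode(response.content, 'iso-8859-1');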

Still wondering if there was a much more elegant way to solve this…

Having done a little more playing I think this is more due to the browser being more forgiving than the node request package (which is what Meteor’s HTTP uses). The site you are requesting data from always returns a content-type of text/plain with no indication of encoding type or character set, which likely implies ASCII. I suspect that if a browser sees a utf8 character even with a content-type of text/plain, it honours it, whereas request just tries to map the character (using the toString method) and resorts to a standard “unknown character” code.
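You can see that mapping behaviour in isolation (a minimal sketch in plain Node, independent of Meteor): the ISO-8859-1 byte for ä is not a valid utf-8 sequence, so decoding it as utf-8 yields the replacement character.

// 0xE4 is 'ä' in ISO-8859-1, but on its own it is an invalid utf-8 sequence
var bytes = Buffer.from([0xe4]);
console.log(bytes.toString('utf8'));   // "�" (U+FFFD replacement character)
console.log(bytes.toString('latin1')); // "ä"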

I checked against an API returning a content-type of application/json (which implies utf8) and it worked exactly as expected - utf8 characters were correctly returned.

This probably means that your workaround is the only way you’re going to circumvent the problem of the content-type not being set appropriately, at least for now.

And if you try setting that http encoding property to iso-8859-1, how does it behave?

@robfallows: Yes, I could not find any other way of converting for now. Glad that iconv-lite works like a charm.

@shock: I did not want to rely on the server encoding the response correctly, as I do not know how servers typically behave when a different encoding is requested. So I tried instead to tell HTTP.call() to convert the results into the right encoding, using aldeed:http and its parameters { responseType: 'string', encoding: 'utf-8' }, assuming this would convert the response to UTF-8 based on the encoding in the HTTP response header, but this did not work. So I converted the content of the web page myself, by requesting it as binary data and using iconv-lite for the conversion. This worked.

Hi,
I know this topic is not recent, but I had the same problem and resolved it with an http encoding plugin written by Rebolon.

I found it on this topic

I just installed the package and used it with this code:

var result = HTTP.getWithEncoding(url, {
     encoding: {'from': 'iso-8859-15', 'to': 'iso-8859-1'}
});

It helped me a lot.

Try setting npmRequestOptions: { encoding: 'iso-8859-1' } as one of the properties of the second parameter passed in the HTTP.get call.
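Spelled out, that suggestion looks like this (a sketch; note that Node itself only recognises this encoding under the names 'latin1' or 'binary', so one of those spellings may be needed, since the value is ultimately handed to Node’s Buffer/stream decoding):

// Ask the underlying request library to decode the body as Latin-1
// ('latin1' / 'binary' are the names Node's Buffer understands)
var response = HTTP.get(url, {
  npmRequestOptions: {
    encoding: 'latin1'
  }
});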

I’m trying Rebolon’s package, but to no avail, as I can’t identify which encoding the website is using. It shows up fine in my browser (Chrome), and when I check it says “utf-8”, which I don’t believe, as it’s scrambled. I guess Chrome shows me the encoding after converting it automatically.

The website in question is: https://www.gedmatch.com/login1.php

Can someone point me in the right direction as to which character set the website is using? It doesn’t say anything about encoding, similar to what @waldgeist was experiencing.

Seems the problem might be this: content-encoding: "gzip"

UPDATE: it was indeed the gzip that caused it. I had it in one of the headers that I send along with the HTTP.get (copied from my browser), so taking that out solved the problem.
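For reference, two ways this can be handled (a sketch; the header values are placeholders copied from a typical browser request):

// Option 1: leave out the Accept-Encoding header copied from the browser,
// so the server responds with an uncompressed body.
var response = HTTP.get(url, {
  headers: {
    // 'Accept-Encoding': 'gzip, deflate'   <- omit this one
    'User-Agent': 'Mozilla/5.0'
  }
});

// Option 2: let the underlying request library negotiate and decompress
// gzip itself.
var response2 = HTTP.get(url, {
  npmRequestOptions: {
    gzip: true
  }
});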