Store 3rd party password safely for repeated access by back-end only


#1

Continuing the discussion from How do you store sensitive data in mongodb:

Let me ask my case which is very similar to the one the OP explains here.

I have a third-party website which has no API. My app will store the credentials (email and password) to allow the back-end to repeatedly log in, call a specific page of the third-party app and then analyze the resulting content (basically decoding the HTML response) for changes, so that the user doesn't miss any important updates and doesn't have to check manually every time period he wants.

I know you wrote that there is no case where the credentials should be stored, but this is the use case for my app.

So how can I store this password in the safest way? I have never worked with any password obfuscation before. Are there any packages on Meteor that can help me achieve this?

Thanks to the OP for bringing this topic up; it makes me feel better that I'm not the only one asking this.


#2

So before I start talking about solutions, I'll clarify what I meant by "there is no secure way to store credentials". A token-based system is preferred over credentials when providing access to an API for two reasons. Firstly, you don't need a login flow every time; once the API user has the token, that is it. Secondly, you are able to put limits on tokens, so you can say a token may access the read API but not the delete endpoint etc. So if the token is ever stolen, the damage is minimized.
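The scope-limiting idea can be shown with a minimal sketch (the token IDs and scope names here are invented for illustration, not any particular API's format):

```javascript
// Sketch: scoped API tokens, so a stolen token limits the damage.
// Token IDs and scope names are invented for illustration.
const tokens = new Map([
  ['tok_readonly', { scopes: ['read'] }],          // may only read
  ['tok_admin',    { scopes: ['read', 'delete'] }] // full access
]);

function authorize(tokenId, requiredScope) {
  const token = tokens.get(tokenId);
  if (!token) throw new Error('unknown token');
  return token.scopes.includes(requiredScope);
}

console.log(authorize('tok_readonly', 'read'));   // true
console.log(authorize('tok_readonly', 'delete')); // false
```

If `tok_readonly` leaks, an attacker can read data but never hit the delete endpoint, which is exactly the damage-limiting you don't get with raw credentials.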

So now to actually solving your problem. Knowing that there is no way to do this perfectly, we have to focus on "what ifs". What if someone gains access to my application server: how much information can they get? Do they have access to the whole database? Can they modify the database? What if they download the database: what damage can they do with the stolen data? Can they impersonate a person with their credentials? These are all concerns you need to think about.

So we'll start with the basics: assume that someone gets a copy of your database. That is a problem in itself, but if the credentials to your 3rd party are stored in clear text, they can use them basically until the 3rd party disables the account. So what you need to do is encrypt them. There are many methods for this, and doing it yourself is not recommended, though commercial solutions such as volume-level encryption are not cheap. The other concern with this scheme is that you need some service to store the encryption key, since if it is kept on the app server or database the attacker can just take it and decrypt the data anyway. There are a few open source projects that offer key management, namely HashiCorp Vault. This way the key is only ever stored in memory and not on the app server's disk.

The problem with this is that if an attacker secretly takes control of the server, they can just sit and wait for the credentials to be decrypted and put in memory before stealing them. The short answer is: if you are storing your credentials in your main application database, there is no way to protect against this, with no amount of encryption or obfuscation (obfuscation is no protection anyway).

So your final alternative is to make it as hard as possible for an attacker to get the credentials. And I talked about this a bit in the original thread. You want to have two separate systems, with a very controlled communication link.

  • You have your Meteor app servers. They're normal, they have a Mongo database etc. You have to deal with their security in the normal ways.
  • Now you also have your API scraper servers. These are much more locked down than the normal app servers and cannot be accessed from the public internet; all they do is scrape data and process it. They have their own database for storing the credentials they need for scraping, which cannot be accessed by the main app servers.
  • They then write this processed data into your main app's Mongo database (or to their own, depending on how sensitive it is).
  • The application servers only ever have write access to the secure scraper database (either through a custom service or database access controls). This way, if an app server is compromised, it can only ever overwrite the user's details, not steal them.
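The write-only rule in the last bullet can be sketched as a tiny in-memory store (role names and the store itself are invented for illustration; in practice this would be enforced by database access controls or a custom service):

```javascript
// Sketch: the app server's role may (over)write credentials in the
// scraper's store but never read them back. Role names are invented.
class CredentialStore {
  constructor() {
    this.records = new Map();
  }

  // Any role may write/overwrite a user's credentials.
  write(role, userId, credentials) {
    this.records.set(userId, credentials);
  }

  // Only the scraper itself may ever read credentials back out.
  read(role, userId) {
    if (role !== 'scraper') {
      throw new Error(`role "${role}" has write-only access`);
    }
    return this.records.get(userId);
  }
}
```

So a compromised app server could corrupt a user's stored credentials (forcing a reset), but it has no code path to exfiltrate them.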

Now you might ask: what if my scraper servers get compromised? Well, that is a possibility, but this way you can harden them as much as possible and put a strict set of communication rules between them and the app servers (which face the internet and are the most likely point of first compromise by an attacker).

Hopefully this was helpful, and I am happy to elaborate on any points that are confusing. Though if the two-service setup sounds too complicated, you can always just store the raw credentials in the database and make sure the user knows that it should be a password unique to the 3rd party service, and that if they or the 3rd party notice suspicious activity they should notify you and change the credentials.


#3

Thank you so much @entropy for your detailed answer. I was contemplating a two-server setup anyway, but more from the point of view that with a separate scraper server, a performance problem on that server wouldn't affect the app server at all (as the web scraping isn't an urgent thing).

I've also checked the excellent https://vaultproject.io and it sounds like the correct, secure solution for storing my users' passwords to the third-party service.

I will therefore follow your advice and try to set it up with the two servers, storing the credentials in Vault.

I do have two more questions on your answer (more to fully understand every point):

  1. You wrote “Now you also have your API scraper. This system is much more locked down than the normal app servers and they can not be accessed from the public internet, all they do is scrape data and process it. They have their own database for storing the credentials they need for scraping, that cannot be accessed by the main app servers.”

My idea (and understanding) is that the scraping server communicates with the app server via API calls (back-end to back-end communication). The scraping server pushes the data it has extracted from the third-party screens to the back-end of the app server, which writes it into its own MongoDB. This data is public anyway to everyone who creates a free account with the third-party service.

But how do I ensure that my scraper server isn't accessible from the public internet? It's probably a newbie question, but could I do this by e.g. hosting it on my own local web server (which is behind a firewall)? I have a Synology DiskStation which runs a web server; thanks to my ISP blocking port 80 it's not possible to reach it from the internet, but I guess it can still make a call (PUT/GET) out to the internet (I have to check). Would that be a suitable solution?

  2. You wrote: “The application servers only ever have write access to the secure scraper database (either through a custom service or database access controls). This way if an app server is compromised they can only ever overwrite the users details, not steal them.”

I don't understand why the app server should have any (read or write) access to the scraper server at all. It can send a GET API call to the scraping server which triggers the scraping process via a normal route. Like I wrote above, the eventual results of that process are written into the app server's database by the scraping server directly. Somehow you lost me on your last point.

BTW, I plan to use the excellent https://atmospherejs.com/vsivsi/job-collection package on the scraping server


#4

To answer your questions.

  1. When I said that the scraping server was not publicly accessible from the internet, I was assuming you were using a hosted environment such as DigitalOcean or Amazon Web Services. In these scenarios you would set up an internal network with your scraping server and other backend services, and they would communicate with your application servers over that internal network (with adequate access controls). The front end would sit behind a web server which would provide public access and probably load-balance and TLS-terminate.

If you are forced to use a home environment, I would suggest setting up a DMZ for your front-end servers, then using a DMZ pinhole for communication to your scraping servers. Be aware that ISPs do not like it when people host business websites on personal plans. Even if you get around their port blocking, they will probably monitor for unusual traffic and, depending on their terms of service, ask you about the strange usage.

  2. So I made the assumption that the user would be able to change their credentials and that they would be able to cause a scraping procedure to occur. This is why you need some sort of communication channel or RPC. When you say the app server makes a GET call to the scraping server (without getting into REST semantics), this counts as write access, since the action causes the scraping server to update data; I didn't specifically mean database access.
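One way to picture that "a GET that triggers a scrape is really a write" is to model the trigger as a one-way command queue: the app server can only enqueue jobs (a mutation of the scraper's state), and only the scraper side drains them. The names here are invented for illustration:

```javascript
// Sketch: the scrape trigger modeled as a one-way command queue.
// The app server can only enqueue; only the scraper drains the queue.
const queue = [];

// Exposed to the app server: enqueue only. Even though the app server
// might call this via a GET route, it mutates scraper-side state,
// which is why it counts as write access.
function requestScrape(userId) {
  queue.push({ type: 'rescrape', userId, requestedAt: Date.now() });
}

// Runs on the scraper side only; the app server never sees job contents
// or the credentials the scraper uses to service them.
function drainQueue(handler) {
  while (queue.length > 0) {
    handler(queue.shift());
  }
}
```

A package like job-collection plays this queue role in practice; the point is only that the app server's channel into the scraper is enqueue-only.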