How fit is Galaxy for a multi-service architecture?

illustreets · July 24, 2016, 1:21pm

Hi everyone,

I would really appreciate it if seasoned Galaxy users, or someone from MDG, could guide me. We have developed a pretty complex application, all of it with Meteor. It does, however, depend on a multitude of other services that cannot - or seemingly cannot - exist in Galaxy at the moment. I’m talking about: PostgreSQL + PostGIS, Mongo, a tile server built in Java, Varnish (adapted for tile caching and tied to the tiler), and several ETL processes (some of them written in bash) for importing files.

Do you think that the following is a setup that would reliably work with Galaxy:

1. The two databases to be deployed via Compose.io, in the same AWS regions as Galaxy (we plan deployments in both regions).
2. The tile service (together with the caching service) in AWS, in the same region as Galaxy AND the Postgres DB. Meteor, the tiler, and PG need to be communicating in realtime, all the time.
3. The ETL processes in the same container as the Meteor app. These are simply bash scripts or compiled executables that are spawned as child processes at various steps during file imports into PG.

From your experience, how are things regarding latency? If I simply put everything pertaining to one deployment in the same region, would that be ok?

Then what about security? I cannot see how I would put all of these in the same subnet, with a reverse proxy in front of everything, and firewall the whole herd. Is there a secure alternative to this setup, short of having to install SSL certs everywhere?

I am very keen to find a way of using Galaxy with all this setup.

Thanks!
Manuel

illustreets · July 25, 2016, 12:02pm

Trying to bump it. Isn’t there anyone who could answer this?

lucfranken · July 26, 2016, 8:36am

Your solution seems very reasonable. That’s how the AWS platform works: you just use all of those different services and connect them.

I would try to treat them as separate services as much as possible and connect to them (always with SSL) unless you really run into performance or security issues. For the specifics of how to connect and keep the traffic local within AWS, read their white papers; it sometimes differs per product. For security, it seems you only need to give access to the Galaxy instances (by default), so you can just use the AWS tools to keep your internal services off the internet. You will need to find out how to give the Galaxy machines access, of course.
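As a rough illustration of "always with SSL" for the Postgres side, here is a minimal Python sketch that builds a libpq-style connection URI with `sslmode=verify-full`. The host, database, user, password, and CA file path are all made-up placeholders, and whether a given driver (for example the Node one a Meteor app would use) honours these URI parameters is something to check in its documentation.

```python
from urllib.parse import quote, urlencode


def postgres_ssl_uri(host, port, database, user, password, ca_file):
    """Build a libpq-style URI that asks for an encrypted, verified connection."""
    params = urlencode({
        "sslmode": "verify-full",  # encrypt and verify the server certificate and host name
        "sslrootcert": ca_file,    # CA certificate obtained from the database host
    })
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?{params}"
    )


# All values below are hypothetical placeholders, not real endpoints or credentials.
uri = postgres_ssl_uri(
    "pg.example.com", 5432, "gis", "meteor_app", "s3cr3t/pw", "/etc/ssl/compose-ca.pem"
)
print(uri)
```

With `verify-full` the client should refuse to connect if the certificate does not match the host, which is the property you want when the traffic leaves your own network.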

On 3 I am not sure; in general we would not run worker processes on user-facing machines. It may work in your case because the ETL is so tightly coupled to the app. We would move such background processes to an EC2 instance, for example, or even to a hosted service like Lambda.

Also, I am not sure how your ETL processes will behave if containers go down, get restarted, and so on.

illustreets · July 26, 2016, 12:35pm

Thank you for your reply.

I would prefer the security of a subnet instead of that of SSL, but it’s almost impossible to implement it fully, it seems, in this architecture.

There is no way of knowing the IPs of the Galaxy containers. I guess we should put some sort of orchestrator service at the entry point into the local network, to mediate the communication between Meteor and the rest. The Meteor app and the orchestrator would be connected via SSL.

Good point about the ETL module - the more I think about it, the more I realise that I don’t want it coupled to the Meteor app. It does make sense to separate it and have it listening to S3 events, then process the files as they arrive in the bucket.
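To make the S3 idea a bit more concrete, here is a minimal Python sketch of a worker entry point that reads an S3 event notification and picks out the uploaded files. The bucket name, the key, and the `handler` function are hypothetical, and the actual import into PG is only hinted at with a print.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for newly created objects in an S3 notification."""
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        s3 = record["s3"]
        # Object keys arrive URL-encoded in notifications (spaces become "+").
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found


def handler(event, context=None):
    """Hypothetical entry point: hand each uploaded file to the import step."""
    for bucket, key in uploaded_objects(event):
        print(f"would import s3://{bucket}/{key} into PostGIS")


# A trimmed-down notification with made-up names, for illustration only.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "imports-bucket"},
                "object": {"key": "uploads/my+file.csv", "size": 1024},
            },
        }
    ]
}

handler(sample_event)
```

Because the worker only reacts to what is in the bucket, a restarted container can pick up where it left off without the Meteor app being involved.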

lucfranken · July 27, 2016, 7:26pm

Happy it’s useful for you!

I think you have to accept that when working with multiple services. It’s still secure and you can lock it down quite far. Also, MDG might have some specifics for that issue. If you need help from them, get a Galaxy account so they can support you. Don’t invest in some in-between orchestrator before you get in touch with them. And by the way, how do you connect to Compose?

About ETL: In almost any case that’s the way to go.

More generally: I think it’s healthy that you have multiple connected services which are not your own. It does seem like the way to go, since hosting everything yourself is too hard.

Taking that into consideration: scaling is also easier because it becomes the suppliers’ problem. Clearly you need control, but you don’t need control over the nitty-gritty details of scaling, so it can get you up to speed. Later on you can get to a point where you host the most important pieces yourself if needed. They can be custom hosted on EC2, or even on your own servers if you really want.

Even if you need to move off S3, you can buy or create a replacement with the same API quite easily. That’s what’s interesting when your services are separated. We normally try to build most things in Meteor and later move out the pieces we don’t want inside the single app anymore. If you have already done that, it’s quite an advantage in terms of separation of concerns, for sure.

illustreets · July 31, 2016, 11:57pm

I got in touch with the sales team and they also recommended ticketing.

Compose connection - well, just the usual way it seems. If I choose the setup above, then it’ll simply be via SSL… and that’s about it. This is a thorny issue. I’m less and less inclined to go with Galaxy. I will be storing confidential data in both databases so I want to be able to fully secure them.

I think we will end up with the Meteor app on Elastic Beanstalk, so I can implement IP whitelisting/private subnets. Even better, maybe forget about Compose altogether and deploy a MongoDB Atlas replica set and an Amazon RDS instance, all of them in a VPC. Done.

gothicmage · August 1, 2016, 1:29am

You may find answers to many of your questions in the Galaxy FAQ.
Galaxy is AWS based, so in theory it may be more versatile than what most Galaxy clients need.

You may still want to wait for a reply from MDG staff, however: @sashko, @rohit2b

illustreets · August 2, 2016, 4:07am

You can bet I’ve read that. Over and over again.

AWS based, and versatile… but not versatile enough. Could be OK for projects not handling very sensitive data. If you need to connect to database instances in a VPC (hard), or to a low-cost, managed database such as Compose, which lives outside a VPC so it needs IP whitelisting (almost impossible), then you are out of luck with Galaxy.
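For anyone wondering why unknown container IPs are such a blocker, this is roughly what IP whitelisting amounts to on the database side, as a minimal Python sketch. The networks listed are documentation-only address ranges standing in for "the app servers"; nothing here is Galaxy's or Compose's actual configuration.

```python
import ipaddress

# Hypothetical allowlist: the only source addresses the database will accept.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/28"),   # e.g. a private subnet's NAT range
    ipaddress.ip_network("198.51.100.7/32"),  # e.g. a single fixed app server
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


print(is_allowed("203.0.113.9"))   # inside the /28
print(is_allowed("203.0.113.40"))  # outside every listed network
```

The check only works if the app's outbound addresses are known and stable, which is exactly what I cannot get for the Galaxy containers.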