Move from EC2 to a container-based workload

Open jordan-wright opened this issue 5 years ago • 3 comments

Right now we need to maintain EC2 worker instances for processing, which isn't ideal: weird errors that occur during processing can propagate and cause the entire host to stall.

Ideally, we would move to a workload-based solution like Fargate so that each package is installed in a totally isolated environment. Most of these platforms seem to support SYS_PTRACE, which is required for sysdig to work. It'll just be a matter of figuring out how to make it work.
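As a rough illustration of the SYS_PTRACE point: an ECS container definition can request the capability through `linuxParameters.capabilities.add`. The sketch below builds such a fragment in Python. The helper name, container name, and image are hypothetical placeholders, not taken from this project, and whether sysdig works with only this capability is still the open question here.

```python
import json


def build_container_definition(name, image):
    """Return an ECS-style container definition that requests SYS_PTRACE.

    `name` and `image` are illustrative placeholders (assumptions), not
    values from the project's real configuration.
    """
    return {
        "name": name,
        "image": image,
        "essential": True,
        "linuxParameters": {
            # Ask the platform to add the ptrace capability to the container.
            "capabilities": {"add": ["SYS_PTRACE"]},
        },
    }


if __name__ == "__main__":
    definition = build_container_definition("package-worker", "example/worker:latest")
    print(json.dumps(definition, indent=2))
```

The resulting dictionary would sit inside the `containerDefinitions` list of a task definition; everything else in that task definition is left out here.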

jordan-wright, Nov 14 '20 03:11

To make sure I understand your idea:

  • One container would start up in something like Fargate
  • The entrypoint there would be a binary we control
  • This binary would set up the necessary monitoring (tcpdump, sysdig, ptrace, etc.)
  • This binary would execute the package installation as a subprocess
  • This binary would parse/upload the resulting data

Is that the rough idea?
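The steps listed above could be sketched roughly as follows, assuming the entrypoint is a single process that launches the monitors and the installer as child processes. `run_task` and its parameters are hypothetical names for illustration, the monitor commands are passed in rather than hard-coded, and the parse/upload step is left as a returned dictionary rather than guessed at.

```python
import subprocess
import sys


def run_task(install_cmd, monitor_cmds, timeout=600):
    """Start monitors, run the install as a subprocess, then stop the monitors.

    install_cmd:  argv list for the package installation.
    monitor_cmds: list of argv lists, one per monitoring tool
                  (for example ["tcpdump", "-w", "capture.pcap"]).
    Returns a dict that a later parse/upload step could consume.
    """
    # 1. Set up monitoring before the package runs any code.
    monitors = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        for cmd in monitor_cmds
    ]
    try:
        # 2. Execute the package installation as a subprocess.
        install = subprocess.run(
            install_cmd, capture_output=True, text=True, timeout=timeout
        )
    finally:
        # 3. Stop every monitor, even if the installation failed or timed out.
        outputs = []
        for proc in monitors:
            proc.terminate()
            try:
                out, _ = proc.communicate(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                out, _ = proc.communicate()
            outputs.append(out)
    # 4. Hand the collected data to whatever parses and uploads it.
    return {
        "returncode": install.returncode,
        "stdout": install.stdout,
        "stderr": install.stderr,
        "monitor_output": outputs,
    }


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without tcpdump or a real package.
    fake_monitor = [sys.executable, "-c", "import time; time.sleep(60)"]
    fake_install = [sys.executable, "-c", "print('installed')"]
    result = run_task(fake_install, [fake_monitor])
    print(result["returncode"], result["stdout"].strip())
```

Note that in this shape the monitors run inside the same container as the package being installed, so they are not out of band.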

dlorenc, Nov 28 '20 23:11

I can see a few options. I guess I considered each package being executed as a "task", where a task included setting up the monitoring and installing the package in a separate container.

The specifics of this are still up in the air. Ideally, we'd install the package in its own container and do all the monitoring out of band. Right now, that's done at the host-level, but it's possible that we could construct some kind of container workload that sets up the monitoring for us (e.g. a tcpdump container) and then does the package installation.

Alternatively, we could just have the entire task use our binary as the entrypoint, and it could set up the monitoring and execution, like you suggested. Kind of like a docker-in-docker situation or something.

I'm definitely open to ideas!

jordan-wright, Nov 29 '20 03:11

I agree with your points, and I think there's probably a larger design discussion worth having here.

> The specifics of this are still up in the air. Ideally, we'd install the package in its own container and do all the monitoring out of band.

Yup, this sounds ideal.

> Right now, that's done at the host-level, but it's possible that we could construct some kind of container workload that sets up the monitoring for us (e.g. a tcpdump container) and then does the package installation.

I'm having a hard time imagining how this would work, given the limitations of the container hosting platforms I've seen (Fargate, Cloud Run, etc.). Most restrict to one container, or don't allow the full host access that would be required for one container to start up and monitor the others.

I think we might also want to add support for Windows/Mac environments eventually, since installations could/do behave differently in different environments.

dlorenc, Nov 30 '20 15:11