Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Support running Heron on Amazon Ecs. #1837

Open
wants to merge 62 commits into
base: master
Choose a base branch
from
Open

Conversation

ananthgs
Copy link

@ananthgs ananthgs commented May 5, 2017

Pre requisite: AWS and ECS-CLI needs to be installed on the submitter machine as per guidelines from Amazon.

#heron.statemgr.connection.string: "xx.xx.xx.xx:2181"


# path of the root address to store the state in a local file system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a local file system or it expects zookeeper?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does expect zookeeper to be running on one of the ec2 instances. As suggested I will put up a document explaining the offline set ups on Amazon ec2 and the following steps on the scheduler.


heron.statemgr.zookeeper.retry.interval.ms: 10000

####################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we require tunneling? If not, please remove the tunnel configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tunneling could be needed for any of the schedulers and they're valid configs so I think it's ok to keep them in the config file. Maybe we set it to False by default though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done set default to false

@@ -155,7 +171,14 @@ def init_parsed_args(self, args):
# id within docker, rather than the host's hostname. NOTE: this 'HOST' env variable is not
# guaranteed to be set in all Docker executor environments (outside of Marathon)
if is_docker_environment():
self.master_host = os.environ.get('HOST') if 'HOST' in os.environ else socket.gethostname()
# Need to set the HOST environment vairable if docker is for AWS ECS tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding code to this function, can we make a overall function called get_host - in which we can check whether it is running in docker or ECS or use socket.gethostname() and return the hostname.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for ways to do this without one-off logic to support ECS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added gethost function

public static final String ECS_CLUSTER_BINARY = "heron-examples.jar";
public static final String COMPOSE_CMD = "ecs-cli compose --project-name ";
public static final String UP = " up";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of hard coding of values here. Not sure why you need them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for removing as much of this hard-coding as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@kramasamy
Copy link
Contributor

@ananthgs - can you write a good concise PR description about how it works with ECS?

@kramasamy
Copy link
Contributor

@ananthgs - you might want to add a detailed documentation of how to set it up and use, as described in other schedulers for heron.io. However, this can be done in a different PR.

Copy link
Contributor

@billonahill billonahill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the submission! Please add unit tests and accept the CLA:

https://twitter.github.io/heron/docs/contributors/community/


heron.statemgr.zookeeper.retry.interval.ms: 10000

####################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tunneling could be needed for any of the schedulers and they're valid configs so I think it's ok to keep them in the config file. Maybe we set it to False by default though.

@@ -155,7 +171,14 @@ def init_parsed_args(self, args):
# id within docker, rather than the host's hostname. NOTE: this 'HOST' env variable is not
# guaranteed to be set in all Docker executor environments (outside of Marathon)
if is_docker_environment():
self.master_host = os.environ.get('HOST') if 'HOST' in os.environ else socket.gethostname()
# Need to set the HOST environment vairable if docker is for AWS ECS tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for ways to do this without one-off logic to support ECS.

public static final String ECS_CLUSTER_BINARY = "heron-examples.jar";
public static final String COMPOSE_CMD = "ecs-cli compose --project-name ";
public static final String UP = " up";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for removing as much of this hard-coding as possible.

+ " herondata:\n"
+ " driver: local";
public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle";
public static final String ECS_CLUSTER_BINARY = "heron-examples.jar";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this code require a dep on heron-examples.jar?

+ "volumes:\n"
+ " herondata:\n"
+ " driver: local";
public static final String DESTINATION_JVM = "/usr/lib/jvm/java-8-oracle";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does jvm location need to be hardcoded?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your Docker image contains a pre-installed version of the Java 8 JRE, you should be able to reference $JAVA_HOME here. You should also be able to set that from within the scheduler configuration as well, via the heron.directory.sandbox.java.home option

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

public String replacePortNumbers(int container, String content) {
int basePortnumber = 5000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 5000 here and the logic in this method looks duplicative of other logic in this class. Can you consolidate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

public String getDockerFileContent(String execCommand, int container) {
String commandBuiler = EcsContext.PART1 + EcsContext.CMD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Favor StringBuilder over string concat. Or better yet, use String.format(..)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Will modify

public String replacePortNumbers(int container, String content) {
int basePortnumber = 5000;
String localContent = new String(content);
for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please clarify in the javadocs what this method does. Also I suspect StringBuilder might be a better fit here.

}
builder.append(string);
}
String stringToReturn = builder.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return builder.toString();

return null;
}

public boolean onKill(Scheduler.KillTopologyRequest request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods all need to be implemented and annotated with @OverRide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, please implement those methods in the following commits

@@ -108,6 +109,21 @@ def log_pid_for_process(process_name, pid):
def is_docker_environment():
return os.path.isfile('/.dockerenv')

def isEcsAmiInstance():
meta = 'http://169.254.169.254/latest/meta-data/ami-id'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ananthgs - why is this URL hard coded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove hard coded url addresses

pass it from out side if necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, Passing it as argument

+ "services:\n"
+ " container_number:\n"
+ " image: ananthgs/onlyheronandubuntu\n";
public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ananthgs - Is this image a generic ubuntu image downloaded from docker hub? or specially created image? If this is specially created image - what does it contain other than the basic OS?

public static final String ECSNETWORK = " networks:\n"
+ " - heron\n"
+ " ports:\n"
+ " - \"5000:5000\"\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these ports are hard coded? Can we ask ECS to provide a dynamic port?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done removed hard coded ports

self.master_host = subprocess.Popen(["curl",
"http://169.254.169.254/latest/meta-data/local-ipv4"]
, stdout=subprocess.PIPE).communicate()[0]
os.environ['HOST'] = self.master_host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this line since you won't be grabbing os.environ['HOST'] after this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

+ " image: ananthgs/onlyheronandubuntu\n";
public static final String CMD = " command: [\"sh\", \"-c\", \"mkdir /s3; cd /s3 ;"
+ "aws s3 cp s3://herondockercal/TOPOLOGY_NAME/topology.tar.gz /s3 ;"
+ "aws s3 cp s3://herondockercal/heron-core-testbuild-ubuntu14.04.tar.gz /s3 ;cd /s3;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That paths to the core binaries will need to be more dynamic than this and will need to feed out of the heron.package.core.uri configuration option in the scheduler library right?

String content = null;
try {
tempDockerFile = File.createTempFile("docker", ".yml");
content = getDockerFileContent(finalExecCommand, container);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain what the process is here for creating this docker.yml file just so I understand what's going on? What is it used for and what is it composed of -- just so I can understand a little better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS ECS task can be triggered via a docker style compose command. This Task is then run on the AWS Cluster in a Container instance. The overall approach is explained here :
https://docs.google.com/document/d/1ecbCuA46cIKPfY0SP0F1dcRlei4DIPz3pZ6ZSZ5zZgc/edit?usp=sharing.
Please feel free to comment on the approach.

@@ -108,6 +109,21 @@ def log_pid_for_process(process_name, pid):
def is_docker_environment():
return os.path.isfile('/.dockerenv')

def isEcsAmiInstance():
meta = 'http://169.254.169.254/latest/meta-data/ami-id'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove hard coded url addresses

pass it from out side if necessary

# Need to set the HOST environment vairable if docker is for AWS ECS tasks
if isEcsAmiInstance():
self.master_host = subprocess.Popen(["curl",
"http://169.254.169.254/latest/meta-data/local-ipv4"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same hard code url problem

return null;
}

public boolean onKill(Scheduler.KillTopologyRequest request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, please implement those methods in the following commits

}

public List<String> getJobLinks() {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an empty ArrayList is better than a null

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the get joblink feature to return the ecs tasks

private String[] getExecutorCommand(int container) {
List<Integer> freePorts = new ArrayList<>(SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR);
for (int i = 0; i < SchedulerUtils.PORTS_REQUIRED_FOR_EXECUTOR; i++) {
//freePorts.add(SysUtils.getFreePort());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any specific reason not using this getFreePort() method but hard code the port number?

LOG.info("Starting to deploy topology: " + EcsContext.topologyName(config));
LOG.info("Starting executor for TMaster");
startExecutor(0);
// for each container, run its own executor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log the event of starting normal container here

@billonahill
Copy link
Contributor

@ajorgensen you guys also run on EC2, correct? If so could you review this to see if your approach and this approach could coexist in this codebase?

	modified:   ecs-heron-role.json
	modified:   ecs-heron-policy.json
# Conflicts:
#	heron/executor/src/python/heron_executor.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants