Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log Pipeline Extensions #121

Merged
merged 46 commits into from
Aug 21, 2017
Merged

Log Pipeline Extensions #121

merged 46 commits into from
Aug 21, 2017

Conversation

kornicameister
Copy link
Contributor

Full log pipeline to monasca-docker.
Pipeline is defined in docker-compose extensions
log-pipeline.yml to minimize an impact on
metric pipeline.

log-pipeline features:

  • monasca-log-{api,persister,transformer,metrics}
  • elasticsearch + template loading
  • kibana

@kornicameister
Copy link
Contributor Author

Note that this commit is WIP and although the goal is to have this merged, I would consider this as the common ground to get a feeling about some conventions monasca-docker follows.

@timothyb89
Copy link
Member

Interesting work! I have no objections to merging once everything's ready.

One quick issue: CI/CD integration is using build.yml now rather than build.sh, I've been meaning to remove those and document the new system for ages now... at any rate, once a build.yml is added we'll have automatic build+test+release for the new components.

If I'm interpreting things right, the log pipeline is in a separate docker-compose yaml file? We may need to make some changes to /ci.py to include that in our docker-compose testing (if that's desirable). It should be pretty straightforward as per https://docs.docker.com/compose/extends/#understanding-multiple-compose-files, though we'll need to figure out when to run monasca + log api versus just monasca since it might be a bit heavy to run during normal tests.

@kornicameister
Copy link
Contributor Author

@timothyb89 ideally I wanted to make create another yml file called metric-pipeline.yml. In other words there would be a YML that would describe all components for metric processing. That said, the 3rd YML would be for common stuff (zookeeper, kafka AFAIR) - however, that might be a bit too much for now and in overall - so left that for now.

Going back to this PR. Yeah, there is seperate log-pipeline.yml. My goal was to minimize the impact over what you've already done. There is just one common part - the kafka topics. Maybe there's a way to extend a service to include just those new components, or we might try and define a single-job container that would depend on kafka, do what already has been done for metric pipeline and do some extra job required by log pipeline. Anyway that's up to you I guess.

I would just ask, from my part, to take closer look at monasca-log-api image, that present a bit different approach to to the topic of building python-based image. I am curious about your opinion on that.

@kornicameister
Copy link
Contributor Author

@timothyb89 I ran into the problem with multi-stage build. Does docker in travis supports that ?
Other then that - I think that current codebase for log-pipeline looks a lot better and fits into what you've already have done.

Copy link
Member

@timothyb89 timothyb89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: multi-stage builds, it should be possible to upgrade according to https://docs.travis-ci.com/user/docker/#Installing-a-newer-Docker-version - I'll look into that soon

I think it would be good to move the kafka topics script into it's own init container like mysql-init/influx-init/etc. Then it should be possible to reuse it for any optional components.

python setup.py install && \
cd / && \
rm -rf /monasca-log-api && \
apk del build-dep
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main caveat here is that apk del build-dep is run in another layer so without --squash we'll end up including all the build dependencies in the resulting image anyway. The log API image ends up being ~30 MiB larger than the vanilla monasca-api's right now (as measured by docker hub, 70 MB vs 100 MB) since the version of Docker on Travis doesn't have --squash available/enabled.

In general I like the idea of REBUILD_... args (I made a point of supporting them in dbuild which runs our CI builds) however they depend on docker build --squash to produce small images. Unfortunately squash functionality is still experimental and seemed to behave poorly when I last evaluated it. (I'd tested it with the thresh container but ended up removing the REBUILD instructions to keep builds small in Travis).

It might be worthwhile to revisit enabling --squash in Travis (especially with #80), so I think there's 3 options:

  1. don't use REBUILD_... args yet (longer build times for devs, smaller images from Travis)
  2. use REBUILD_, ignore large images from Travis (shorter build times for devs, larger images from Travis)
  3. update Docker and enable experimental flags in Travis (short and small builds)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...missed that bit. You're right, hmm for #80 I have submitted #126 , so might be that once we clear #126 it will be possible to have those bits here. For now, I will just leave them commented and in future we can comment these out.

@@ -0,0 +1,58 @@
FROM python:2-alpine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think installing python via alpine's package manager results in a (slightly) smaller image than python:alpine but probably not enough to worry about. I would say we should use python:2-alpine3.6 though, that base tag uses alpine:3.4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About that I had mixed feeling.
From one side I'd prefer reusing official images instead of manually installing Python.
On the other side my biggest concern is lack of possibility to easily build Python3 based container.

monasca-api does not have Py3 support, but log-api is different beast, it was much easier to have it implemented there. Not to mention that recently we've managed to enable monasca-notification (still some job is needed, but at least we won't have regression of failing tests).

Having all that said, I think I will simply rollback to monasca-api approach here to enable that possibility.

pip install --no-cache-dir Jinja2 gunicorn -c $CONSTRAINTS_FILE

ARG REBUILD_CHECKOUT=1
RUN git clone $LOG_API_REPO --depth 1 --branch $LOG_API_BRANCH monasca-log-api && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using a four-step clone in most of our other Dockerfiles since it gives more options for ..._BRANCH. For instance here it would let us pass --build-arg LOG_API_BRANCH=refs/changes/43/485443/2 to build directly from an OpenStack patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, rollback to 4-step-clone will be in next commit.

@kornicameister
Copy link
Contributor Author

@mhoppal @timothyb89 - I've reached, or at least I believe, I reached a point where is pretty much nothing that I could add over here.

There is, though, one glitch - that I am not really sure. The fact that even if you lunch only metric-pipeline (a.k.a. docker-compose up) you'll end up with 2 extra topics that are not used by metric-pipeline. The question is, should we write down kafka-init image or are you fine with pulling this is as it is.

If an answer is no-no ( ;-) ) I will simply write down all the documentation requires here and someone will pick up kafka-init container (actually I could start working on this now, but I'd need some help with helm later on).

What do you think ?

@kornicameister kornicameister self-assigned this Jul 28, 2017
@kornicameister kornicameister changed the title [WIP] Log Pipeline Extensions Log Pipeline Extensions Jul 31, 2017
runtime.

The config file sources are available [in the repository][5]. If necessary, the
generated config files can be viewed at runtime by running:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

| `ACCESS_LOG_FORMAT` | `%(asctime)s [%(process)d] gunicorn.access [%(levelname)s] %(message)s` | Log format for access log |
| `ACCESS_LOG_FIELDS` | `%(h)s %(l)s %(u)s %(t)s %(r)s %(s)s %(b)s "%(f)s" "%(a)s" %(L)s` | Access log fields |

If additional values need to be overridden, new config files or jinja2 templates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have some info how to do this replacements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I copied that over from the api documentation. Can we do this in another PR ?

|---------------------------|------------------|------------------------------------|
| `ZOOKEEPER_URI` | `zookeeper:2181` | An URI to Zookeeper server |
| `KAFKA_URI` | `kafka:9092` | The host and port for kafka |
| `KAFKA_WAIT_FOR_TOPICS` | `log-transformed,metrics` | Topics to wait on at startup |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be transformed-log?

| Variable | Default | Description |
|----------------------------|-------------------|-------------------------------------|
| `ZOOKEEPER_URI` | `zookeeper:2181` | An URI to Zookeeper server |
| `KAFKA_WAIT_FOR_TOPICS` | `log-transformed` | Topics to wait on at startup |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformed-log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ELASTICSEARCH_SNIFFING=true \
ELASTICSEARCH_SNIFFING_DELAY=5 \
ZOOKEEPER_URI=zookeeper:2181 \
KAFKA_WAIT_FOR_TOPICS=log-transformed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformed-log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be transformed-log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my...got lost in all that :/

# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
from __future__ import print_function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Tags
----

TBD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fill with info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

variants:
- tag: 0.0.1
aliases:
- latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be - :latest - causing a tag parse error in CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done - will be in next commit.

@kornicameister
Copy link
Contributor Author

@timothyb89 any idea what might gone wrong this time :/. I am looking at the logs but cannot determine the cause :/

Tomasz Trębski added 16 commits August 7, 2017 11:00
Full log pipeline to monasca-docker.
Pipeline is defined in docker-compose extensions
`log-pipeline.yml` to minimize an impact on
metric pipeline.

log-pipeline features:

* monasca-log-{api,persister,transformer,metrics}
* elasticsearch + template loading
* kibana
Following commit completes several items:

* added build.yml to integrate into existing CI
* removed all images, apart from master and stable/ocata
* made constraints control a little bit easier with just
  contstrains branch variable
Previously tags were build in wrong way.
Now they are correctly written using `:` as
prefix
Introducing 3 new checkpoints for
monasca-log-api docker image:

- REBUILD_DEPENDENCIES
- REBUILD_CHECKOUT
- REBUILD_CONFIG
Commit adjusts monasca-log-{metrics,persister,transformer}
to:

* include build.yml to be picked up by CI
* shortened images names (removed monasca- prefix)
Commits introduces build checkpoints for:

* monasca-log-metrics
* monasca-log-persister
* monasca-log-transformer
log-pipeline.yml includes bad references
to monasca-log-{metrics,transformer,persister} images
Right now it is possible to enable or disable
monasca-kibana-plugin with environment variable
MONASCA_PLUGIN_ENABLED.

By default the plugin is disabled, to allow
to access Kibana.
* removed FROM python image, using alpine instead and
installing Python by hand
* removed one REBUILD_ step, for now it is better to keep
usage of these low until newer docker is available in travis
All images for monasca-docker should be placed
under monasca namespace
monasca-api variable for the port name starts with
`MONASCA_CONTAINER_` prefix. Make equaivalent log-api
variable to follow the same principle.
kornicameister and others added 6 commits August 11, 2017 12:25
Being explic in build.yml
Now, when there is a separate image to create topic
in kafka, log pipeline should now affect the metric
deployment.
kibana plugin -i monasca-kibana-plugin -u file:///monasca-kibana-plugin.tar.gz && \
rm -rf /monasca-kibana-plugin.tar.gz

CMD /start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ENTRYPOINT if you don't want developers to change the executable that is run when the container starts. Prefer ENTRYPOINT than CMD when building executable Docker image

$KEYSTONE_ADMIN_DOMAIN \
$MONASCA_CONTAINER_LOG_API_PORT

CMD ["/start.sh"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ENTRYPOINT if you don't want developers to change the executable that is run when the container starts. Prefer ENTRYPOINT than CMD when building executable Docker image

COPY log-metrics* /etc/monasca/
COPY template.py start.sh kafka_wait_for_topics.py /

CMD ["/start.sh"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ENTRYPOINT if you don't want developers to change the executable that is run when the container starts. Prefer ENTRYPOINT than CMD when building executable Docker image

COPY log-transformer* /etc/monasca/
COPY template.py start.sh kafka_wait_for_topics.py /

CMD ["/start.sh"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ENTRYPOINT if you don't want developers to change the executable that is run when the container starts. Prefer ENTRYPOINT than CMD when building executable Docker image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In every image there's a CMD is used. In that's a bad practice example, as you told me face-to-face, I guess we should face cover other images, in separate PR. And, as usual, after that update that PR to include this solution.

@timothyb89 what do you think about that ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I tend to like CMD since you can more easily override with e.g. sh if needed for test/debugging. Plus eventually I'd like to see our containers all use tini which recommends setting the entrypoint to a different executable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea with tini. I can submit a PR with that after #121 and #143 are merged. I mean #143 for sure, and #121 I would like to have ;)

@kornicameister
Copy link
Contributor Author

@matrixik could you take & run this one today ? I wonder is there anything more I could've missed ;/

kornicameister and others added 2 commits August 16, 2017 06:43
Same as monasca-api, monasca-log-api should use memcache
to speed up token related operations.
@kornicameister
Copy link
Contributor Author

@timothyb89 can I ask you to take a closer look at this one ? We're kind of approaching a deadline @ Fujitsu and we would like to have this inside the repository. I believe we have some items to cover here (launching a tempests / CI, include all that inside travis etc - but that's a lot of items we need to discuss anyway [should I write an issue for that ?]).

Thx.

@matrixik
Copy link
Member

Looks like it's working for me.

@matrixik
Copy link
Member

Just some random complain: I hate github PRs for reviewing longer/bigger code... In compare Gerrit feels like heaven...

@kornicameister
Copy link
Contributor Author

I wonder if we should add .env entries for log-pipeline now or after first images will get published to the hub. I'd say the best approach would be to freeze it after all images are already pushed.

@timothyb89 do you agree ?

@kornicameister
Copy link
Contributor Author

@timothyb89 @mhoppal friendly ping ;)

@timothyb89
Copy link
Member

@kornicameister yeah, we'll need to add them after since we won't know the timestamps in advance

Copy link
Member

@timothyb89 timothyb89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the kibana image tag I think this is working fine on my machine. Will merge now since we'll need to add log pipeline versions in .env in a follow-up PR anyway.

Thanks for the change!

- kafka

kibana:
image: monasca/kibana:4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be monasca/kibana:4.6.3-master? Everything seems to be working on my end with the tag here changed to something valid.

@timothyb89 timothyb89 merged commit 08e9963 into monasca:master Aug 21, 2017
@timothyb89
Copy link
Member

Unrelated to this patch, but the agent seems to be broken right now, keystone auth is failing for some reason. There might be CI failures until we figure out the cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants