Celery tasks¶
build_experiment
task¶
The business requirements for this project included the ability for experiments to rely on versioned dependencies without causing conflicts between experiments.
The experiment application is dependent on the ember-lookit-frameplayer
repo.
Researchers have the ability to specify a custom github url for the
ember-lookit-frameplayer
repo. They can also specify a SHA for the commit that
they would like to use. These fields are on the Build Study Page.
What happens¶
The build process uses celery, docker, ember-cli, yarn, and bower.
When a build or preview is requested a celery task is put into the build queue.
Inside the task, the study and requesting user are looked up in the
database. If it’s a preview task it’s current state is copied into a new
variable to be saved for later, then the study is put into the state of
previewing
and saved. If it’s a deployment the study is put into the
state of deployment
and saved. Since these states don’t actually
exist in the workflow definition this short circuits the workflow engine
so that studies currently undergoing deployment or preview can neither
move through the workflow or be previewed or deployed concurrently.
The SHAs are checked in the study model’s metadata field, if they are
empty or invalid the HEAD of the default branch is used. This
requires HTTPS calls to github for the
ember-lookit-frameplayer
repository.
A zip file of each repo is downloaded to a temporary directory and
extracted. The lookit-frame-player
archive is extracted in the
checkout directory (ember_build/checkouts/{player_sha}
).
A docker image is built based on the node:8.2.1-slim image. It is rebuilt every time because it doesn’t change very often and docker rebuilds of unchanged images are very fast.
The container is started passing several environment variables. It
installs python3 and several other dependencies with apt-get
. Then
it installs yarn, bower, and ember-cli@2.8 globally with npm. Next it
mounts the ember-build/checkouts
directory to /checkouts
inside
the container and the ember-build/deployments
directory to
/deployments
inside the container. It copies
ember-build/build.sh
and ember-build/environment
into the root
(/
) of the container and executes /bin/bash /build.sh
.
build.sh
copies the contents of the checkout directory into the
container (/checkout-dir/
inside the container) for faster file
access. A couple of sed
replacements are done where there are
experiment specific data that needs to be hardcoded prior to
ember-build
running. The environment
files are copied into the
correct places. Then yarn install --pure-lockfile
and
bower install --allow-root
are run for
ember-lookit-frameplayer
. Once those have completed ember-build -prod
is run to create a distributable copy of the app. The contents of the
dist
folder is then copied into the study output directory. The
container is now destroyed.
Once the build process is finished the files in the dist
folder are
copied to a folder on Google Cloud Storage. If it’s a preview they go
into a preview_experiments/{study.uuid}
folder in the bucket for the
environment (staging or production). If it’s a deployment they go into a
experiments/{study.uuid}
folder in the bucket for the environment
(staging or production).
When the task is finished copying the files to Google Cloud Storage an email is sent to the study admins and organization admins.
If the task was a preview task the state of the study is set back to it’s previous state. If it was a deployment the study is set to active. If the study is marked as discoverable, it will now be displayed on the lookit studies list page.
Finally, regardless of whether the task completed successfully a study
log will be created. The extra field (a JSON field) will contain the
logs of the image build process, the logs of the ember build process
than ran inside the docker container, any raised exception, and the any
logs generated by python during the entire task run. This is very
helpful for debugging. Line endings are encoded for ease of storage so
to read the results easily copy the contents of a study logs extra field
from the admin into an editor and replace the overly escaped linebreaks
(\\\\n|\\n)
with actual line breaks \n
. You can also use a JSON
beautifier/formatter to aid readability.
build_zipfile_of_videos
¶
This task downloads videos from MIT’s Amazon S3 bucket, zips them up, uploads them to Google Cloud Storage, generates a signed url good for 30m, and emails the requesting user that URL.
- The zip filename is generated from the study uuid, a sha256 of the included filenames, and whether it’s consent videos or all videos.
- If a zip file exists on Google Cloud Storage with the same name the file is not regenerated, an email with a link is immediately sent.
- After the task is completed all video files are immediately removed from the server. They still exist on s3 and Google Cloud Storage.
cleanup_builds
¶
This finds build directories older than a day and deletes them. It’s scheduled to run every morning at 2am.
cleanup_docker_images
¶
This finds unused docker images from previous builds and deletes them. It’s scheduled to run every morning at 3am.
cleanup_checkouts
¶
This finds checkout (extracted archives of github repos) directories older than a day and deletes them. It’s scheduled to run every morning at 4am.