Installation

The system is split into 7 substacks, each with its own docker-compose.yml file. They are highlighted in the main sketch in the Services section.

We will configure the system in the following order:

flowchart LR
  Download --> Traefik --> s[KC Setup] --> KC --> Kaldi --> PnC --> Monitoring --> Backend --> Frontend
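As a rough overview, the order above maps onto the env.sh arguments used in the rest of this guide. The sketch below only prints the command pairs; it does not run them, and the function name print_install_plan is made up for illustration. Some steps need manual work in between (for example the Terraform run during the Keycloak setup), so treat the output as a checklist rather than a script to pipe into a shell.

```shell
# Print the command pair for each installation step, in order.
# The download and setup steps run in the foreground (no -d), as in this guide.
print_install_plan() {
  for step in download traefik setup keycloak kaldi pnc monitoring backend frontend; do
    case "$step" in
      download|setup) up="docker compose up" ;;
      *)              up="docker compose up -d" ;;
    esac
    printf 'source env.sh %s && %s\n' "$step" "$up"
  done
}

print_install_plan
```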

At the root of the repository, you will find an env.sh helper file. This script will set up appropriate environment variables for Docker Compose for each of the steps listed above. By running source env.sh you can see which of the substacks are enabled and which are disabled. When finished with the installation, source env.sh should output:

Substack on-traefik is running.
Substack on-keycloak is running.
Substack on-kaldi is running.
Substack on-pnc is running.
Substack on-monitoring is running.
Substack on-backend is running.
Substack on-frontend is running.
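If you want to check this final state from a script, a minimal sketch could look like the following. It assumes the status lines have exactly the format shown above and that env.sh writes them to standard output; the function name all_substacks_running is hypothetical and not part of the repository.

```shell
# Succeeds only if the text on standard input reports all seven substacks
# as running and none as "not running".
all_substacks_running() {
  status=$(cat)
  running=$(printf '%s\n' "$status" | grep -c '^Substack on-.* is running\.$' || true)
  pending=$(printf '%s\n' "$status" | grep -c 'is not running' || true)
  [ "$running" -eq 7 ] && [ "$pending" -eq 0 ]
}

# Possible usage (assumption: the status goes to stdout):
#   source env.sh | all_substacks_running && echo "installation complete"
```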

All of the configuration and deployment files are located in the /prod directory.

Figure: the /prod directory

Step 0

Copy ./prod/.env.example to ./prod/.env and fill in the required values. The DNS fields should contain no protocol, just the domain name.

E.g.:

DNS=on.uni-lj.si
DNS_KC=auth-on.uni-lj.si
DNS_MANAGEMENT=management-on.uni-lj.si
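The "no protocol" rule is easy to get wrong when copying a URL from a browser. A small sanity check, sketched here with a hypothetical helper name is_bare_domain, rejects values that are empty or contain a protocol prefix or a slash. It does not verify that the domain actually resolves.

```shell
# Returns 0 if the value looks like a bare domain name:
# not empty, no protocol prefix, no slashes.
is_bare_domain() {
  case "$1" in
    ''|*://*|*/*) return 1 ;;
    *)            return 0 ;;
  esac
}

is_bare_domain "on.uni-lj.si" && echo "on.uni-lj.si looks fine"
is_bare_domain "https://on.uni-lj.si" || echo "https://on.uni-lj.si must not include a protocol"
```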

Also, don’t forget to set MINIO_ACCESS_KEY and MINIO_SECRET_KEY. You are free to change all usernames and passwords to fit your requirements.

Downloading the Models

Run source env.sh download and then follow up with docker compose up. Wait until all the containers finish. Note that this step will fail if you haven’t properly configured the MINIO_HOST, MINIO_ACCESS_KEY, and MINIO_SECRET_KEY in the ./prod/.env file.
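Before starting the download, it can help to confirm that the three MinIO values are actually filled in. The sketch below assumes the .env file uses plain KEY=value lines; the function name missing_env_keys is made up for this example. A value written as empty quotes ("") would still count as set, so this is only a first-pass check.

```shell
# Print every listed key that is missing or has an empty value in the file.
# Usage: missing_env_keys FILE KEY...
missing_env_keys() {
  file=$1
  shift
  for key in "$@"; do
    # A key counts as set when a line "KEY=<non-empty value>" exists.
    grep -Eq "^${key}=.+" "$file" || printf '%s\n' "$key"
  done
}

# Possible usage:
#   missing_env_keys ./prod/.env MINIO_HOST MINIO_ACCESS_KEY MINIO_SECRET_KEY
```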

The downloaded models reside in the ./data/kaldi and ./data/pnc directories respectively.

After the download, it is also advisable to run sudo chown -R YOUR_USERNAME data to change the ownership of the downloaded files, since they retain the ownership set by the remote storage.

Setting up the Reverse Proxy

Copy the configuration file with cp prod/services/proxy/traefik.example.yaml prod/services/proxy/traefik.yaml. In the traefik.yaml file, set the certificatesResolvers.maincertresolver.acme.email to your email address.

Afterwards, run source env.sh traefik and then docker compose up -d. This will set up the reverse proxy.

Run docker compose logs to check if everything is running correctly. The logs should be mostly empty. Visit https://<DNS_MANAGEMENT from .env>/dashboard/ (don’t forget the trailing slash) to verify that the Traefik dashboard is running. When prompted, enter the credentials from the .env file (TRAEFIK_DASHBOARD_CREDS).

Setting up the Authentication Service Keycloak

Run source env.sh setup and then docker compose up. In a secondary terminal, change the directory to terraform and create a new terraform.tfvars file with the following content:

kc_admin_username = "<KEYCLOAK_ADMIN from .env>"
kc_admin_password = "<KEYCLOAK_PASS from .env>"
kc_admin_url = "https://<DNS_KC from .env>"
idp_uni_lj_client_id = ""
idp_uni_lj_client_secret = ""

smtp_host = ""
smtp_port = 465
smtp_auth_username = ""
smtp_auth_password = ""
smtp_from_email = ""
smtp_ssl = true

dns_origins = ["<DNS from .env>"]

Check the terraform/variables.tf file for more information on the variables and their descriptions.
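To avoid copy-paste mistakes, the entries that come from .env can be generated rather than typed. This is a minimal sketch, assuming the four variables are already exported in the current shell; the function name tfvars_from_env is hypothetical. It does not escape double quotes or backslashes inside values, and the idp_* and smtp_* entries still have to be added by hand.

```shell
# Print the .env-derived entries of terraform.tfvars.
# Expects KEYCLOAK_ADMIN, KEYCLOAK_PASS, DNS_KC and DNS in the environment.
tfvars_from_env() {
  printf 'kc_admin_username = "%s"\n' "$KEYCLOAK_ADMIN"
  printf 'kc_admin_password = "%s"\n' "$KEYCLOAK_PASS"
  printf 'kc_admin_url = "https://%s"\n' "$DNS_KC"
  printf 'dns_origins = ["%s"]\n' "$DNS"
}

# Possible usage from the repository root (assumption: .env is valid shell syntax):
#   set -a; . ./prod/.env; set +a
#   tfvars_from_env > terraform/terraform.tfvars
```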

If you are unfamiliar with Terraform, you can find more information on the official website.

Instead of installing Terraform on your machine, you can use the Docker image. Run the following commands in the terraform directory:

docker run -it --rm -v $PWD:$PWD -w $PWD hashicorp/terraform:latest init

This will download the required plugins for Terraform. Afterwards, you can run:

docker run -it --rm -v $PWD:$PWD -w $PWD hashicorp/terraform:latest apply

After being prompted to confirm the changes, answer yes and the setup will be complete.
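Since the same docker run invocation is used for both init and apply, it can be wrapped in a small shell function. The name tf is an arbitrary choice for this sketch; the function simply passes its arguments through to the Terraform image and quotes $PWD so that paths containing spaces still work.

```shell
# Run Terraform from the official Docker image in the current directory.
# Same docker run invocation as in this section; arguments are passed through.
tf() {
  docker run -it --rm -v "$PWD:$PWD" -w "$PWD" hashicorp/terraform:latest "$@"
}

# Possible usage, from the terraform directory:
#   tf init
#   tf apply
```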

Verify the Keycloak Setup

To verify the setup, navigate to https://<DNS_KC from .env> and log in with the credentials you set in the terraform.tfvars file. Note that the administrator login is only available while the setup is in progress. After the setup is complete, the administrator login will be disabled.

Figure: Keycloak login

Switch to the on realm and verify that the developer account was created.

Figure: the on realm

Afterwards, close the terminal with the Terraform setup and interrupt the docker compose up command with Ctrl+C.

Keycloak

All we need to do now is run source env.sh keycloak and then docker compose up -d. This will start the Keycloak service in the background. Verify that the service is running with docker compose logs. Note that the administrator login is now blocked by the reverse proxy for security reasons.

Kaldi Transcription Services

In the .env file, set the following variables:

  • KALDI_ON_DR_NUM_OF_WORKERS - the number of workers for non-STEM courses.
  • KALDI_ON_NT_NUM_OF_WORKERS - the number of workers for STEM courses.

These values determine the number of possible concurrent streaming sessions for each type of course. A rule of thumb is to reserve 1.5 GB of RAM per worker.
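As a worked example of the rule of thumb, the sketch below adds up both worker counts and multiplies by 1.5 GB using integer arithmetic only. The function name kaldi_ram_gb is hypothetical, and the result covers the Kaldi workers alone, not the other substacks.

```shell
# Estimate RAM in GB for the Kaldi workers, at 1.5 GB per worker.
# Usage: kaldi_ram_gb DR_WORKERS NT_WORKERS
kaldi_ram_gb() {
  halves=$(( ($1 + $2) * 3 ))   # total RAM in units of 0.5 GB
  printf '%d.%d\n' $(( halves / 2 )) $(( (halves % 2) * 5 ))
}

kaldi_ram_gb 4 2   # 6 workers in total, prints 9.0
```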

We already set up the Kaldi services in the Download step. Now, we just need to run source env.sh kaldi and then docker compose up -d. Verify that the service is running with docker compose logs.

Punctuation and Capitalization Service

Again, we already set up the PnC service in the Download step. Now, we just need to run source env.sh pnc and then docker compose up -d. Verify that the service is running with docker compose logs. Note that PnC is best deployed with a GPU. The helper script will try to automatically detect the GPU and set the appropriate environment variables. Look at the source code of the env.sh script for more information and the ./prod/compose.pnc-gpu.yaml file for GPU-specific configurations.


Intermediate step

At this point, the output of source env.sh should be:

Substack on-traefik is running.
Substack on-keycloak is running.
Substack on-kaldi is running.
Substack on-pnc is running.
Substack on-monitoring is not running.
Substack on-backend is not running.
Substack on-frontend is not running.
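To see at a glance what is still left to deploy, the status text can be filtered down to the substack names. This sketch assumes the line format shown above; pending_substacks is a made-up name.

```shell
# Print the names of substacks reported as "not running",
# reading the status text from standard input.
pending_substacks() {
  sed -n 's/^Substack \(on-[^ ]*\) is not running\.$/\1/p'
}

# Possible usage (assumption: env.sh writes the status to stdout):
#   source env.sh | pending_substacks
```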

Monitoring

Run source env.sh monitoring and then docker compose up -d. Verify that the service is running with docker compose logs. Note that the first-time run might take a while.

Navigate to https://<DNS_MANAGEMENT from .env>/grafana/login and log in with the credentials from the .env file.

Figure: Grafana login

To verify that everything is working, click the Dashboard button and open the Keycloak system dashboard. It should look something like this:

Figure: Grafana dashboard

Verify that log collection is working by navigating to the “Logs explore” page and selecting the Loki datasource. You should see logs from the keycloak service.

Figure: Grafana logs

Backend

In ./prod/services/backend/, copy the application.example.yaml to application.yaml and fill in the required values. In addition, place your GoogleTranslation.json (which you acquired in the prerequisites) in the same directory.
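A quick pre-flight check can confirm that both files are in place before the backend is started. The sketch below only tests that the files exist, not that their contents are valid; backend_files_ready is a hypothetical helper name.

```shell
# Report which required backend files are missing from the given directory.
# Returns 0 only when both files are present.
backend_files_ready() {
  dir=$1
  rc=0
  for f in application.yaml GoogleTranslation.json; do
    if [ ! -f "$dir/$f" ]; then
      printf 'missing: %s/%s\n' "$dir" "$f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Possible usage:
#   backend_files_ready ./prod/services/backend && echo "backend config present"
```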

Run source env.sh backend and then docker compose up -d. Verify that the services are running with docker compose logs.

Frontend

Run source env.sh frontend and then docker compose up -d. Verify that the services are running with docker compose logs.

Navigate to https://<DNS from .env>/, https://<DNS from .env>/admin, and https://<DNS from .env>/stream to verify that the frontend services are running. All three should redirect you to the login page.
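The same check can be done from the command line. In this sketch, frontend_urls builds the three URLs from the main domain and check_frontend prints the HTTP status code and redirect target that curl reports for each one. Both function names are made up for illustration, and the exact status code the frontend returns is not specified in this guide, so inspect the output rather than relying on a fixed value.

```shell
# Print the three frontend URLs for a given main domain (the DNS value).
frontend_urls() {
  printf 'https://%s/\n'       "$1"
  printf 'https://%s/admin\n'  "$1"
  printf 'https://%s/stream\n' "$1"
}

# Show the HTTP status and redirect target of each URL (requires curl).
check_frontend() {
  frontend_urls "$1" | while read -r url; do
    curl -s -o /dev/null -w "%{http_code} %{redirect_url}  $url\n" "$url"
  done
}

# Possible usage:
#   check_frontend on.uni-lj.si
```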

Figure: Login page