Concepts (AEN 4.1.3)

System overview

The Anaconda Enterprise Notebooks platform consists of 3 main service groups: AEN server, AEN gateway and AEN compute, which are called “nodes”:

  • Server node: The administrative front end to the system, where users log in, user accounts are stored, and administrators manage the system.
  • Gateway node(s): A reverse proxy that authenticates users and directs them to the proper compute node for their project. Because it routes users automatically, they will not notice this node after installation.
  • Compute nodes: Where projects are stored and run.
[Figure: AEN installation components, aen-install-components.png]

These services can be run on a single machine or distributed across multiple servers.

[Figure: AEN installation network diagram, aen-install-network-diagram.png]

Organizationally, each AEN installation has exactly 1 server instance and 1 or more gateway instances. Each compute node can only be connected to a single gateway. The collection of compute nodes served by a single gateway is called a data center. You can add data centers to the AEN installation at any time.

EXAMPLE: An AEN deployment with 2 data centers, where 1 gateway has a cluster of 20 physical computers, and the second gateway has 30 virtual machines, must have the following services installed and running:

  • 1 AEN server instance
  • 2 AEN gateway instances
  • 50 AEN compute instances (20 + 30)
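
The counting rule in this example can be sketched in a few lines of Python. This is only an illustration: the DataCenter class and the required_instances function are hypothetical names and are not part of AEN.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """One gateway plus the compute nodes it serves (hypothetical model)."""
    name: str
    compute_nodes: int


def required_instances(data_centers):
    """Return how many server, gateway, and compute instances are needed."""
    return {
        "server": 1,                   # exactly one server per installation
        "gateway": len(data_centers),  # one gateway per data center
        "compute": sum(dc.compute_nodes for dc in data_centers),
    }


# The deployment from the example: 20 physical computers and 30 virtual machines.
deployment = [DataCenter("physical", 20), DataCenter("virtual", 30)]
print(required_instances(deployment))
```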

Nodes must be configured and maintained separately.

Server node

The server node handles login, user accounts, administration, and project creation and management, and it interfaces with the database. It is the main entry point to AEN for all users. The server node also sets up projects and ensures that users are sent to the data center that hosts their project.

Since AEN is web-based, it uses the standard HTTP port 80 or HTTPS port 443 on the server.

AEN uses MongoDB for its internal data persistence. MongoDB typically runs on the same host as the server but can also be installed on a separate host.

Server nodes use NGINX to serve the user-facing AEN web interface. NGINX proxies requests to the actual server web process, which listens only locally: on a high-numbered localhost port in AEN 4.1.2 and earlier, and on a Unix socket in later versions. NGINX also serves static content.

The AEN server is installed in the /opt/wakari/wakari-server directory.

Server processes

When you view the status of server processes, you may see the processes explained below.

supervisord details
description: Manages wakari-worker and multiple wk-server processes.
user: wakari
configuration: /opt/wakari/wakari-server/etc/supervisord.conf
log: /opt/wakari/wakari-server/var/log/supervisord.log
control: service wakari-server
ports: none

wk-server details
description: Handles user interaction and passes jobs on to the wakari gateway. Access to it is managed by NGINX.
user: wakari
command: /opt/wakari/wakari-server/bin/wk-server
configuration: /opt/wakari/wakari-server/etc/wakari/
control: service wakari-server
logs: /opt/wakari/wakari-server/var/log/wakari/server.log
ports: Not used in versions after 4.1.2 *

* AEN 4.1.2 and earlier use port 5000. This port is used only on localhost. Later versions of AEN use Unix sockets instead. The Unix socket path is: unix:/tmp/wakari-server.sock
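
The version rule in this note can be expressed as a small Python sketch. The function names are hypothetical and the code is not part of AEN; it only encodes the rule that 4.1.2 and earlier use port 5000 on localhost and later versions use the Unix socket.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.1.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def wk_server_upstream(aen_version):
    """Return the local address that wk-server listens on for an AEN version."""
    if parse_version(aen_version) <= (4, 1, 2):
        # AEN 4.1.2 and earlier: TCP port 5000, reachable on localhost only.
        return "localhost:5000"
    # Later versions: Unix socket instead of a TCP port.
    return "unix:/tmp/wakari-server.sock"


for version in ("4.1.2", "4.1.3"):
    print(version, "->", wk_server_upstream(version))
```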

wakari-worker details
description: Asynchronously executes tasks from wk-server.
user: wakari
logs: /opt/wakari/wakari-server/var/log/wakari/worker.log
control: service wakari-server

nginx details
description: Serves static files and proxies all other requests to the wk-server process. *
user: nginx
configuration: /etc/nginx/nginx.conf, /opt/wakari/wakari-server/etc/conf.d/www.enterprise.conf
logs: /var/log/nginx/woc.log, /var/log/nginx/woc-error.log
control: service nginx status
port: 80

* In AEN 4.1.2 and earlier the wk-server process runs on port 5000 on localhost only. In later versions of AEN the wk-server process uses the Unix socket path unix:/tmp/wakari-server.sock.

NGINX runs at least two processes:

  • Master process running as root user.
  • Worker processes running as nginx user.

Gateway node

The gateway node serves as an access point for a given group of compute nodes. It acts as a proxy service and manages the authorization and mapping of URLs and ports to services that are running on those nodes. Gateway nodes give users a consistent interface regardless of which compute node runs their project.

NOTE: The gateway may also be referred to as a data center because it serves as the proxy for a collection of compute nodes.

You can put a gateway in each data center in a tiered scale-out fashion.

AEN gateway is installed in the /opt/wakari/wakari-gateway directory.

Gateway processes

When you view the status of gateway processes, you may see the processes explained below.

supervisord details
description: Manages the wk-gateway process.
user: wakari
configuration: /opt/wakari/wakari-gateway/etc/supervisord.conf
log: /opt/wakari/wakari-gateway/var/log/supervisord.log
control: service wakari-gateway
ports: none

wakari-gateway details
description: Passes requests from the AEN Server to the Compute nodes.
user: wakari
configuration: /opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json
logs: /opt/wakari/wakari-gateway/var/log/wakari/gateway.application.log, /opt/wakari/wakari-gateway/var/log/wakari/gateway.log
working dir: / (root)
port: 8089 (webcache)

Compute node(s)

Compute nodes are where applications such as Jupyter Notebook and Workbench actually run. They are also the hosts that a user sees when using the Terminal app or when using SSH to access a node. Compute nodes contain all user-visible programs.

Compute nodes only need to communicate with a gateway, so they can be completely isolated by a firewall.

Each project is associated with one or more compute nodes that are part of a single data center.

AEN compute nodes are installed in the /opt/wakari/wakari-compute directory.

Each compute node in the AEN system requires a compute launcher service to mediate access to the server and gateway.

Compute processes

When you view the status of compute processes, you may see the processes explained below.

supervisord details
description: Manages the wk-compute process.
user: wakari
configuration: /opt/wakari/wakari-compute/etc/supervisord.conf
log: /opt/wakari/wakari-compute/var/log/supervisord.log
control: service wakari-compute
working dir: /opt/wakari/wakari-compute/etc
ports: none

wk-compute details
description: Launches compute processes.
user: wakari
configuration: /opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json, /opt/wakari/wakari-compute/etc/wakari/scripts/config.json
logs: /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.application.log, /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.log
working dir: / (root)
control: service wakari-compute
port: 5002 (rfe)
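
The process tables list a default port for each node type: 80 for NGINX on the server, 8089 for the gateway, and 5002 for the compute launcher. As a rough aid when troubleshooting, the following Python sketch tests whether those ports accept TCP connections. The host name is a placeholder, the helper names are hypothetical, and the script must run from a machine that the firewall allows to reach the node.

```python
import socket

# Default ports taken from the process tables in this document.
DEFAULT_PORTS = {"server (nginx)": 80, "gateway": 8089, "compute": 5002}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "localhost"  # placeholder: replace with the node being checked
    for name, port in DEFAULT_PORTS.items():
        state = "open" if port_is_open(host, port) else "closed or unreachable"
        print(f"{name}: port {port} is {state}")
```

A closed port does not by itself mean the service is down. A compute node may be isolated so that only its gateway can reach it.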

wk-compute loads each of the following configuration files, in this order:

  • /etc/wakari/config.json
  • /etc/wakari/compute-launcher-config.json
  • ./compute-launcher-config.json
  • Any configuration file specified by the -c option

If an option is specified in multiple files, the last one encountered takes precedence.
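
The "last one wins" rule can be illustrated with a minimal Python sketch. It is not the wk-compute implementation: the function name is hypothetical, and it assumes that files are plain JSON objects, that missing files are skipped, and that options are merged at the top level only.

```python
import json
from pathlib import Path

# Search order described above; later files override earlier ones.
DEFAULT_CONFIG_PATHS = [
    "/etc/wakari/config.json",
    "/etc/wakari/compute-launcher-config.json",
    "./compute-launcher-config.json",
]


def load_config(paths, extra=None):
    """Merge JSON config files in order; the last value seen for a key wins.

    extra stands in for a file passed with the -c option and is read last.
    """
    merged = {}
    candidates = list(paths) + ([extra] if extra else [])
    for path in candidates:
        file = Path(path)
        if not file.is_file():
            continue  # assumption: files that do not exist are skipped
        merged.update(json.loads(file.read_text()))
    return merged


if __name__ == "__main__":
    print(load_config(DEFAULT_CONFIG_PATHS))
```

To see which value is in effect for an option, check the files in reverse order: the first file that defines the option, starting from the -c file, supplies the value.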

Supervisor and supervisord

AEN uses a process control system called “Supervisor” to run its services. Supervisor is run by the AEN Service Account user, usually wakari or aen_admin.

The Supervisor daemon process is called “supervisord”. It runs in the background and should rarely need to be restarted.

Anaconda environments

Each project has an associated conda environment containing the packages needed for that project. When a project is first started, AEN clones a default environment with the name “default” into the project directory.

For more information about environments, see Working with environments.

Projects and permissions

AEN users interact with the system predominantly through projects.

Each project is associated with a single data center within the AEN environment. Each project has a team of users, which includes one owner: the user who created the project.

Projects live in the projectRoot folder on the compute node—by default, /projects.

The project directory is created the first time a project is started. The start-project script clones it from /opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton.

Project directory permissions are:

owner: rwx, user who created the project
group: rwx, group of the owner
other: --x, to allow access to the Public folder
ACL: rwx for any other team members
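
The owner, group, and other entries correspond to a standard POSIX mode. The Python sketch below builds that mode from the standard stat constants to show its octal and symbolic forms. The ACL entries for team members are stored separately from the mode bits and are not represented here.

```python
import stat

# owner rwx, group rwx, other --x
PROJECT_DIR_MODE = (
    stat.S_IRWXU      # owner: read, write, execute
    | stat.S_IRWXG    # group: read, write, execute
    | stat.S_IXOTH    # other: execute only, enough to traverse the directory
)

print(oct(PROJECT_DIR_MODE))                            # 0o771
print(stat.filemode(stat.S_IFDIR | PROJECT_DIR_MODE))   # drwxrwx--x
```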

Files and subdirectories within the project directory have the same permissions as the project directory, except:

  • The public folder and everything in it are open to anyone.
  • Any files hardlinked into the root anaconda environment—/opt/wakari/anaconda—are owned by the root or wakari users.

Project file and directory permissions are maintained by the start-project script. All files and directories in the project will have their permissions set when the project is started, except for files owned by root or the AEN_SRVC_ACCT user—by default, wakari or aen_admin.

The start-project script leaves the permissions of files owned by root or the AEN_SRVC_ACCT user unchanged, so that files linked from the /opt/wakari/anaconda directory keep their permission settings.
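
The selection rule can be sketched as a simple filter. This is an illustration only, not the start-project script: the function name, the example paths, and the user name alice are hypothetical, and the service account defaults to wakari here.

```python
def files_to_update(entries, service_account="wakari"):
    """Return the paths whose permissions would be set at project start.

    entries is an iterable of (path, owner_name) pairs. Files owned by root
    or by the service account are left alone.
    """
    protected = {"root", service_account}
    return [path for path, owner in entries if owner not in protected]


# Hypothetical project contents.
listing = [
    ("/projects/alice/demo/notebook.ipynb", "alice"),
    ("/projects/alice/demo/envs/default/bin/python", "root"),
    ("/projects/alice/demo/envs/default/lib/libz.so", "wakari"),
]
print(files_to_update(listing))
```

With this listing, only the notebook owned by alice is selected; the two environment files are skipped because root and wakari own them.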

CAUTION: Do not start a project as the AEN_SRVC_ACCT user. The permissions system does not correctly manage project files owned by this user.