Remote computing is the ability to access and use a computer or network from a different physical location. Analysis of large biological datasets can require long run times and substantial computing power, often more than a personal computer can provide. A modern development environment can move seamlessly between local and remote compute, which lets researchers use powerful remote servers whenever they are needed.
Some tools for remote computing:
Code for this tutorial can be found here.
VScode is a fully featured integrated development environment (IDE) produced by Microsoft. VScode is open source and has quickly become the default IDE for many developers. Download VScode and get a feel for how it works on your local machine. There are some very good docs on how to get started.
The Secure Shell Protocol (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network - https://en.wikipedia.org/wiki/Secure_Shell
The ssh command is used to securely log into a remote server.
ssh [options] [user@]hostname [command]
A simple SSH command looks like:
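As a minimal sketch of the synopsis above, the snippet below assembles a basic login command from a user name and a hostname. The values alice and hpc.example.org are placeholders, not a real account or server, and ssh_login_command is a hypothetical helper that only prints the command so it can be inspected before it is run.

```shell
# Hypothetical helper: print the ssh command for a given user and host.
# An optional third argument is a command to run on the remote server
# instead of opening an interactive shell.
ssh_login_command() {
    user="$1"
    host="$2"
    remote_cmd="${3:-}"
    if [ -n "$remote_cmd" ]; then
        printf "ssh %s@%s '%s'\n" "$user" "$host" "$remote_cmd"
    else
        printf 'ssh %s@%s\n' "$user" "$host"
    fi
}

# Placeholder account and server; substitute your own.
ssh_login_command alice hpc.example.org
ssh_login_command alice hpc.example.org hostname
```

The first form opens an interactive shell on the server; the second runs a single command (here hostname) and returns.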
Transmitting your password through the internet to connect to a remote server is dangerous. Instead you should use SSH public keys as the authentication method (see rderik’s blog for more details).
A private key is part of a key pair used in asymmetric encryption, along with a public key. This key pair is used for securing the connection between a client and a server. The private key, as the name suggests, is meant to be kept private and secure, known only to the owner. The public key, on the other hand, is shared freely and is used by others to encrypt messages that only the private key can decrypt.
On macOS, we can use the ssh-keygen utility to generate a new key pair. You can use -b to set the number of bits and increase key complexity (the default for RSA keys is 3072). Use -C to add a comment, which makes it easy to identify what each key is for.
When creating a private key, enter a passphrase with a high level of entropy (something long)!
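The -b and -C options can be combined in a single call. The sketch below wraps that call in a hypothetical helper, new_ssh_key; the key size of 4096 bits, the file name and the comment text are illustrative choices, not requirements of the tutorial.

```shell
# Hypothetical helper: generate an RSA key pair with an explicit size
# and a descriptive comment. Any further arguments are passed straight
# through to ssh-keygen.
new_ssh_key() {
    key_file="$1"
    key_comment="$2"
    shift 2
    # -t  key type
    # -b  key size in bits
    # -C  comment stored at the end of the public key
    # -f  path of the private key (the public key gets a .pub suffix)
    ssh-keygen -t rsa -b 4096 -C "$key_comment" -f "$key_file" "$@"
}

# Example (ssh-keygen will prompt for a passphrase):
# new_ssh_key "$HOME/.ssh/id_rsa_hpc" "laptop key for HPC login"
```

Writing to a separate file such as id_rsa_hpc (an assumed name) avoids overwriting an existing id_rsa.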
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/wwirth/.ssh/id_rsa):
/Users/wwirth/.ssh/id_rsa already exists.
Enter passphrase (empty for no passphrase): <- DO NOT LEAVE EMPTY
Enter same passphrase again:
Your identification has been saved in /Users/wwirth/.ssh/id_rsa
Your public key has been saved in /Users/wwirth/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:bfdZjvF/K0/0pNhAg+LBhMXXtl7+3UG4sGukSZYMUgQ [email protected]
The key's randomart image is:
+---[RSA 3072]----+
| Eo=o . |
| .+. ..o |
| . +...o.. |
| . o + o.o.. |
| . S +.=o+.o|
| * +.*o@.|
| o + o *oB|
| o o .. *|
| . oo+|
+----[SHA256]-----+
(The randomart is an easy way for humans to validate keys)
ls ~/.ssh
id_rsa.pub <- public key
id_rsa <- private key
Treat your private key (~/.ssh/id_rsa) as a password. Never share your private key or store it on a shared system (e.g. an HPC). Never create a private key without a passphrase. Hackers will look in ~/.ssh/ for unencrypted private keys.
The anatomy of a public SSH key:
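A public key file holds a single line with three space-separated fields: the key type (for example ssh-rsa), the base64-encoded key data, and an optional comment. As a small illustration, the hypothetical helper below, describe_public_key, splits a .pub file into those fields and prints them.

```shell
# Hypothetical helper: split a public key file into its three fields.
describe_public_key() {
    # read assigns the first two words to key_type and key_data; the
    # remainder of the line (the comment, which may contain spaces)
    # goes to key_comment.
    read -r key_type key_data key_comment < "$1"
    printf 'type:    %s\n' "$key_type"
    # Show only the first 20 characters of the base64 data.
    printf 'data:    %.20s...\n' "$key_data"
    printf 'comment: %s\n' "$key_comment"
}

# Example:
# describe_public_key "$HOME/.ssh/id_rsa.pub"
```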
The SSH server uses these keys in a challenge-response protocol to verify the authenticity of a user:
Add your public SSH key to the server to allow private-public key authentication. The ssh-copy-id command sends the public key to the server, where it is appended to the .ssh/authorized_keys file. The next time you ssh into the server, this key will be used for authentication. Use ssh-copy-id to copy your public key to the server:
ssh-copy-id -i ~/.ssh/id_rsa.pub user@hostname
If ssh-copy-id is unavailable you can manually copy the key with this command (replace user@hostname with your own login):
cat ~/.ssh/id_rsa.pub | ssh user@hostname 'cat >> .ssh/authorized_keys'
Note: you may need to set the permissions of .ssh and .ssh/authorized_keys, e.g. chmod 700 .ssh && chmod 600 .ssh/authorized_keys.
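The manual copy assumes that .ssh already exists on the server and has suitable permissions. As a sketch of a more defensive variant, the hypothetical helper below, append_authorized_key, is meant to run on the server side: it creates the directory and file if needed, sets the permissions mentioned in the note, and skips keys that are already present.

```shell
# Hypothetical helper: read one public key line from stdin and append it
# to authorized_keys inside the given directory (default: ~/.ssh).
append_authorized_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys" && chmod 600 "$ssh_dir/authorized_keys"
    IFS= read -r new_key
    # -x matches whole lines, -F treats the key as a fixed string.
    if ! grep -qxF -e "$new_key" "$ssh_dir/authorized_keys"; then
        printf '%s\n' "$new_key" >> "$ssh_dir/authorized_keys"
    fi
}

# Example, run on the server with a copy of the public key:
# append_authorized_key < id_rsa.pub
```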
For more information see the official university guide.
Entering a long passphrase every time you connect to the server is laborious. This may become a security issue if users get tired of entering the long passphrase and reduce its complexity or, worse, remove it completely. On macOS you can use the Keychain to store SSH passphrases, so that you can log in without typing either a password or a passphrase.
Store passphrase in the Keychain:
ssh-add --apple-use-keychain ~/.ssh/id_rsa
Configure the SSH agent to always use the Keychain. Append the following to ~/.ssh/config:
Host *
  AddKeysToAgent yes
  UseKeychain yes
  IdentityFile ~/.ssh/id_rsa
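If you would rather script this step than edit the file by hand, one possible approach is sketched below. The helper add_keychain_config is hypothetical; it appends the block only when no UseKeychain line is present, so running it twice does not duplicate the entry.

```shell
# Hypothetical helper: append the Keychain settings to an SSH config
# file (default: ~/.ssh/config) unless they are already there.
add_keychain_config() {
    config_file="${1:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    if ! grep -q 'UseKeychain yes' "$config_file"; then
        cat >> "$config_file" <<'EOF'
Host *
  AddKeysToAgent yes
  UseKeychain yes
  IdentityFile ~/.ssh/id_rsa
EOF
    fi
    # Keep the config readable and writable by the owner only.
    chmod 600 "$config_file"
}

# Example:
# add_keychain_config
```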
You can use VScode to access remote servers with the Remote - SSH extension. Install it using the extensions marketplace built into VScode (command+shift+X).
More details in this blog.
RStudio Server enables you to run the RStudio IDE on a Linux server, accessed from your web browser. You can use singularity to install RStudio Server on a powerful remote server, thus allowing interactive analysis with large compute resources.
Container platforms (Docker, Singularity, etc.) allow you to create and run containers that package up pieces of software in a way that is portable and reproducible.
Why use containers?
Singularity is the preferred container platform for HPC clusters as each container is only a single file, users don’t need root access to run the containers, and containers can be managed by users. Here we will use a container to easily install RStudio without sudo.
Start by connecting to the remote server with ssh (hint: use the VScode Remote - SSH extension) and create a containers directory.
mkdir -p $HOME/containers/rstudio/run && mkdir -p $HOME/containers/rstudio/var-lib-rstudio-server
Create a database config file containers/rstudio/database.conf and add the following:
provider=sqlite
directory=/var/lib/rstudio-server
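The same file can be written from the shell instead of an editor. The sketch below uses a hypothetical helper, write_rstudio_db_conf, which defaults to the $HOME/containers/rstudio directory created earlier.

```shell
# Hypothetical helper: write database.conf into the RStudio container
# directory (default: $HOME/containers/rstudio).
write_rstudio_db_conf() {
    rstudio_dir="${1:-$HOME/containers/rstudio}"
    mkdir -p "$rstudio_dir"
    # Two settings: use SQLite, and keep the database in the directory
    # that the container sees as /var/lib/rstudio-server.
    printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' \
        > "$rstudio_dir/database.conf"
}

# Example:
# write_rstudio_db_conf
```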
Download the latest tidyverse container and save it in Singularity Image Format (SIF) at $HOME/containers/rstudio/tidyverse_latest.sif:
singularity pull $HOME/containers/rstudio/tidyverse_latest.sif docker://rocker/tidyverse:latest
Add the rstudio executable to your $PATH:
cp bin/rstudio $HOME/.local/bin/ && chmod u+x $HOME/.local/bin/rstudio
Start the RStudio server:
rstudio
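The bin/rstudio launcher itself is not reproduced in this section. As a rough sketch only, a launcher of this kind might tie together the directories, database.conf and image created above with singularity exec, following the bind-mount pattern described in the Rocker Project's Singularity notes. The names RSTUDIO_DIR, rstudio_binds and start_rstudio are hypothetical, and the rserver flags shown are assumptions that may need adjusting for your RStudio Server version.

```shell
# Hypothetical sketch of a launcher; the tutorial's real bin/rstudio
# script may differ.
RSTUDIO_DIR="${RSTUDIO_DIR:-$HOME/containers/rstudio}"

# Build the --bind specification: host paths on the left of each colon,
# paths inside the container on the right.
rstudio_binds() {
    printf '%s:/run,%s:/var/lib/rstudio-server,%s:/etc/rstudio/database.conf\n' \
        "$RSTUDIO_DIR/run" \
        "$RSTUDIO_DIR/var-lib-rstudio-server" \
        "$RSTUDIO_DIR/database.conf"
}

# Start rserver inside the container, listening on localhost only and
# running as the current (non-root) user.
start_rstudio() {
    singularity exec \
        --bind "$(rstudio_binds)" \
        "$RSTUDIO_DIR/tidyverse_latest.sif" \
        /usr/lib/rstudio-server/bin/rserver \
        --www-address=127.0.0.1 \
        --server-user="$(whoami)"
}

# Example:
# start_rstudio
```

RStudio Server listens on port 8787 by default. Because this sketch binds it to 127.0.0.1 on the server, you would reach it from your own machine through a forwarded port, for example ssh -L 8787:localhost:8787 user@hostname, and then open http://localhost:8787 in a browser.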