Skip to main content

AMI Setup Guide

This guide walks you through setting up a fresh Amazon Machine Image (AMI) environment based on the provided setup script.

This AMI is designed to run on a preconfigured AWS ParallelCluster CloudFormation setup here, is AWS's detailed guide for setup.


Prerequisites

  • Amazon Linux AMI instance (m5.xlarge is suggested).
  • Internet access
  • Sufficient permissions to install packages and configure the system
  • Select parallelcluster VPC setup.
  • Select Key pair (login) for created AWS user.
  • Configure storage suggested: 30 GiB.

Step-by-Step Setup Instructions

These steps can be run by downloading or copying the headnode-startup.sh script, or they can be done independently by following each step below.

1. Environment Variable Setup

Define and persist key environment variables:

echo "export DATA=/common" >> $HOME/.bashrc
echo "export DATAHUB=/common" >> $HOME/.bashrc
echo "export WORK=/common" >> $HOME/.bashrc
echo "export APPS=$HOME/code" >> $HOME/.bashrc
echo "export USERS=/home" >> $HOME/.bashrc

mkdir -p $HOME/code

Reload your .bashrc or re-login for changes to take effect after install complete.

2. Install Miniconda & AWS CLI

Install Miniconda for Python and use Conda to install AWS CLI:

cd $HOME
sudo yum install -y bzip2 wget
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -f -p $HOME/miniconda

export PATH="$HOME/miniconda/bin:$PATH"
echo "export PATH=\$HOME/miniconda/bin:\$PATH" >> $HOME/.bashrc

conda install -c conda-forge -y awscli
rm Miniconda3-latest-Linux-x86_64.sh

Verify installations:

which python
python --version
which pip
aws --version

3. Install Python Packages

Install required Python libraries:

python -m pip install pyfiglet requests boto3 vcfpy argparse biopython

4. Install Amazon Corretto 17 (Java)

Install Amazon's JDK for Java applications:

wget -O corretto-17.rpm https://corretto.aws/downloads/latest/amazon-corretto-17-x64-linux-jdk.rpm
sudo yum localinstall -y corretto-17.rpm
rm corretto-17.rpm

JAVA_PATH=$(dirname $(dirname $(readlink -f $(which java))))
export JAVA_HOME=$JAVA_PATH
export PATH=$JAVA_HOME/bin:$PATH

echo "export JAVA_HOME=$JAVA_HOME" >> $HOME/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> $HOME/.bashrc

java -version

5. System Update and Install Software

Detect OS and install necessary packages:

sudo yum install -y vim git docker
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER

6. AWS CLI Configuration

Configure AWS CLI:

aws configure

Test AWS connection:

aws s3 ls

7. UCGD Configuration

Create a location for the needed master config file:

mkdir -p /home/ec2-user/.config/master
touch /home/ec2-user/.config/master/master.config

You will need to update this master.config file with your Mosaic login credentials.

8. Install Nextflow and Plugins

Install Nextflow and its AWS plugin:

curl -s https://get.nextflow.io | bash
sudo mv nextflow $APPS
sudo chmod +x $APPS/nextflow

echo "export PATH=$APPS:\$PATH" >> $HOME/.bashrc

$APPS/nextflow plugin install nf-amazon

9. Clone UCGD Repositories

Clone the required repositories with confirmation prompts:

# Clone UCGD-Cloud
https://github.com/UCGD/UCGD-Cloud.git

# Clone ucgd_utils
https://github.com/UCGD/ucgd_utils.git

Repos will be cloned to $APPS ($HOME/code).

Make scripts executable and globally available:

# UCGD-Cloud scripts
sudo chmod -R +x $APPS/UCGD-Cloud/src/*
sudo chmod -R +x $APPS/UCGD-Cloud/cloud/*.sh
sudo ln -s $APPS/UCGD-Cloud/src/*.py /usr/bin/

# Key shell scripts
ln -sf $APPS/UCGD-Cloud/cloud/calypso.sh
ln -sf $APPS/UCGD-Cloud/cloud/docker-login.sh
ln -sf $APPS/UCGD-Cloud/cloud/variant-caller.sh

# UCGD-Utils
sudo chmod -R +x /home/ec2-user/code/ucgd_utils/perl/script/*
sudo ln -sf /home/ec2-user/code/ucgd_utils/perl/script/generate_shards.sh /usr/bin/
sudo ln -sf /home/ec2-user/code/ucgd_utils/bin/all_prep.pl /usr/bin/

11. Docker Login Script Setup

Append Docker login to shell startup:

echo "bash /home/ec2-user/docker-login.sh" >> ~/.bash_profile

Final Checks

  • Reload the shell session or source your .bashrc and .bash_profile.
  • Test Nextflow, AWS CLI, Docker, and Python tools.

Notes

  • Ensure you have added the Mosaic API credentials manually.
  • Use sudo carefully when managing permissions.
  • AWS credentials must be configured with correct IAM policies.
  • Ensure Docker permissions (re-login after adding to Docker group).
  • This completed image will be used by pcluster create-image to build the needed pcluster environment.