Calypso Re-Annotation Pipeline Guide
This guide provides step-by-step instructions for setting up and running the UCGD Calypso re-annotation pipeline, or to run Nextflow as needed independent of the created bash script. Each input parameter for Nextflow execution is explained in detail.
Prerequisites
- Access to AWS S3 buckets.
- Access to the Mosaic server.
- Basic understanding of genomic references (GRCh38/GRCh37).
- Access to the EC2 environment with necessary permissions.
Step-by-Step Instructions
If you not decide to use the given calypso.sh
script here are explanations of each prompt:
1. Start the Script
Run the script to initiate the setup.
2. Provide Required Inputs
You will be prompted to enter the following inputs interactively:
a. VCF File Path
- Prompt:
Enter S3 path including VCF file:
- Description: Full S3 path where your input VCF file is located.
- Example:
s3://your-bucket/input.vcf
b. Final VCF Path
- Prompt:
Enter S3 path for final VCF files:
- Description: S3 path where the output/final VCF files will be stored.
- Example:
s3://your-bucket/
c. Mosaic Project ID
- Prompt:
Enter Mosaic Project ID:
- Description: Unique project identifier in Mosaic.
- Example:
1685
d. Project Name
- Prompt:
Enter Project name:
- Description: Human-readable name for the project.
- Example:
M1685-Project
e. Mosaic Config Server
- Prompt:
Enter Mosaic config server:
- Description: URL or IP address of the Mosaic server configuration.
- Example:
ucgd-mosaic
f. Genomic Reference
- Prompt:
Enter Genomic Reference to use (GRCh38/GRCh37)
- Description: Choose the genomic reference build.
- Options:
GRCh38
orGRCh37
g. Run as Singleton
- Prompt:
Run as singleton? (true/false)
- Description: Indicates whether to process samples as singletons.
- Options:
true
orfalse
Directory Setup
Work Directory
- Path:
/common/nextflow/$MOSAIC_PROJECT_ID
- Created and permissioned the needed EFS directory for Nextflow execution.
- Path:
Processing Directory
- Named:
[M]$MOSAIC_PROJECT_ID
- Contains the linked Nextflow output.
- Named:
Nextflow Command Construction
The following required parameters are assembled into a Nextflow command, but can be used to run an ad hoc process as needed:
--vcf
: Path to input VCF.--complete
: Path for final VCF output.--mosaic_project_id
: Mosaic project identifier.--mosaic_server
: Mosaic server config file name.--project
: Project name.--genome
: Genomic reference (GRCh38/GRCh37).-params-file
: Pipeline parameters file.--singleton
: Whether processing as singleton.-w
: Work directory.-profile cloud
: Execute using cloud profile.-with-trace
: Enable tracing.-resume
: Resume from existing checkpoints.
Execution Confirmation
- Review the generated Nextflow command.
- Confirm execution by typing
yes
.
If confirmed:
- Logs into Docker.
- Executes Nextflow pipeline.
- Monitors execution and reports success or failure.
If not confirmed:
- Cancels the command execution.
Important Notes
- Permissions: Ensure necessary AWS and EC2 permissions.
- Storage: Validate available storage on EFS and S3.
- Reference Genome: Consistency between input VCF and selected reference.
- EFS: After process completion, you must remember to delete your
/common/nextflow/M[project]
directory, or you will incureAWS EFS
charges.