Description
The National Center for Biotechnology Information (NCBI) offers a wealth of databases, analysis tools, and reports for use in research by the medical and scientific community.
These resources are freely available to download from the NCBI website. Because most of the datasets are large (on the order of gigabytes or terabytes), the recommended method of transfer is the Aspera Connect browser plugin.
You can use Aspera Connect directly through the NCBI website in your browser by clicking and downloading the datasets of your choice.
Alternatively, you can download data from NCBI through the command line with ascp, Aspera's transfer tool, which comes bundled with your Connect installation.
Usage
The general syntax for downloading data from NCBI is the following:
/path/to/ascp -T -k 1 -i /path/to/private/key anonftp@ftp.ncbi.nlm.nih.gov:/path/to/data /local/location
The components of the command can be broken down as follows:
/path/to/ascp
You will need to specify the full path to the ascp program, a reference for which can be found in the next section.
-k 1
If the transfer stops because of connection loss or other issues, the -k option tells the transfer to resume from where it left off rather than restarting from the beginning. This is important because of the large size of most NCBI data. The 1 specifies that file attributes are compared before resuming a transfer and the file is resumed only if they match, which suits NCBI data because a full checksum on large files may be slow. For more information on the resume transfer option, see this Knowledge Base article.
-T
This option tells the server not to encrypt the transfer, as NCBI's download server doesn't offer encryption.
-i /path/to/private/key
This option specifies the path to the private key used to authenticate this transfer. Ensure that you specify the FULL path to the key (in other words, ~/path/to/key or similar shortcuts will not work).
anonftp
The transfer user configured on NCBI's Aspera server.
ftp.ncbi.nlm.nih.gov
The hostname of NCBI's Aspera server.
/path/to/data
The path to the data you are downloading. You can find a reference of these paths here.
/local/location
The path to the folder on your own machine that you want the NCBI files to be downloaded to.
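The components above can be assembled in a small script. The following is a minimal sketch, assuming a bash shell on Mac or Linux; build_ascp_cmd is a hypothetical helper written for this illustration, and the variable values are the placeholders from the syntax line, not real locations. It only prints the command so you can inspect it before running anything, and it rejects a key path that is not absolute.

```shell
#!/usr/bin/env bash
# Placeholders from the general syntax line; replace them with real values.
ASCP="/path/to/ascp"            # full path to the ascp executable
KEY="/path/to/private/key"      # full path to the private key
REMOTE_PATH="/path/to/data"     # path to the data on the NCBI server
DEST="/local/location"          # local folder that receives the download

# build_ascp_cmd ASCP KEY REMOTE_PATH DEST  (hypothetical helper)
# Prints the assembled command, shell-quoted, without running it.
build_ascp_cmd() {
  local ascp=$1 key=$2 remote=$3 dest=$4
  # The key must be a full path; a literal ~/... or a relative path is rejected.
  case $key in
    /*) ;;
    *) echo "error: key path must be absolute: $key" >&2; return 1 ;;
  esac
  local cmd=("$ascp" -T -k 1 -i "$key" "anonftp@ftp.ncbi.nlm.nih.gov:$remote" "$dest")
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_ascp_cmd "$ASCP" "$KEY" "$REMOTE_PATH" "$DEST"
```

Keeping the command in an array means a path containing spaces, such as the Mac locations of Aspera Connect, is passed as a single argument.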
Private key and ascp locations
The private key you will use is asperaweb_id_dsa.openssh, which comes with your Connect installation.
Below are locations where you can generally find the private key and the ascp executable. Where applicable, replace username with the name of the user you're logged in as.
Mac
Private key
- Local installation of Connect -
/Users/username/Applications/Aspera\ Connect.app/Contents/Resources/asperaweb_id_dsa.openssh
- System wide installation of Connect -
/Applications/Aspera\ Connect.app/Contents/Resources/asperaweb_id_dsa.openssh
ascp
- Local installation of Connect -
/Users/username/Applications/Aspera\ Connect.app/Contents/Resources/ascp
- System wide installation of Connect -
/Applications/Aspera\ Connect.app/Contents/Resources/ascp
Linux
Private key
/home/username/.aspera/connect/etc/asperaweb_id_dsa.openssh
/opt/aspera/etc/asperaweb_id_dsa.openssh
ascp
/opt/aspera/bin/ascp
Windows
Private key
C:\Program Files (x86)\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh
C:\Users\username\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh
ascp
C:\Program Files\Aspera\Aspera Connect\bin\ascp.exe
C:\Users\username\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe
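Because the locations differ between local and system-wide installations, a script can check each candidate in turn. The following is a minimal sketch for Mac and Linux, assuming a bash shell and assuming that $HOME corresponds to /Users/username or /home/username; first_existing is a hypothetical helper, not part of Aspera Connect.

```shell
#!/usr/bin/env bash
# first_existing PATH...  (hypothetical helper)
# Prints the first argument that exists as a regular file; returns 1 if none do.
first_existing() {
  local candidate
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate locations taken from the Mac and Linux lists above.
ASCP=$(first_existing \
  "$HOME/Applications/Aspera Connect.app/Contents/Resources/ascp" \
  "/Applications/Aspera Connect.app/Contents/Resources/ascp" \
  "/opt/aspera/bin/ascp") || echo "ascp not found in the usual locations" >&2

KEY=$(first_existing \
  "$HOME/Applications/Aspera Connect.app/Contents/Resources/asperaweb_id_dsa.openssh" \
  "/Applications/Aspera Connect.app/Contents/Resources/asperaweb_id_dsa.openssh" \
  "$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" \
  "/opt/aspera/etc/asperaweb_id_dsa.openssh") || echo "private key not found in the usual locations" >&2

echo "ascp: ${ASCP:-not found}"
echo "key:  ${KEY:-not found}"
```

If either value is reported as not found, the installation is probably in a non-default location and the path has to be supplied by hand.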
Examples
The following examples demonstrate usage of ascp to download real data from NCBI. Commands for Mac, Linux, and Windows are shown, assuming that we are downloading from a user account on the system named janedoe and that the downloaded data will go to the folder NCBI_data in janedoe's home directory. The path locations of the datasets are shown on NCBI's public download directory.
1. Say you need to download all the data NCBI offers on epigenomics. There is a 223.79 GB folder on the topic containing 5 subfolders' worth of data. To download the entire folder via ascp, you would use the following command:
On a Mac:
$ /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/ascp -T -k 1 -i /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/epigenomics /Users/janedoe/NCBI_data
On Windows:
> "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe" -T -k 1 -i "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" anonftp@ftp.ncbi.nlm.nih.gov:/epigenomics "C:\Users\janedoe\NCBI_data"
On Linux:
# /opt/aspera/bin/ascp -T -k 1 -i /home/janedoe/.aspera/connect/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/epigenomics /home/janedoe/NCBI_data
2. Perhaps you are conducting a study on tree-dwelling lizards and want to examine the genome data NCBI offers for the Anolis carolinensis species. To download the genome data for this species, you would use the following command:
On a Mac:
$ /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/ascp -T -k 1 -i /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/genomes/anolis_carolinensis /Users/janedoe/NCBI_data
On Windows:
> "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe" -T -k 1 -i "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" anonftp@ftp.ncbi.nlm.nih.gov:/genomes/anolis_carolinensis "C:\Users\janedoe\NCBI_data"
On Linux:
# /opt/aspera/bin/ascp -T -k 1 -i /home/janedoe/.aspera/connect/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/genomes/anolis_carolinensis /home/janedoe/NCBI_data
3. As part of a research paper you're writing, you need to look at NCBI's RefSeq project data concerning protein and RNA sequencing data in humans. You know there is 1.69 GB of available data on NCBI, and you download it with the following command:
On a Mac:
$ /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/ascp -T -k 1 -i /Users/janedoe/Applications/Aspera\ Connect.app/Contents/Resources/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/refseq/H_sapiens/mRNA_prot /Users/janedoe/NCBI_data
On Windows:
> "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe" -T -k 1 -i "C:\Users\janedoe\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" anonftp@ftp.ncbi.nlm.nih.gov:/refseq/H_sapiens/mRNA_prot "C:\Users\janedoe\NCBI_data"
On Linux:
# /opt/aspera/bin/ascp -T -k 1 -i /home/janedoe/.aspera/connect/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/refseq/H_sapiens/mRNA_prot /home/janedoe/NCBI_data
4. Another Windows example, showing an actual command line and the file download status: [screenshot not reproduced]
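Separately from the numbered examples, for a transfer as large as the epigenomics folder it can help to rerun the command automatically when the connection drops, since -k 1 lets each rerun resume rather than start over. The following is a minimal sketch, assuming a bash shell on Mac or Linux and assuming that ascp exits with a non-zero status when a transfer fails; retry is a hypothetical helper, not an ascp feature.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARG...]  (hypothetical helper)
# Runs COMMAND until it succeeds, or until MAX_ATTEMPTS runs have failed.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage with the Linux paths from example 1 (uncomment to run):
# retry 5 60 /opt/aspera/bin/ascp -T -k 1 \
#   -i /home/janedoe/.aspera/connect/etc/asperaweb_id_dsa.openssh \
#   anonftp@ftp.ncbi.nlm.nih.gov:/epigenomics /home/janedoe/NCBI_data
```

The attempt limit and delay shown in the usage comment are arbitrary values chosen for illustration.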
Document Information
Modified date:
20 February 2022
UID
ibm10746935