PNCC processes data and distributes it to users through the EMSL Computing facility (https://www.emsl.pnl.gov) at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. The EMSL Computing facility provides parallel HPC systems, petabytes of data storage, and dedicated high-speed data transfer capability. We offer five routes for accessing your PNCC data:
- The most direct way to get your data is through login access to PNCC Compute, where the data is staged and processed on PNCC’s Boreal Supercomputer.
- The fastest transfer option is Globus. This is the preferred method because it gives you near real-time access to the data as it streams off the microscope and detector. Globus also lets you see both old and new data through a single endpoint, rather than needing a separate download link for each dataset. However, not all institutions permit inbound Globus transfers. PNCC now requires each proposal to list the Globus ID of at least one team member at the time of submission. Once your project is approved, please contact our Data Team to run a test transfer via a shared endpoint, confirm that Globus works for you, and establish expected transfer speeds to your local storage. Setting up Globus access is straightforward and completely free for academics and non-profits. To get an ID, go to https://www.globusid.org/create, where you can register for a Globus basic account if your institution does not already have an institutional license.
- The second fastest transfer option is MinIO. We are currently testing MinIO (S3 API) as an alternate route for accessing your data. Many S3-compatible clients are available, but because this option is still in testing, it is not yet integrated with the PNCC user account system. Users who want to try this option should contact our Data Team, who will issue an access key and provide any additional help needed.
- The third fastest transfer option is the ARIA2 parallel download manager. For every dataset collected, an autogenerated email notifies team members that the data is available for download. However, this option is only available after a microscope session is complete and the full dataset has been uploaded to the archive. ARIA2 is easy to install: you can compile it from source or download prebuilt binaries for OS X and Windows. Once installed, it is simple to use. At the bottom of each PNCC autogenerated dataset email is a link, intended specifically for ARIA2, that points to that dataset. Copy this link and pass it to the ARIA2 command-line tool (aria2c) to start the transfer. As with Globus, if your local connection is intermittent or the download is interrupted, ARIA2 will automatically resume the transfer. ARIA2’s parallel download options are particularly helpful for large files or when transfer rates between your institution and the PNCC archive are slow.
- The slowest but most accessible option is the “Get Data” page of the PNCC User Portal. To access your datasets, log in to the PNCC User Portal and click the “Get Data” tile, or click the MyEMSL data download link in the autogenerated email mentioned above. Note: logging in to the portal gives you a token that is needed to view the data, and this token can expire. If you have trouble seeing the data, log in to the portal again in another browser tab, then refresh the data portal page.
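For the MinIO (S3 API) route, any S3-compatible client should work once the Data Team has issued you an access key. The sketch below uses the third-party Python library boto3 (our choice for illustration, not a PNCC requirement) to download every object under a prefix while preserving its folder structure. The endpoint URL, bucket name, and prefix are hypothetical placeholders, so ask the Data Team for the real values; path-style addressing is likewise an assumption that suits many MinIO deployments.

```python
from pathlib import Path


def local_path_for(key: str, dest_root: str) -> Path:
    """Map an S3 object key such as 'session01/raw/img_0001.tif' to a local path.

    Empty, '.' and '..' components are dropped so a key cannot escape dest_root.
    """
    parts = [p for p in key.split("/") if p not in ("", ".", "..")]
    return Path(dest_root).joinpath(*parts)


def download_prefix(endpoint_url: str, access_key: str, secret_key: str,
                    bucket: str, prefix: str, dest_root: str) -> int:
    """Download all objects under `prefix` (hypothetical values) and return the count."""
    # Third-party dependencies (pip install boto3), imported lazily here.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,            # hypothetical PNCC MinIO endpoint
        aws_access_key_id=access_key,         # key issued by the Data Team
        aws_secret_access_key=secret_key,
        config=Config(s3={"addressing_style": "path"}),  # assumption: path-style URLs
    )
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "directory" placeholder objects
            target = local_path_for(key, dest_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
            count += 1
    return count
```

For example, calling download_prefix("https://minio.example.org", KEY, SECRET, "my-bucket", "session01/", "pncc_data"), with every argument hypothetical, would mirror session01/ into a local pncc_data folder.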
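For the ARIA2 route, the download can also be scripted. The sketch below wraps aria2c, the ARIA2 command-line program, using its --continue, --max-connection-per-server, --split and --dir options. It assumes the link from the dataset email is a URL that aria2c accepts directly; the function names and the default of 8 connections are our own illustrative choices.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_aria2c_command(url: str, out_dir: str = ".", connections: int = 8) -> list[str]:
    """Build an aria2c argument list for a resumable, multi-connection download."""
    if not 1 <= connections <= 16:
        # aria2c limits --max-connection-per-server to the range 1-16
        raise ValueError("connections must be between 1 and 16")
    return [
        "aria2c",
        "--continue=true",                              # resume partially downloaded files
        f"--max-connection-per-server={connections}",   # parallel connections per server
        f"--split={connections}",                       # split each file into N segments
        f"--dir={out_dir}",                             # where downloaded files are saved
        url,                                            # link from the dataset email (assumed)
    ]


def download(url: str, out_dir: str = ".") -> int:
    """Run aria2c and return its exit code (0 means success)."""
    if shutil.which("aria2c") is None:
        raise FileNotFoundError("aria2c not found on PATH; install ARIA2 first")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_aria2c_command(url, out_dir)).returncode
```

Calling download("<link from the dataset email>", "pncc_data") would fetch the dataset into pncc_data; rerunning the same call after an interruption lets --continue pick up any partially downloaded files.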
If you have any issues, please email our Data Team directly. If you are far from the US Pacific Northwest, none of these tools will perform well until you have addressed the tuning issues described in "Configure and check your machines".
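Distance hurts transfer speed largely through latency. One common reason, which we assume is among the tuning issues that section addresses, is the TCP bandwidth-delay product (BDP): to keep a link full, a single TCP stream needs a window of at least bandwidth × round-trip time. The sketch below computes the BDP and the throughput ceiling that a given window imposes; the 10 Gbit/s and 150 ms figures used in the note are illustrative, not measured PNCC values.

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    bits_in_flight = bandwidth_gbps * 1e9 * (rtt_ms / 1000.0)
    return int(bits_in_flight / 8)


def max_throughput_gbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window size."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e9


if __name__ == "__main__":
    print(bdp_bytes(10, 150))                          # bytes needed for 10 Gbit/s at 150 ms
    print(max_throughput_gbps(4 * 1024 * 1024, 150))   # ceiling with a 4 MiB window
```

For instance, a 10 Gbit/s path with a 150 ms round-trip time has a BDP of about 187.5 MB, while a 4 MiB window caps one stream at roughly 0.22 Gbit/s on that path. This is one reason parallel streams help on long-distance paths.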