How do I get my data? – PNCC Help Center

PNCC collects data and distributes it to users through the EMSL computing facility (https://www.emsl.pnl.gov) at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. The EMSL computing facility provides parallel high-performance computing (HPC) systems, petabytes of data storage, and dedicated high-speed data transfer capability. You can access your PNCC data in several ways.

The most immediate and direct way to access your data for processing is through login access to PNCC Compute, where the data are staged and processed on PNCC’s Boreal supercomputer.
The fastest transfer option to your institution is Globus. It is the preferred method because it gives you near real-time access to your data as they are collected. Globus also provides a single point of access to both old and new data through one endpoint, rather than a separate download link for each dataset. However, not all institutions permit inbound Globus transfers. PNCC now requires each proposal participant to add their Globus ID to their PNCC Portal account. For instructions on adding your Globus ID, see this FAQ. If your data are older than 30 days, you can find additional instructions in the FAQ: I have my Globus ID added to the PNCC portal but I do not see my data listed. What do I do?
A secondary option for downloading PNCC data is the ARIA2 parallel download manager. For every dataset collected, an autogenerated email is sent to the project team members stating that the data are available for download. This option is available only after a microscope session is complete and the full dataset has been uploaded to the archive. ARIA2 can be compiled from source, and prebuilt binaries are available for OS X and Windows. Once installed, ARIA2 is simple to use. At the bottom of each autogenerated dataset email is a link, intended for use with ARIA2, that points to that particular dataset. Copy this link and pass it to the ARIA2 command-line client (aria2c) to start the transfer. As with Globus, if your local connection is intermittent or the download is halted, ARIA2 resumes automatically after the disruption. ARIA2's parallel download options are especially helpful for large files or when transfer rates between your institution and the PNCC archive are slow.
Another option is the PNCC User Portal. To access your datasets, log in to the PNCC User Portal and click the “Get Data” tile, or click the MyEMSL data download link in the autogenerated email described above. Note: Logging in to the portal gives you a token that is required to view the data, and this token can expire after a while. If you have trouble seeing the data, log in to the portal again in another browser tab, then refresh the data portal page.
A final option for downloading PNCC data is MinIO. This option uses rclone as the client and MinIO's S3-compatible API to give users an alternate route to their data. It is typically used only when no other option is viable. If you want to try this option, contact our Data Team so that we can issue an access key and provide any additional help you need.
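For the two command-line routes above, ARIA2 and rclone against MinIO, here is a minimal Python sketch that assembles the commands. It assumes the link from the dataset email can be passed straight to `aria2c` as a URI, and that the MinIO endpoint URL, remote name, bucket path, and keys (all hypothetical placeholders here) come from the Data Team. The `aria2c` options (`--continue`, `--max-connection-per-server`, `--split`, `--dir`) and the rclone S3 settings (`type = s3`, `provider = Minio`) are documented options of those tools, but check the values against the instructions you receive.

```python
import configparser
import io
import shlex


def aria2c_command(dataset_link: str, out_dir: str = ".", connections: int = 8) -> list[str]:
    """Build an aria2c command line for a dataset link copied from a PNCC email.

    Assumption: the emailed link can be passed to aria2c directly as a URI.
    """
    if not 1 <= connections <= 16:  # aria2c accepts 1-16 connections per server
        raise ValueError("connections must be between 1 and 16")
    return [
        "aria2c",
        "--continue=true",  # resume a partially downloaded file instead of restarting
        f"--max-connection-per-server={connections}",  # parallel connections to the server
        f"--split={connections}",  # use up to this many connections per file
        f"--dir={out_dir}",  # local destination directory
        dataset_link,
    ]


def rclone_minio_config(remote: str, endpoint: str, access_key: str, secret_key: str) -> str:
    """Return an rclone.conf section describing an S3-compatible MinIO remote.

    All arguments are hypothetical placeholders; real values come from the Data Team.
    """
    # interpolation=None keeps any '%' characters in the secret key literal
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[remote] = {
        "type": "s3",
        "provider": "Minio",
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "endpoint": endpoint,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def rclone_copy_command(remote: str, remote_path: str, local_dir: str) -> list[str]:
    """Build an rclone command that copies remote_path from the MinIO remote to local_dir."""
    return ["rclone", "copy", "--progress", f"{remote}:{remote_path}", local_dir]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(shlex.join(aria2c_command("https://example.org/your-dataset-link", out_dir="pncc_data")))
    print(rclone_minio_config("pncc", "https://minio.example.org", "ACCESS_KEY", "SECRET_KEY"))
    print(shlex.join(rclone_copy_command("pncc", "bucket/dataset", "pncc_data")))
```

The printed rclone section can be pasted into the file reported by `rclone config file`, or you can create the same remote interactively with `rclone config`. Building argument lists rather than shell strings avoids quoting problems if a link or path contains special characters.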

If you have any issues or difficulties, please email our Data Team directly. None of these tools will perform well if you are far from the United States Pacific Northwest and have not addressed the tuning issues described in "Configure and check your machines".
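Assuming the tuning in question is largely TCP buffer sizing ("Configure and check your machines" is the authority on what PNCC actually recommends), the reason distance matters can be sketched with the general bandwidth-delay product: a single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time (RTT). The 10 Gbit/s link and 150 ms RTT below are hypothetical figures chosen for illustration.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep one TCP stream at the full link rate."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_stream_throughput_bits_per_s(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one TCP stream's throughput for a given window size."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Hypothetical figures: a 10 Gbit/s path with a 150 ms round-trip time.
    link, rtt = 10e9, 0.150
    bdp = bandwidth_delay_product_bytes(link, rtt)
    print(f"window needed for full rate: {bdp / 1e6:.1f} MB")  # 187.5 MB
    # With only a 4 MiB window, one stream is limited far below the link rate.
    cap = max_stream_throughput_bits_per_s(4 * 1024 * 1024, rtt)
    print(f"single-stream cap with 4 MiB window: {cap / 1e6:.0f} Mbit/s")  # 224 Mbit/s
```

The same relationship is why parallel streams, such as ARIA2's split downloads, can help on long paths: each stream has its own window, so their throughputs add up.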
