RTN-057: L3 - Ready for "DRP-like" Processing

  • Richard Dubois

Latest Revision: 2023-04-10

Note

This technote is a work-in-progress.

1 Abstract

Description of the L3 milestone for early DRP-type processing.

2 Introduction

This document describes the milestone "USDF ready for DP1 processing" and reports on the testing done to achieve it. The operations milestone ticket for this is PREOPS-1667. Other relevant documents include:

The description is in Section 3 while the test status is in Section 4.

3 Milestone Description

Annual Data Release Processing is planned to be a multi-site activity involving the facilities in France, the UK, and the US, wherein all data taken to date are reprocessed with the latest algorithms and calibrations. It is anticipated that the Rucio and FTS tools will be used for data distribution and PanDA for distributed workflow. All three services should be in production at the USDF. A key goal of this work is to demonstrate processing at a sufficient rate.

Based on DMTN-213, here is a description of multi-site DRP processing. We assume that a local source Butler and registry have been set up at each Data Facility (DF) with a skymap and reference catalogs, but not necessarily with calibration products. We also assume that each DF will handle processing for a contiguous subset of skymap patches that are unique to that DF.
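
Such a local source Butler might be bootstrapped as in the hedged Python sketch below; the repository path is a placeholder, and the registration of the instrument, skymap, and reference catalogs is only indicated in comments:

  from lsst.daf.butler import Butler

  # Create an empty repository for the local source Butler
  # (the path is a placeholder).
  config = Butler.makeRepo("/path/to/df-source-butler")
  butler = Butler(config, writeable=True)

  # The instrument, skymap, and reference catalogs assumed above would be
  # registered next, e.g. with `butler register-instrument`,
  # `butler register-skymap`, and `butler ingest-files` on the command line;
  # calibration products arrive later via the Replica Monitor (step 1 below).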

  1. Calibration products are replicated to each DF and ingested into the local source Butler by the Replica Monitor.

  2. Campaign Management tooling (CMt) determines which raw visit data should be copied to each DF to support coadd generation (nominally step 3) and later processing, while minimizing downstream data movement. We assume individual visits will not be subdivided between DFs, i.e., DFs will only consider whole visits.

  3. CMt generates the yaml files for all steps, specific to each DF, and distributes them to the DFs.

  4. Raw data are replicated as necessary to FrDF and UKDF, and the Replica Monitor ingests raw data into the local source Butler at each DF.

  5. For each step in the DRP pipeline (see the orchestration sketch following this list):

     1. Needed data for the current step are replicated from USDF to the remote DF.

     2. BPS generates QuantumGraphs (QGs) and Execution Butlers (EBs) from the yaml files at each DF, and the relevant information from the QGs and EBs is transferred to USDF to maintain a global view of the processing.

     3. PanDA/BPS submits the workflows from USDF to the corresponding DFs.

     4. When the workflows finish, the EBs are merged into the local source Butler. The relevant datasets are also registered with Rucio.

     5. Rucio replicates appropriate subsets of the remote data back to USDF, where they are ingested into the USDF Butler.
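
As a concrete illustration of this cycle, the submissions could be driven by a thin wrapper around the bps command line. In the hedged Python sketch below, only the `bps submit` invocation is part of the real BPS interface; the step names, per-DF config layout, DF labels, and replication helper are all hypothetical:

  import subprocess
  from pathlib import Path

  # Hypothetical layout of the CMt-generated BPS configs, e.g.
  # configs/frdf/step1.yaml, configs/frdf/step2.yaml, ...
  CONFIG_ROOT = Path("configs")
  STEPS = ["step1", "step2", "step3", "step4", "step5"]  # illustrative names


  def replicate_inputs(step: str, df: str) -> None:
      """Placeholder for Rucio-driven replication of the step's inputs (5.1)."""


  def run_step(step: str, df: str) -> None:
      replicate_inputs(step, df)
      # 5.2/5.3: QG/EB generation and workflow submission are both handled by
      # `bps submit`; with the PanDA plugin the jobs run at the remote DF.
      subprocess.run(["bps", "submit", str(CONFIG_ROOT / df / f"{step}.yaml")], check=True)
      # 5.4/5.5: the EB merge happens at workflow completion; Rucio
      # registration and replication back to USDF are triggered afterwards.


  for df in ("usdf", "frdf", "ukdf"):  # hypothetical DF labels
      for step in STEPS:
          run_step(step, df)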

Notes:

  • Aside from the final, non-temporary data products that will be replicated to USDF, the only other data that need to be transferred are the PVIs needed for coadd generation at a given DF. These PVIs lie in the “overlap regions” between their corresponding visits and the sky patches assigned to neighboring DFs. Since PVIs are “final” data products, they will all be copied to USDF and can be transferred to remote DFs from USDF as needed. (A sketch of the overlap test follows these notes.)

  • Following section 5.2 of DMTN-213, we assume that QG and EB generation will be local to the specific DFs. This avoids having to register temporaries, like the warps, with the USDF Butler, as would be needed for BPS to generate the QGs/EBs there.

  • The above sequence assumes we will be using the DC2 test-med-1 dataset, which does not include global calibration at the tail end of single frame processing. The global calibration steps are part of HSC processing (see steps 2b-e in the HSC DRP-Prod pipeline), and (I think) they require all of the visit-level catalog data to be processed at a single site, i.e., at USDF, before proceeding with steps 3 and later. Should we consider an HSC dataset instead for these tests?

  • We need a mechanism for tracking the BPS yaml files generated by CMt and distributed to the DFs.
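
To make the overlap test concrete, the following hedged Python sketch uses the skymap API to find the patches that a PVI touches and maps them to DFs. `findTractPatchList` is part of lsst.skymap; the `df_for_patch` assignment rule is a toy stand-in for the real CMt partitioning, and the example coordinates are placeholders:

  from lsst.geom import SpherePoint, degrees


  def df_for_patch(tract_id: int, patch_id: int) -> str:
      """Toy patch-to-DF assignment; the real mapping comes from CMt."""
      return "frdf" if tract_id % 2 == 0 else "ukdf"


  def dfs_needing_pvi(skymap, visit_corners, home_df):
      """Return the remote DFs whose assigned patches overlap this PVI.

      `skymap` is an lsst.skymap.BaseSkyMap; `visit_corners` are the
      detector's sky corners as lsst.geom.SpherePoint objects.
      """
      targets = set()
      for tract_info, patch_list in skymap.findTractPatchList(visit_corners):
          for patch_info in patch_list:
              df = df_for_patch(tract_info.getId(), patch_info.getSequentialIndex())
              if df != home_df:
                  targets.add(df)
      return targets


  # Example: corners of one detector (placeholder coordinates).
  corners = [
      SpherePoint(ra, dec, degrees)
      for ra, dec in [(55.0, -30.0), (55.2, -30.0), (55.2, -30.2), (55.0, -30.2)]
  ]
  # dfs_needing_pvi(skymap, corners, home_df="usdf") would then list the DFs
  # to which this PVI must be replicated.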

3.1 Needed functionality of key components

3.1.1 Data Replication

  1. Dataset registration with Rucio triggers transfers to the designated sites (see the sketch after this list).

  2. The local Replica Monitor instance detects the incoming data and registers them with the local Butler.
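
A hedged sketch of this flow with the standard Rucio Python client follows; the scope, dataset, file, and RSE names are all placeholders, and the file metadata (`bytes`, `adler32`) would come from the actual data:

  from rucio.client import Client

  client = Client()

  SCOPE = "lsst"                 # placeholder scope
  DATASET = "drp-step3-inputs"   # placeholder dataset name
  SOURCE_RSE = "USDF_DATADISK"   # placeholder source RSE

  # Register the physical files at the source site and group them into a
  # Rucio dataset DID.
  files = [
      {"scope": SCOPE, "name": "raw/visit-0001.fits", "bytes": 1234, "adler32": "deadbeef"},
  ]
  client.add_replicas(rse=SOURCE_RSE, files=files)
  client.add_dataset(scope=SCOPE, name=DATASET)
  client.add_files_to_dataset(scope=SCOPE, name=DATASET, files=files)

  # A replication rule on the dataset triggers the FTS transfer to the
  # destination site, where the Replica Monitor detects the new replicas
  # and ingests them into the local Butler.
  client.add_replication_rule(
      dids=[{"scope": SCOPE, "name": DATASET}],
      copies=1,
      rse_expression="FRDF_DATADISK",  # placeholder destination RSE
  )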

3.1.2 Campaign Management tooling (?)

  1. Determines which PVIs and visits will be processed at each DF, given a partitioning of the skymap (or some other criterion) among the DFs.

  2. Generates the BPS yaml files for all DRP steps at all DFs, given a data partitioning (see the sketch below).

  3. Tracks BPS yaml configuration and generation.

  4. Runs BPS at remote DFs to generate QGs/EBs.
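
As a sketch of items 2 and 3, CMt could render per-DF BPS configs from a template and write them to a tracked location. `pipelineYaml`, `payload`, and `computeSite` are real BPS configuration keys, but every value below, the file layout, the DF settings, and the visit partitioning are hypothetical:

  from pathlib import Path

  # Minimal BPS config template; the keys are real, the values placeholders.
  BPS_TEMPLATE = """\
  pipelineYaml: "${{DRP_PIPE_DIR}}/pipelines/LSSTCam-imSim/DRP.yaml#{step}"
  payload:
    payloadName: drp-{df}-{step}
    butlerConfig: {butler_config}
    dataQuery: "visit IN ({visits})"
  computeSite: {compute_site}
  """

  # Hypothetical per-DF settings; the visit partitioning comes from item 1.
  DF_SETTINGS = {
      "frdf": {"butler_config": "/path/to/frdf/butler.yaml", "compute_site": "CC-IN2P3"},
      "ukdf": {"butler_config": "/path/to/ukdf/butler.yaml", "compute_site": "LANCS"},
  }


  def write_bps_configs(step: str, visits_by_df: dict[str, list[int]]) -> None:
      """Render one BPS yaml per DF for the given step and record it on disk."""
      for df, visits in visits_by_df.items():
          cfg = BPS_TEMPLATE.format(
              step=step,
              df=df,
              visits=", ".join(str(v) for v in visits),
              **DF_SETTINGS[df],
          )
          out = Path("configs") / df / f"{step}.yaml"
          out.parent.mkdir(parents=True, exist_ok=True)
          out.write_text(cfg)

Writing the rendered files into a versioned area like this would also give a natural hook for the yaml tracking called for in item 3 and in the notes above.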

4 Tests and Test Status

4.1 Distribution of raw input data to all sites

4.2 Distribution of processing graphs

4.3 Registration of output products

4.4 Distribution of output products

4.5 Demonstrate processing at a sufficient rate
