RTN-032: Panda/Rucio Multi-site Configuration

  • Wei Yang

Latest Revision: 2025-10-01


Overview

This document is intended for Rubin Data Facility (DF) and Data Access Center (DAC) administrators. It provides technical specifications for the CEs (Grid Computing Elements) and RSEs (Rucio Storage Elements) that will be used at Rubin DFs and DACs. This document also provides instructions on registering these resources in the Rubin CRIC (Computing Resource Information Catalogue).

The CE and RSE information in CRIC will be used by Rubin’s Panda workflow management system and Rucio distributed data management system for data production. In the future, it is possible that their usage scope will go beyond the Rubin Data Release Production (DRP).

1 Authentication and Authorization Mechanism in DRP

Rubin Data Release Production will use X509 and VOMS for authentication and authorization. The Virtual Organization (VO) name for Rubin is lsst. Administrators can register themselves at the Rubin VOMS server, subject to approval (an X509 certificate needs to be loaded in the web browser in order to access the VOMS server URL).

Rubin will use the following VO attributes for Panda job execution and data movement.

  • /lsst: for Rucio download

  • /lsst/Role=pilot: for Panda (production) job submission and Rucio upload, download and deletion

  • /lsst/Role=ddmopr: for Rucio data transfer, upload, download and deletion
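For illustration, the following is a minimal sketch of requesting a VOMS proxy that carries one of the attributes above, using the standard VOMS client tools:

  # request a proxy carrying the pilot role (adjust the attribute as needed)
  voms-proxy-init -voms lsst:/lsst/Role=pilot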

2 Specification of Computing Element (CE)

(This is for Data Facility (DF) only. There is currently no plan to require a CE at DACs.)

Rubin recommends that its DFs use an ARC-CE version 6 as the gateway to their local batch systems. Rubin's workflow management system, Panda, will submit jobs to the ARC-CE via its REST interface. Rubin ARC-CEs should be configured to support the VO attributes listed in the authentication and authorization section above.

In addition, the following are also needed on an ARC-CE host in order for it to function:

  1. /etc/grid-security/certificates (or another location defined by Unix environment variable X509_CERT_DIR).

  2. /etc/grid-security/vomsdir (or another location defined by Unix environment variable X509_VOMS_DIR).

  3. Client tools/libraries to submit jobs to the DF’s local batch system.

  4. Enable ENV/PROXY: this ARC CE Run Time Environment (RTE) creates a delegated X509 proxy and makes it available to the corresponding batch job via the Unix environment variable $X509_USER_PROXY. To enable this RTE, run the command arcctl rte enable ENV/PROXY.
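As a quick sketch, enabling the RTE and then confirming it is active on the ARC-CE host:

  # enable the ENV/PROXY runtime environment and list the RTEs known to this CE
  arcctl rte enable ENV/PROXY
  arcctl rte list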

In Rubin, we agreed that Panda/Harvester needs to explicitly request ENV/PROXY when submitting jobs. In HTCondor job submission (not to be confused with HTCondor-CE) using “grid_resource = arc …”, this means adding arc_resources = <RuntimeEnvironment>ENV/PROXY</RuntimeEnvironment> to the job description.
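A minimal sketch of an HTCondor submit description illustrating this is shown below; the CE endpoint and the executable are placeholders, not actual Rubin values:

  universe      = grid
  # hypothetical ARC-CE REST endpoint; replace with the real CE address
  grid_resource = arc https://ce.example.org:443/arex
  # explicitly request the ENV/PROXY runtime environment
  arc_resources = <RuntimeEnvironment>ENV/PROXY</RuntimeEnvironment>
  executable    = payload.sh
  output        = job.out
  error         = job.err
  log           = job.log
  queue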

HTCondor-CE version 8 and early releases of version 9 are also supported, but HTCondor-CE will likely drop support for GSI/X509 authentication in version 9.3 and beyond.


3 Requirements for batch nodes

The following are required on batch nodes:

  1. x86_64 CentOS 7 or equivalent

  2. Outbound TCP connection. NAT is acceptable

  3. /cvmfs/sw.lsst.eu

  4. valid /etc/grid-security/certificates and /etc/grid-security/vomsdir

So far there is no requirement on the number of cores or the amount of RAM per core (this may change in the future). The following are recommended for batch nodes:

  • 4GB+ local scratch space per core

  • Singularity container.

4 Specification of Rucio Storage Element (RSE)

Rubin will use the Third Party Copy (TPC) mechanism developed by the LHC/WLCG community, in particular HTTP TPC (xrootd TPC is acceptable if necessary), to move data. The WLCG TPC supports dCache, EOS, DPM, Xrootd, S3, and POSIX storage systems.

If you are using dCache, EOS, or DPM, HTTP TPC and xrootd TPC support is built into those systems.
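For illustration only, a manual HTTP TPC transfer between two WebDAV endpoints can be sketched with the gfal-copy client; the endpoints and paths below are hypothetical:

  # obtain a proxy carrying the data-management role
  voms-proxy-init -voms lsst:/lsst/Role=ddmopr
  # third-party copy between two hypothetical RSE endpoints
  gfal-copy davs://rse-a.example.org/rubin/test/file1 davs://rse-b.example.org/rubin/test/file1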

4.1 Xrootd installation

In most cases, a standalone Xrootd installation is sufficient. Open Science Grid (OSG) provides instructions on how to install Xrootd on an EL 9 system (https://osg-htc.org/docs/data/xrootd/install-standalone/). After finishing the Installing Xrootd section, follow the instructions in the next section (in this document) to configure Xrootd.

This will install:

  • various Xrootd rpms

  • several VOMS-related rpms

  • /etc/grid-security/certificates and /etc/grid-security/vomsdir. Note that the information in /etc/grid-security/vomsdir/lsst is out of date.

  • /etc/vomses (only used by clients). The lines referring to “lsst” are out of date.

For the out-of-date items above, please refer to the Rubin VOMS server configuration page for up-to-date information. If you cannot access that page, ask the Data Facility team or the iDAC coordination team for help. The following is the current information (as of 2025-10-01) on the VOMS configuration page.

Two lines in /etc/vomses:

"lsst" "voms.slac.stanford.edu" "15003" "/DC=org/DC=incommon/C=US/ST=California/O=Stanford University/CN=voms.slac.stanford.edu" "lsst"

"lsst" "voms.hec.lancs.ac.uk" "15003" "/C=UK/O=eScience/OU=Lancaster/L=Physics/CN=voms.hec.lancs.ac.uk" "lsst"

Two .lsc files in /etc/grid-security/vomsdir/lsst, two lines per file:

  1. voms.slac.stanford.edu.lsc

/DC=org/DC=incommon/C=US/ST=California/O=Stanford University/CN=voms.slac.stanford.edu

/C=US/O=Internet2/CN=InCommon RSA IGTF Server CA 3

  2. voms.hec.lancs.ac.uk.lsc

/C=UK/O=eScience/OU=Lancaster/L=Physics/CN=voms.hec.lancs.ac.uk

/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B
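Once /etc/vomses and the .lsc files are in place, the client-side configuration can be exercised with the standard VOMS tools, for example:

  # request a plain lsst proxy and confirm that the VO attributes are returned
  voms-proxy-init -voms lsst
  voms-proxy-info -all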

4.2 Xrootd configuration

This section applies to Xrootd storage (including Xrootd on shared POSIX file systems such as Lustre and GPFS, and non-local Xrootd storage) and S3 storage. Example configurations can be found on the Xrootd HOW-To page.

Rucio and FTS will manage the data transfers among RSEs, and will use VOMS attributes from the lsst VO to authorize access to the RSEs, as described in the authentication and authorization section above. This corresponds to the following lines in the Xrootd authorization file (usually /etc/xrootd/auth_file):

= lsstddmopr o: lsst g: /lsst r: ddmopr

x lsstddmopr /dir rwildn

o lsst /dir rl
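For reference, a minimal sketch of the xrootd configuration directives that enable this authorization file (assuming the default path above):

  # enable the ofs authorization layer and point it at the authorization database
  ofs.authorize
  acc.authdb /etc/xrootd/auth_file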

In the future, we may also ask storage systems to provide periodic dumps (list of files) to discover dark and missing data.

5 Site Validation

The following describes a set of simple validation steps to check the site environment. It is not a replacement for the validation proposed to the USDF execution team using real BPS submissions.

A simple validation will use an ARC CE client or HTCondor to submit a job to the WS interface (not the GridFTP interface) of a site’s ARC CE, and check the following (a sketch of a validation payload script follows the list):

  1. CE job submission.

  2. Availability of /cvmfs/sw.lsst.eu

  3. Outbound TCP connection. This can be as simple as a ping to a USDF DTN node.

  4. OS version and availability of software such as Singularity, client tools to (object) storage.

  5. Available POSIX storage layout (df -h)

  6. Pointers to local scratch space, object store, Grid infrastructure (CAs, vomsdir), DBs, secrets.
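The following is a minimal sketch of such a validation payload script; the DTN hostname and the scratch path are placeholders, not actual USDF values:

  #!/bin/bash
  # site-validation payload (sketch); submit via the CE so that ENV/PROXY is exercised
  echo "proxy: ${X509_USER_PROXY:-unset}"      # delegated proxy from the ENV/PROXY RTE
  cat /etc/redhat-release                      # OS version
  ls /cvmfs/sw.lsst.eu                         # CVMFS availability
  ping -c 3 dtn.example.org                    # outbound connectivity (hypothetical USDF DTN host)
  singularity --version                        # container runtime
  df -h                                        # POSIX storage layout
  ls /etc/grid-security/certificates /etc/grid-security/vomsdir   # Grid trust directories
  echo "scratch: ${TMPDIR:-/tmp}"              # local scratch pointer (assumption)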

A detailed ARC CE xRSL job (or HTCondor job) example is available at … (under construction).