1 Overview
This document is intended for Rubin Data Facility (DF) and Data Access Center (DAC) administrators. It provides technical specifications for the CEs (Grid Computing Elements) and RSEs (Rucio Storage Elements) that will be used at Rubin DFs and DACs, and instructions for registering these resources in the Rubin CRIC (Computing Resource Information Catalogue).
The CE and RSE information in CRIC will be used by Rubin’s PanDA workflow management system and Rucio distributed data management system for data production. In the future, their usage scope may extend beyond the Rubin Data Release Production (DRP).
2 Specification of Computing Element (CE)
(This is for Data Facility (DF) only. There is currently no plan to require a CE at DACs.)
Rubin recommends that its DFs use an ARC-CE version 6 as the gateway to their local batch systems. Rubin’s workflow management system, PanDA, will submit jobs to the ARC-CE via its REST interface. Rubin ARC-CEs should be configured to support the VO attributes listed in the Authz section above.
In addition, the following are also needed on an ARC-CE host in order for it to function:
/etc/grid-security/certificates (or another location defined by Unix environment variable X509_CERT_DIR).
/etc/grid-security/vomsdir (or another location defined by Unix environment variable X509_VOMS_DIR).
Client tools or a library to submit jobs to the DF’s local batch system.
Enable ENV/PROXY: this ARC-CE Run Time Environment (RTE) creates a delegated X.509 proxy and makes it available to the corresponding batch job via the Unix environment variable $X509_USER_PROXY. To enable this RTE, run the command arcctl rte enable ENV/PROXY.
In Rubin, we agreed that PanDA/Harvester needs to explicitly request ENV/PROXY when submitting jobs. In HTCondor job submission (not to be confused with HTCondor-CE) using grid_resource = arc …, this means adding arc_resources = <RuntimeEnvironment>ENV/PROXY</RuntimeEnvironment> to the job description.
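For illustration, a minimal HTCondor submit description requesting the ENV/PROXY RTE might look like the following sketch; the ARC-CE hostname and the payload filenames are placeholders, not actual Rubin endpoints:

```
# Hypothetical submit file for direct submission to an ARC-CE REST endpoint.
# Only the arc_resources line is taken from the text above; everything else
# is a placeholder to make the example self-contained.
universe      = grid
grid_resource = arc https://arc-ce.example.org:443/arex
arc_resources = <RuntimeEnvironment>ENV/PROXY</RuntimeEnvironment>
executable    = payload.sh
output        = job.out
error         = job.err
log           = job.log
queue
```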
HTCondor-CE version 8 and early releases of version 9 are also supported, but HTCondor-CE will likely drop support for GSI/X509 authentication in version 9.3 and beyond.
3 Requirements on batch nodes
The following are required on batch nodes:
x86_64 CentOS 7 or equivalent
Outbound TCP connection. NAT is acceptable
/cvmfs/sw.lsst.eu
valid /etc/grid-security/certificates and /etc/grid-security/vomsdir
So far there is no requirement on the number of cores or the RAM per core (this may change in the future). The following are recommended for batch nodes:
4GB+ local scratch space per core
Singularity container.
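The required paths above can be verified with a short script; this is just a sketch, and the check_path helper is defined here for illustration:

```shell
#!/bin/sh
# Quick sanity check of the batch-node prerequisites listed above.
check_path() {
    # Prints OK if the path exists, MISSING otherwise.
    if [ -e "$1" ]; then echo "OK: $1"; else echo "MISSING: $1"; fi
}
check_path /cvmfs/sw.lsst.eu
check_path /etc/grid-security/certificates
check_path /etc/grid-security/vomsdir
```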
4 Specification of Rucio Storage Element (RSE)
Rubin will use the Third Party Copy (TPC) mechanism developed by the LHC/WLCG community, in particular HTTP TPC (xrootd TPC is acceptable if necessary), to move data. WLCG TPC supports dCache, EOS, DPM, Xrootd, S3, and POSIX storage systems.
If you are using dCache, EOS, or DPM, HTTP TPC and xrootd TPC support is built into those systems.
4.1 Xrootd installation
In most cases, a standalone Xrootd installation is sufficient. Open Science Grid (OSG) provides instructions on `how to install Xrootd on an EL 9 system <https://osg-htc.org/docs/data/xrootd/install-standalone/>`_. After finishing the Installing Xrootd section, follow the instructions in the next section (in this document) to configure Xrootd.
This will install:

various Xrootd rpms

several VOMS-related rpms

/etc/grid-security/certificates and /etc/grid-security/vomsdir. Note that the info in /etc/grid-security/vomsdir/lsst is out-of-date.

/etc/vomses (only used by clients). The lines referring to “lsst” are out-of-date.
For the “out-of-date” items above, please refer to the Rubin VOMS server configuration page for up-to-date info. If you cannot access that page, ask the Data Facility team or the iDAC coordination team for help. The following is the current info (as of 2025-10-01) from the VOMS configuration page.
Two lines in /etc/vomses:
"lsst" "voms.slac.stanford.edu" "15003" "/DC=org/DC=incommon/C=US/ST=California/O=Stanford University/CN=voms.slac.stanford.edu" "lsst"
"lsst" "voms.hec.lancs.ac.uk" "15003" "/C=UK/O=eScience/OU=Lancaster/L=Physics/CN=voms.hec.lancs.ac.uk" "lsst"
Two .lsc files in /etc/grid-security/vomsdir/lsst, two lines per file:
voms.slac.stanford.edu.lsc
/DC=org/DC=incommon/C=US/ST=California/O=Stanford University/CN=voms.slac.stanford.edu
/C=US/O=Internet2/CN=InCommon RSA IGTF Server CA 3
voms.hec.lancs.ac.uk.lsc
/C=UK/O=eScience/OU=Lancaster/L=Physics/CN=voms.hec.lancs.ac.uk
/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B
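The two .lsc files above can be created with a short script. This sketch writes them into a local staging directory; copy the result into /etc/grid-security/vomsdir/lsst afterwards:

```shell
#!/bin/sh
# Write the two .lsc files to a local staging directory, then copy the
# directory contents to /etc/grid-security/vomsdir/lsst.
mkdir -p vomsdir-lsst

cat > vomsdir-lsst/voms.slac.stanford.edu.lsc <<'EOF'
/DC=org/DC=incommon/C=US/ST=California/O=Stanford University/CN=voms.slac.stanford.edu
/C=US/O=Internet2/CN=InCommon RSA IGTF Server CA 3
EOF

cat > vomsdir-lsst/voms.hec.lancs.ac.uk.lsc <<'EOF'
/C=UK/O=eScience/OU=Lancaster/L=Physics/CN=voms.hec.lancs.ac.uk
/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B
EOF
```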
4.2 Xrootd configuration
This section applies to Xrootd storage (including Xrootd on shared POSIX file systems such as Lustre and GPFS, and non-local Xrootd storage) and S3 storage. Example configurations can be found on the Xrootd HOW-TO page.
Rucio and FTS will manage the data transfer among RSEs, and use VOMS attributes from the ‘lsst’ VO to authorize access to RSEs, as described in the Authz section above. This corresponds to the following lines in the Xrootd authorization file (usually /etc/xrootd/auth_file):
= lsstddmopr o: lsst g: /lsst r: ddmopr
x lsstddmopr /dir rwildn
o lsst /dir rl
In the future, we may also ask storage systems to provide periodic dumps (list of files) to discover dark and missing data.
5 Site Validation
The following describes a set of simple validations to check the site environment. It is not a replacement for the validation proposed to the USDF execution team using real BPS submissions.
A simple validation will use an ARC CE client or HTCondor to submit a job to the WS interface (not the GridFTP interface) of a site’s ARC CE, and check the following:
CE job submission.
Availability of /cvmfs/sw.lsst.eu
Outbound TCP connection. This can be as simple as a ping to a USDF DTN node.
OS version and availability of software such as Singularity and client tools for (object) storage.
Available POSIX storage layout (df -h)
Pointers to local scratch space, object store, Grid infrastructure (CAs, vomsdir), DBs, secrets.
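Several of the node-local checks above can be wrapped in a small script inside the validation job; a sketch, in which the report helper and the DTN hostname are placeholders:

```shell
#!/bin/sh
# Sketch of per-node validation checks; "report" prints PASS/FAIL
# for each check without aborting the script.
report() {
    label=$1; shift
    if "$@" > /dev/null 2>&1; then echo "PASS: $label"; else echo "FAIL: $label"; fi
}
report "cvmfs sw.lsst.eu mounted"  test -d /cvmfs/sw.lsst.eu
report "grid certificates present" test -d /etc/grid-security/certificates
report "singularity available"     command -v singularity
report "outbound TCP (ping DTN)"   ping -c 1 -W 3 dtn.example.org   # placeholder hostname
df -h   # record the available POSIX storage layout
```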
A detailed ARC CE xRSL job (or HTCondor job) example is available at … (under construction)