diff --git a/source/data/.vscode/settings.json b/source/data/.vscode/settings.json
deleted file mode 100644
index cd614f18..00000000
--- a/source/data/.vscode/settings.json
+++ /dev/null
@@ -1,3 +0,0 @@
-{
-    "restructuredtext.confPath": "c:\\Users\\u0137480\\Desktop\\VscDocumentation\\source"
-}
\ No newline at end of file
diff --git a/source/data/architecture/general_overview.png b/source/data/architecture/general_overview.png
deleted file mode 100644
index a665e1e9..00000000
Binary files a/source/data/architecture/general_overview.png and /dev/null differ
diff --git a/source/data/architecture/resource.png b/source/data/architecture/resource.png
deleted file mode 100644
index 2b8e2adb..00000000
Binary files a/source/data/architecture/resource.png and /dev/null differ
diff --git a/source/data/cadaver/cadaver_access.png b/source/data/cadaver/cadaver_access.png
deleted file mode 100644
index 0dea80c5..00000000
Binary files a/source/data/cadaver/cadaver_access.png and /dev/null differ
diff --git a/source/data/cadaver_client_access.rst b/source/data/cadaver_client_access.rst
deleted file mode 100644
index f213e110..00000000
--- a/source/data/cadaver_client_access.rst
+++ /dev/null
@@ -1,95 +0,0 @@
-.. _cadaver_client_access:
-
-Cadaver Client Access
-=====================
-
-Cadaver is a command-line client for WebDAV and is available on UNIX-like operating systems such as Linux (native and Windows WSL) and MacOS. It supports file upload, download, on-screen display, namespace operations (move/copy), collection creation and deletion, and locking operations. After connecting to a WebDAV endpoint a session opens.
-
-As WebDAV is an extension on HTTP, a WebDAV server supports the basic HTTP request methods such as GET, PUT and DELETE. WebDAV extends these i.a. with MKCOL, MOVE and COPY. It is therefore possible to use cURL to do file and collection operations. Please note Cadaver is more user friendly and wraps all these operations in a Unix-like session.
-
-Using cURL with iRODS
----------------------
-To use cURL (and cadaver), first go to https://irods.hpc.kuleuven.be and note your user account, temporary (4h) password and the Davrods client url (https://irods.hpc.kuleuven.be:8443).
-
-Before continuing with cURL (not cadaver) you should install the SSL certificate provided by https://irods.hpc.kuleuven.be:8443. In most browsers certificates can be downloaded by clicking the lock icon next to the url bar and following through to the certificate. On Chrome and Edge on Windows this opens a standard certificate viewer offering a .cer file, on Firefox it's a .pem file. For further instructions converting and installing certificates on Ubuntu, please follow the instructions at https://ubuntu.com/server/docs/security-trust-store. You can skip installing the certificate by providing the -k switch to curl; however this is insecure as it bypasses SSL security. As it is better to use cadaver we only show the most basic capabilities.
-
-To show an overview of your collections, you can use the following command. It only displays an html response.
-
-::
-
-    $ curl https://irods.hpc.kuleuven.be:8443/home/vscXXXXX/ --user vscXXXXX:password
-
-To create a new Collection 'collection1':
-
-::
-
-    $ curl https://irods.hpc.kuleuven.be:8443/home/vscXXXXX/ --user vscXXXXX:password -X MKCOL 'https://irods.hpc.kuleuven.be:8443/home/vscXXXXX/collection1/'
-
-To upload a file to your 'collection1' Collection:
-
-::
-
-    $ curl https://irods.hpc.kuleuven.be:8443/home/vscXXXXX/collection1/ --user vscXXXXX:password -T test.txt
-
-Using cadaver with iRODS
-------------------------
-
-Installation on Debian/Ubuntu is as follows:
-::
-
-    $ sudo apt-get update
-    sudo apt-get install cadaver
-
-To use cadaver, first go to https://irods.hpc.kuleuven.be and note your user account, temporary (4h) password and the Davrods client url (https://irods.hpc.kuleuven.be:8443).
-
-Start a cadaver session by executing ``cadaver``. You can also connect to your iRODS root collection in one command as follows:
-
-::
-
-    cadaver https://irods.hpc.kuleuven.be:8443/home/vscXXXXX
-
-
-.. image:: cadaver/cadaver_access.png
-
-If not yet the case, connect to the Tier 1 zone by executing:
-
-::
-
-    dav:!> open https://irods.hpc.kuleuven.be:8443
-
-The first time you connect, it will warn you with 'Untrusted server certificate presented for irods.hpc.kuleuven.be' and then prompt you to install the certificate. Proceed.
-
-Then complete your username and password. You are now connected and can perform the WebDAV operations.
-
-Type ``help`` to discover all commands.
-
-To create a Collection, you can use either ``mkcol`` or ``mkdir``:
-
-::
-
-    dav:!> mkcol cadaver_test
-
-Now you can upload data objects to this new collection by first specifying the local absolute path to the file, and then the remote relative or absolute path:
-
-::
-
-    dav:!> put /home/user/test.txt /home/vscXXXXX/cadaver_test
-
-To leave the session and close cadaver, type ``exit``.
-
-It is also possible to run cadaver as a oneliner from the shell by providing it a list of instructions.
-
-Make a ~/davscript file with the following contents:
-
-::
-
-    put /home/user/test.txt /home/vscXXXXX/cadaver_test
-    exit
-
-You can also provide the client address next to an ``open`` command inside the script.
-
-Now you can execute these commands on the fly:
-
-::
-
-    $ cadaver -r ~/davscript https://irods.hpc.kuleuven.be:8443
diff --git a/source/data/command_line_clients_index.rst b/source/data/command_line_clients_index.rst
deleted file mode 100644
index f56c3984..00000000
--- a/source/data/command_line_clients_index.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-Command line Access
-===================
-
-It's possible to access an iRODS Zone via command line software that support the WebDAV protocol. WebDAV stands for Web Distributed Authoring and Versioning, which is an extension to HTTP, the protocol that web browsers and web servers use to communicate with each other.
-WebDAV can be used for remotely managing files over the internet. With WebDAV, we can access files stored in the VSC Tier-1 Data component by using the same interface as we do with our local files.
-
-.. toctree::
-    :maxdepth: 3
-
-    cadaver_client_access
\ No newline at end of file
diff --git a/source/data/cyberduck/cduck1.png b/source/data/cyberduck/cduck1.png
deleted file mode 100644
index dd522540..00000000
Binary files a/source/data/cyberduck/cduck1.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck2.png b/source/data/cyberduck/cduck2.png
deleted file mode 100644
index 44ee8216..00000000
Binary files a/source/data/cyberduck/cduck2.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck3.png b/source/data/cyberduck/cduck3.png
deleted file mode 100644
index ee89b4a0..00000000
Binary files a/source/data/cyberduck/cduck3.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck4.png b/source/data/cyberduck/cduck4.png
deleted file mode 100644
index 10157647..00000000
Binary files a/source/data/cyberduck/cduck4.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck5.png b/source/data/cyberduck/cduck5.png
deleted file mode 100644
index 436c8ee2..00000000
Binary files a/source/data/cyberduck/cduck5.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck6.png b/source/data/cyberduck/cduck6.png
deleted file mode 100644
index cb1a8c01..00000000
Binary files a/source/data/cyberduck/cduck6.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck7.png b/source/data/cyberduck/cduck7.png
deleted file mode 100644
index d4beb588..00000000
Binary files a/source/data/cyberduck/cduck7.png and /dev/null differ
diff --git a/source/data/cyberduck/cduck8.png b/source/data/cyberduck/cduck8.png
deleted file mode 100644
index b362dde4..00000000
Binary files a/source/data/cyberduck/cduck8.png and /dev/null differ
diff --git a/source/data/cyberduck/vsc_Tier1_Data.cyberduckprofile b/source/data/cyberduck/vsc_Tier1_Data.cyberduckprofile
deleted file mode 100644
index 1ae4aa1f..00000000
--- a/source/data/cyberduck/vsc_Tier1_Data.cyberduckprofile
+++ /dev/null
@@ -1,39 +0,0 @@
-
-
-
-
-
-
-    Protocol
-    irods
-    Vendor
-    VSC
-    Description
-    Flemish Supercomputing Center (VSC)
-    Hostname Configurable
-
-    Port Configurable
-
-    Default Hostname
-    irods.hpc.kuleuven.be
-    Region
-    kuleuven_tier1_pilot:default
-    Default Port
-    1247
-    Authorization
-    PAM
-
-
diff --git a/source/data/cyberduck_access_irods.rst b/source/data/cyberduck_access_irods.rst
deleted file mode 100644
index 6dba2413..00000000
--- a/source/data/cyberduck_access_irods.rst
+++ /dev/null
@@ -1,65 +0,0 @@
-.. _cyberduck_access_irods.rst:
-
-Using Cyberduck for Accessing iRODS
-===================================
-
-Cyberduck is a free cross-platform, high-throughput and parallel data transfer open source file transfer program that supports multiple transfer protocols (FTP, SFTP, WebDAV, Cloud files, Amazon S3, etc.).
-This allows users to transfer large files, depending on the user's available bandwidth and network settings. Cyberduck can also be used to rename files and browse other shared or public Data Store locations.
-
-Installation and first time configuration
-----------------------------------------
-
-- Visit https://cyberduck.io/ and select the download compatible with your operating system.
-
-.. image:: cyberduck/cduck1.png
-
-- Open the Cyberduck.exe file and click "run", install in accordance with your institution's application install policy.
-
-.. image:: cyberduck/cduck2.png
-
-- Click the following link to download the VSC-Tier1_Data Cyberduck profile configuration file on your local machine.
-
-:download:`VSC-Tier1_Data Cyberduck profile `.
-
-- After you download the profile, save it to your computer.
-
-.. image:: cyberduck/cduck4.png
-
-- Double-click on the vsc_Tier1_Data.cyberduckprofile file.
-
-- This will launch Cyberduck.
-
-.. image:: cyberduck/cduck5.png
-
-- Thanks to this file relevant access information is auto-populated as you can see on the screen above.
-
-- Enter your username and temporary password obtained from https://irods.hpc.kuleuven.be/ following the same procedure you may have done using WinSCP.
-
-.. note:: You can use Cyberduck for other remote connections to do file transfers etc. Just save your connection as a bookmark.
-
-Upload from your local computer to iRODS
----------------------------------------
-
-.. warning:: When uploading your data to iRODS you should not upload files/folders with names containing spaces (e.g. test1 for iRODS.txt) or name that contain special characters (e.g. ~ ` ! @ # $ % ^ & * ( ) + = { } [ ] | : ; ” ‘ < > , ? /). The command line side will/may typically not tolerate these characters. For long file/folder names the use of underscores (e.g. test1_for_iRODS.txt) is the recommended practice.
-
-- Double-click the “irods.hpc.kuleuven.be – IRODS” bookmark to connect to iRODS. This bookmark was created thanks to importing the profile configuration file.
-
-.. image:: cyberduck/cduck6.png
-
-- Enter your vsc-account in username and click “login”.
-
-.. image:: cyberduck/cduck7.png
-
-.. note:: Cyberduck logs in automatically if there is only one bookmark. You don’t have to do the above two steps unless your password has not expired or you have not deleted your “irods.hpc.kuleuven.be – IRODS” bookmark. If you have more than one bookmark, open the Cyberduck application and double-click the “irods.hpc.kuleuven.be – IRODS” bookmark.
-
-To upload data from your local machine to iRODS, you can select file(s) or folder(s) from your local machine's File Explorer or Finder and drag them into the Cyberduck window to the destination folder/collection. You can also create a new folder under File>New Folder.
-Or you can click the 'Upload' button on the ribbon and select one or more files to the current directory.
-
-Download from iRODS to local computer using Cyberduck
------------------------------------------------------
-
-You can download from iRODS to your local machine similar to data upload: select the data object(s)/collection(s) from the Cyberduck window and drag them to a location on your local computer.
-
-.. image:: cyberduck/cduck8.png
-
-A ‘Transfers’ window will appear that can be used to monitor the download to completion. You can also do “synchronization” which means it will check both sides and will update your local folder that you can create/choose based on the data in iRODS.
\ No newline at end of file
diff --git a/source/data/data_discovery.rst b/source/data/data_discovery.rst
deleted file mode 100644
index 89053757..00000000
--- a/source/data/data_discovery.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-.. _data_discovery:
-
-Data Discovery
-==============
-
-iRODS allows users and administrators to access and contribute descriptive information about their data (metadata). This metadata improves the search experience and therefore enables data discovery. Users can search for data objects using any metadata descriptor as search terms.
-Both automatic, system-generated metadata and user-created metadata are supported in iRODS.
-
-What is metadata?
------------------
-
-Metadata is often called data about data. It describes the data in some way, such as providing information about the content, context of its origin or use, quality, condition, and associations to other data and objects. Metadata can describe a collection, a file or a component of file.
-Metadata can be embedded in a data object, or stored in a database and linked to the object it describes. Metadata is used to facilitate data discovery to improve search and retrieval.
-
-Scholars classify metadata into three categories: descriptive, structural, and administrative. Descriptive metadata is intended to support data discovery and identification. Title, abstract and keywords are some example to descriptive metadata. Structural metadata describes the structure of the data object. For example, title page, chapters, how pages are ordered, number of pages, etc. Administrative metadata is intended to facilitate management and processing of the data. Identifying how the data was created, its file type, resolution, copyright information, licensing
-information, access privileges are some administrative metadata examples.
-
-Why metadata?
--------------
-
-Metadata serves a variety of purposes, with resource discovery one of the most common. Here, it can be compared to effective cataloging, which includes identifying resources, defining them by criteria, bringing similar resources together and distinguishing among those that are dissimilar.
-It is also a means of facilitating interoperability and integrating resources. Using metadata to describe resources enables its understanding by humans as well as machines.
-In addition to supporting data discovery, metadata also organizes and provides contextual and historical information about data objects, identifies structural relationships within and between data objects.
\ No newline at end of file
diff --git a/source/data/data_service/tier1_vsc_data.png b/source/data/data_service/tier1_vsc_data.png
deleted file mode 100644
index 9a0834bf..00000000
Binary files a/source/data/data_service/tier1_vsc_data.png and /dev/null differ
diff --git a/source/data/desktop_clients_index.rst b/source/data/desktop_clients_index.rst
deleted file mode 100644
index 8b63d487..00000000
--- a/source/data/desktop_clients_index.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-Desktop client access
-===================
-
-It's possible to access an iRODS Zone via a desktop agents that support the WebDAV protocol. WebDAV stands for Web Distributed Authoring and Versioning, which is an extension to HTTP, the protocol that web browsers and web servers use to communicate with each other.
-WebDAV can be used for remotely managing files over the internet. With WebDAV, we can access files stored in the VSC Tier-1 Data component by using the same interface as we do with our local files.
-
-.. toctree::
-    :maxdepth: 2
-
-    winscp_access_irods
-    cyberduck_access_irods
\ No newline at end of file
diff --git a/source/data/glossary.rst b/source/data/glossary.rst
deleted file mode 100644
index 0a7f8539..00000000
--- a/source/data/glossary.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-.. _glossary:
-
-Glossary of iRODS
-=================
-
-**API**
-
-An Application Programming Interface (API) is a computing interface to enable other software to communicate with it. Basically, an API specifies how software components should interact. iRODS defines a client API and expects that clients connect and communicate with iRODS servers in this controlled manner. iRODS has an API written in C, and another written in Java (Jargon). Other languages have wrappers around the C API (Python, PHP, etc.).
-
-**AVU**
-
-Attribute-Value-Unit (AVU) triples are associated with Collections or Data Objects and are the building blocks of metadata.
-
-**Client**
-
-A client in the iRODS client-server architecture gives users an interface to interact with iRODS to manipulate iRODS objects based on users' account profile and access level. iRODS clients include: WebDAV clients, web interfaces, iCommands, Python API etc. The programming clients are more fully fledged and also allow users to automate some repetitive works.
-
-**Collection**
-
-A Collection is the logical representation of physical containers, similar to directories or folders that are found in a file system. A Collection can have sub-collections, and hence provides a hierarchical structure.
-
-**Grid**
-
-The hardware, operating system, and other machinery that support a Zone.
-
-**Data Object**
-
-A Data Object is a data file that is stored in iRODS. It is given a unique internal identifier in iRODS (allowing a global name space), and is associated with (situated in) a single Collection.
-
-**iCAT**
-
-The iCAT, or iRODS Metadata Catalog, is a database (e.g. PostgreSQL, MySQL, Oracle) that stores references to and between the iRODS entities in an iRODS Zone: Data Objects, Collections, Users and Groups. It also stores information such as system metadata, the mapping between logical and physical storage locations and user-defined metadata (AVUs). There is one iCAT per iRODS Zone.
-
-**iCommands**
-
-iCommands are Unix utilities that give users a command-line interface to operate on data in the iRODS system. iCommands provide the most comprehensive set of client-side standard iRODS manipulation functions.
-
-**Inheritance**
-
-Collections have an attribute named Inheritance. When a Collection has this attribute set to Enabled, new Data Objects and Collections added to this Collection inherit the access permissions (ACLs) of the Collection.
-
-**Logical Name**
-
-This a virtual identifier used by iRODS to uniquely name a Data Object, Collection, Resource, or User.
-
-**Metadata**
-
-Metadata is data about data. Metadata can include system or user-defined attributes associated with a Data Object, Collection or Resource stored in the iCAT database. The metadata is in the form of AVUs (attribute-value-unit tuples).
-
-**Microservice**
-
-Microservices are small, well-defined procedures/functions that perform a certain server-side task and are either compiled into the iRODS server code or packaged independently as shared objects. Rules invoke Microservices to implement data management policies.
-
-**Policy Enforcement Point (PEP)**
-
-An event trigger in iRODS is called a PEP (Policy Enforcement Point) and it invokes an interpreted rule script via the rule engine configured in iRODS' server for the purpose of influencing a data management operation.
-
-**Replica**
-
-An identical, physical copy of a Data Object on another storage server.
-
-**Resources**
-
-A resource, or storage resource, is a software/hardware system that stores digital data. iRODS clients can operate on local or remote data stored on different types of resources through a common interface.
-
-**Rules**
-
-Rules are definitions of actions that are to be performed by the server. These actions are defined in multiple ways, depending on the language that is used to define the actions. The native language in iRODS is the iRODS Rule Language that defines actions with microservices and other actions.
-
-**Rule Engine**
-
-The Rule Engine interprets Rules written in one of the supported rule engine plugin languages.
-
-**Server**
-
-An iRODS server is software that interacts with the access protocol of a specific storage system. It enables storing and sharing data distributed geographically and across administrative domains.
-
-**Vault**
-
-The physical location of Data Objects on a storage device.
-
-**Workflow**
-
-Some form of computation or action performed on Data Objects, with a specific start and end point.
-
-**Zone**
-
-An iRODS deployment is called a Zone.
\ No newline at end of file
diff --git a/source/data/iCommands.rst b/source/data/iCommands.rst
deleted file mode 100644
index 3b224b7a..00000000
--- a/source/data/iCommands.rst
+++ /dev/null
@@ -1,465 +0,0 @@
-.. _iCommands:
-
-iCommands
-===============
-
-iCommands is one of the client-side communication with iRODS server to provide users with data management and metadata management functions to do any data-related actions. In short, iCommands is an Unix utility that gives users a command-line interface.
-There are more than 50 iCommands. A regular user however may use only a few of them for his/her daily needs. iRODS offers other user interfaces but the underlying point is that iCommands is the most powerful and easy to use interface to iRODS.
-
-All iCommands accept standard common line options (e.g., -a for all, -h for help) that gives more capabilities to the commands. To see a subset of these options and to know the details of any iCommand, you can follow the below specified options:
-
-- You can visit the `iCommands documentation `__
-
-- You can use the ``–h`` option with the command (e.g., ``iput –h``)
-
-- You can use the ``ihelp`` command with the argument that you would like to learn more about (e.g., ``ihelp iput``).
-
-Please keep in mind some iCommands don't work with tab press auto-complete. Also remember that folders in iRODS are called 'Collections' and files are called 'Objects' or 'Data Objects'.
-
-The following sections illustrate the usage of some iCommands organized on the following categories: "Informative commands", "Working with Collections", "Data upload and download", "Structuring data", "Access Control" and "Handling metadata".
-
-Installing iCommands locally
-----------------------------
-iCommands is installed on the KU Leuven Tier-1 and some of the Tier-2 clusters. As it is a client to any iRODS system, it can also be used from any local computer after installing it there.
-
-On a Linux OS you can use a package manager to install iCommands in the terminal. Instructions for configuring via the appropriate package manager can be found at the link https://packages.irods.org/.
-You can install iCommands on different distributions as follows:
-
-Centos 7:
-
-.. code:: sh
-
-    # Installing prerequisites
-    yum update
-    yum install wget sudo
-
-    # Add the iRODS repository to your package manager (if you haven't done so already)
-    sudo rpm --import https://packages.irods.org/irods-signing-key.asc
-    wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo
-
-    # Installing iCommands
-    yum install irods-icommands
-
-Almalinux 8/Rocky Linux 8:
-
-.. code:: sh
-
-    # Installing prerequisites
-    yum update
-    yum install wget sudo
-
-    # Add the iRODS repository to your package manager (if you haven't done so already)
-    sudo rpm --import https://packages.irods.org/irods-signing-key.asc
-    wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo
-
-    # irods runtime needs to be installed manually because of https://github.com/k3s-io/k3s/issues/5588
-    yum install irods-runtime
-
-    # Installing iCommands
-    yum install irods-icommands
-
-Debian 11:
-
-.. code:: sh
-
-    # Installing prerequisites
-    apt-get update
-    apt-get install wget lsb-release sudo gnupg
-
-    # Add the iRODS repository to your package manager (if you haven't done so already)
-    wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
-    echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/renci-irods.list
-    sudo apt-get update
-
-    # Installing iCommands
-    apt-get install irods-icommands
-
-Ubuntu 18/20:
-
-.. code:: sh
-
-    # Installing prerequisites
-    apt-get update
-    apt-get install wget lsb-core sudo
-
-    # Add the iRODS repository to your package manager (if you haven't done so already)
-    wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
-    echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/renci-irods.list
-    sudo apt-get update
-
-    # Installing iCommands
-    apt-get install irods-icommands
-
-Ubuntu 22:
-
-.. code:: sh
-
-    # Installing prerequisites
-    apt-get update
-    apt-get install gnupg wget sudo
-    wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
-    sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb
-
-    # Add the iRODS repository to your package manager (if you haven't done so already)
-    wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
-    echo "deb [arch=amd64] https://packages.irods.org/apt/ focal main" | sudo tee /etc/apt/sources.list.d/renci-irods.list
-    sudo apt-get update
-
-    # Installing iCommands
-    apt-get install irods-icommands
-
-.. note::
-    Depending on your linux distribution and version, the installation procedure may vary.
-    If you are running a different Linux distribution, please contact data@vscentrum.be.
-
-
-
-Authenticating
---------------
-
-On the Tier-2 systems of KU Leuven, you can authenticate as follows:
-
-.. code:: sh
-
-    irods-setup | bash
-
-On the login nodes of other Tier-1 or Tier-2 systems of the VSC on which iCommands are installed, you can use:
-
-.. code:: sh
-
-    ssh login.hpc.kuleuven.be irods-setup | bash
-
-Of course, you can also authenticate on a local machine.
-To do so, go to the `ManGO portal `__
-and log in. Click on ‘How to connect’ next to your zone, copy the code
-under ‘iCommands for Linux’ and paste it into your terminal. This should
-authenticate your for 168 hours.
-
-
-Informative iRODS Commands
---------------------------
-
-These commands help us find and understand some required information that we may need while implementing a data related task.
-
-The most important command that will print out all commands will be::
-
-$ ihelp
-
-If you would like to know the settings details you can run::
-
-$ ienv
-
-To know about the detailed information of an user you can run the below command following with an user account. This command will show for example to which groups a user belongs::
-
-$ iuserinfo vscXXXXX
-
-To be able to learn what an error code stands for, you can then use the command below with a code number::
-
-$ ierror 826000
-
-If you want to log out from iRODS you can run ``iexit full`` , but take into account that then you will need to log on again by executing ``ssh irods.hpc.kuleuven.be | bash`` if you want to use iRODS again.
-
-Working With Collections
-------------------------
-
-The iCommands that will be used in this part completely emulate standard Unix commands such as ``cd``, ``ls``, and ``pwd``.
-
-To identify the current working collection you can use the ``ipwd`` command. The current working collection is the default location for data to be read or written. Basically this command tells you where you are in iRODS.
-
-::
-
-    $ ipwd
-    kuleuven_tier1_pilot/home/vsc33586
-
-To change the collection to the one you want, you would use ``icd`` with an absolute path or a relative path. In other words, to navigate around folder(s), do::
-
-$ icd testCollection
-
-In order to see the content of any collection (directory), we can use ``ils``. With this command, we can check whether there is data in our iRODS-home directory.
-
-::
-
-    $ ils
-    /kuleuven_tier1_pilot/home/vscXXXXX:
-
-What we get here is “kuleuven_tier1_pilot”: the name of the iRODS zone and “/home/”: your default working directory. Because in our iRODS-home directory we don't have any data or collections yet there is no file listed.
-In what follows we will show more arguments for the ``ils`` command to gather more details about data or collections.
-
-Data upload and download
-------------------------
-
-In this part we cover how we can ingest datafiles into iRODS. We will also find out where iRODS places the files. To upload data to iRODS and to download data from iRODS to a local file system, the ``iput`` and ``iget`` commands are used.
-
-**Create data:**
-
-Create/open a file with a text editor (nano, vi,..) on the linux filesystem (i.e., $VSC_DATA). You can also download a data file externally (i.e., ``wget [url]``).::
-
-$ nano test1.txt
-
-.. image:: iCommands/nano.png
-
-With the linux command ``ls`` you can check that the file has been created and is accessible on the User Interface machine.
-
-::
-
-    $ ls
-    test1.txt
-
-**Upload data:**
-
-We now upload the data to iRODS::
-
-$ iput -K test1.txt
-
-The flag ``-K`` triggers iRODS to create a checksum and store this checksum in the iCAT metadata catalogue.
-
-We can safely remove the original file from our linux directory to see what happens::
-
-$ rm test1.txt
-
-Check your local directory with ``ls`` and see you don't have your ``test1.txt`` file.
-
-To check that the file is now only available on the iRODS server:
-
-::
-
-    $ ils
-    /kuleuven_tier1_pilot/home/vsc33586/test1.txt
-
-**Connection between logical and physical namespace:**
-
-iRODS provides an abstraction from the physical location of the files. ``/kuleuven_tier1_pilot/home/vsc33586/test1.txt`` is the logical path which only iRODS knows. But how can we know where is the file actually on the server that hosts iRODS?
- -:: - - $ ils –L - /kuleuven_tier1_pilot/home/vsc33586: - vsc33586 0 default;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-3-a-weight;tier1-p-irods-posix-3-a 26 2020-05-11.10:26 & test1.txt - sha2:fB8VYoW+cGLd5z/dvrekiLPTuMvhkQKJW2c99/+WNls= generic /irods/a/home/vsc33586/test1.txt - vsc33586 1 default;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-4-a-weight;tier1-p-irods-posix-4-a 26 2020-05-11.10:26 & test1.txt - sha2:fB8VYoW+cGLd5z/dvrekiLPTuMvhkQKJW2c99/+WNls= generic /irods/a/home/vsc33586/test1.txt - -The result looks a bit confusing in the beginning, let us look at what these mean: - -- ``/kuleuven_tier1_pilot/home/vsc33586``: Logical path to the file as iRODS exposes it to the user -- vsc33586: owner of the file -- 0, 1: Index of replica of that file in the iRODS system, the Tier-1 Data is configured to ensure that by default 2 copies of each file are created (copy 0 and copy 1) in two different data centers. -- default: the name of the physical data resource, e.g. a unix folder -- 26: File size in KB -- Creation date & name of the file -- Checksum -- ``/irods/a/home/vsc33586/test1.txt``: Physical path on the server that hosts iRODS, only the linux user "vsc33586" who runs iRODS has access to that path. - -All the information above is stored in the iCAT metadata catalogue and can also be retrieved in sql-like queries (you will see this under the structuring data section). - -**Download data:** - -To download or to restore the file (=copying it from iRODS to your linux home) you can do:: - -$ iget -K test1.txt test1-restore.txt - -We store the iRODS file ``test.txt`` in a new file called ``test1-restore.txt`` in our linux home directory. Here the flag ``-K`` triggers iRODS to verify the checksum. 
Checksums are used to verify data integrity when data is moved. - -Note: The ``iput`` and ``iget`` commands also work for directories and collections; simply add the ``-r`` (recursive) flag. - -Structuring data ----------------- - -Just as you create folder structures to organize your data, you can create collections in iRODS. Let's create a test collection (folder):: - -$ imkdir dataExample - -Let us move our ``test1.txt`` file to this collection:: - -$ imv test1.txt dataExample - -We can change our current working collection to the newly created collection. - -:: - - $ icd dataExample - $ ipwd - -The ``ils`` command will now by default list the content of the ``dataExample`` collection. - -If you want to go back to your home collection, you can use one of the options below: - -:: - - $ icd /kuleuven_tier1_pilot/home// - $ icd .. - $ iexit - -With the ``-r`` argument of ``ils``, - -:: - - $ ils -r - -you can list all collections and subcollections in iRODS recursively. - -If we want to delete/remove a data object, we simply use the ``irm`` command. - -:: - - $ irm test1.txt - -When we inspect our current working collection, we no longer see ``test1.txt``, so the file seems to be deleted. However, an inspection of the trash folder shows that only the file's physical and logical paths were changed. This is called a *soft delete*.
- -:: - - $ ils -L /kuleuven_tier1_pilot/trash/home/vsc33586 - - /kuleuven_tier1_pilot/trash/home/vsc33586/dataExample: - vsc33586 0 default;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-3-a-weight;tier1-p-irods-posix-3-a 26 2020-05-11.14:13 & test1.txt - generic /irods/a/trash/home/vsc33586/dataExample/test1.txt - vsc33586 1 default;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-a-4-a;tier1-p-irods-posix-4-a-weight;tier1-p-irods-posix-4-a 26 2020-05-11.14:13 & test1.txt - - -This means you can restore the file with the following command:: - -$ imv /kuleuven_tier1_pilot/trash/home/vsc33586/dataExample/test1.txt /kuleuven_tier1_pilot/home/vsc33586/dataExample - -To remove a file completely from the system, you can either empty the trash: - -:: - - $ irmtrash - -or bypass the trash by deleting the file with the ``-f`` (force) flag:: - -$ irm -f test1.txt - -This is called a *hard delete*. Now the file is removed from the system and from the iCAT catalogue. - -.. note:: The ``irmtrash`` command empties the trash folder completely. - -The ``istream`` command with the ``read`` option prints the contents of a data object in iRODS, like the ``cat`` command in CLI shells. - -:: - - $ istream read test1.txt - -Access Control --------------- - -With the option ``ils -A`` we can list the access control list of files and collections. Let us check the ``dataExample`` collection: - -:: - - $ ils -A dataExample - /kuleuven_tier1_pilot/home/vsc33586/dataExample: - ACL - vsc33586#kuleuven_tier1_pilot:own - Inheritance - Disabled - test1.txt - ACL - vsc33586#kuleuven_tier1_pilot:own - -This shows that the ``dataExample`` collection and the ``test1.txt`` object are only accessible to the user ``vsc33586``. -Collections have a flag "Inheritance".
If this flag is enabled, data objects and subcollections that are created in the collection afterwards inherit the access rights of the collection; existing content keeps its current permissions. - -Let's enable inheritance on the “dataExample” collection and give read access to another user (for instance someone from our research group):: - -$ ichmod inherit dataExample -$ ichmod read vsc33585 dataExample - -To summarize, with ``ichmod`` we can set “read”, “write” and “own” permissions, and we can also set inheritance for collections. - -To check the result of our change: - -:: - - $ ils -A dataExample - /kuleuven_tier1_pilot/home/vsc33586/dataExample: - ACL - vsc33586#kuleuven_tier1_pilot:own vsc33585#kuleuven_tier1_pilot:read object - Inheritance - Enabled - test1.txt - ACL - vsc33586#kuleuven_tier1_pilot:own - -We can see that inheritance is now enabled for the ``dataExample`` collection and that user vsc33585 has read access to the collection itself. The existing ``test1.txt`` object still only lists its owner; to change the permissions of existing content as well, add the ``-r`` (recursive) flag to ``ichmod``. - - -Handling metadata ------------------ - -Creating Attribute, Value, Unit triples -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -iRODS allows users to create Attribute-Value-Unit (AVU) triples for any iRODS entity (Data Objects, Collections, Resources or Users). The triples are stored in the iCAT catalogue (in the database), which can be queried to identify and retrieve the correct objects when needed. - -This enables us to ask the iRODS system for all data (files and collections) that match the query criteria. - -First we will explore how to create these AVU triples, which we can search for later. - -- Annotate a data file:: - - $ imeta add -d test1.txt weight 2 kg - - $ imeta add -d test1.txt 'author' 'Jan Ooghe' 'ICTS' - - $ imeta add -d test1.txt 'shareable' yes - -In the last one we left the 'Unit' part empty: the unit is optional and can be omitted when there is no relevant unit. - -.. note:: Quotes are not mandatory, but they are needed to store values that contain spaces.
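To see why the quotes matter, the following Python sketch uses the standard ``shlex`` module, which applies POSIX shell word-splitting rules, to show how a shell breaks the ``imeta add`` command above into arguments. The helper ``imeta_add_command`` is a hypothetical illustration, not part of iRODS: it only builds the command string and does not run ``imeta``. ``shlex.join`` requires Python 3.8 or later.

```python
import shlex


def imeta_add_command(obj_name, attribute, value, unit=None):
    """Build a shell-safe ``imeta add -d`` command line (illustrative helper)."""
    argv = ["imeta", "add", "-d", obj_name, attribute, value]
    if unit is not None:  # the unit is optional in an AVU triple
        argv.append(unit)
    # shlex.join quotes only the arguments that need quoting.
    return shlex.join(argv)


cmd = imeta_add_command("test1.txt", "author", "Jan Ooghe", "ICTS")
print(cmd)  # imeta add -d test1.txt author 'Jan Ooghe' ICTS

# Without quotes, the shell splits the value into two separate words,
# so "Jan" and "Ooghe" would reach imeta as two different arguments.
print(shlex.split("imeta add -d test1.txt author Jan Ooghe ICTS"))
```

Quoting every argument, as in the examples above, is harmless; ``shlex.join`` simply adds quotes only where the shell would otherwise split or interpret the text.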
- -- Annotate a collection:: - - $ imeta add -C dataExample 'type' 'collection' - - $ imeta add -C dataExample 'book' 'chemistry' 'KULeuven' - -List metadata -^^^^^^^^^^^^^ - -To list the metadata of a file, run:: - -$ imeta ls -d test1.txt - -and to list a collection's metadata:: - -$ imeta ls -C dataExample - -Querying data -^^^^^^^^^^^^^ - -It is also possible to find all entities matching certain attribute values. The ``imeta`` command allows users to define simple queries:: - -$ imeta qu -d weight = 2 - -A more sophisticated search can be done using ``iquest``: it uses SQL-like queries to find entities by AVUs and by information not stored in AVUs, for instance by name, id, size, checksum or owner. - -With the following command we can fetch the data objects that have an 'author' attribute:: - - $ iquest "select COLL_NAME, DATA_NAME, META_DATA_ATTR_VALUE where META_DATA_ATTR_NAME like 'author'" - -We can filter on specific attribute values and use wildcards ('%' and '_'):: - - $ iquest "select COLL_NAME, DATA_NAME where \ - META_DATA_ATTR_NAME like 'author' and META_DATA_ATTR_VALUE like 'Jan%'" - -We can find our test1.txt file by filtering on its size in bytes:: - - $ iquest "select DATA_NAME,DATA_SIZE where DATA_SIZE BETWEEN '20' '30'" - - DATA_NAME = test1-restore.txt - DATA_SIZE = 26 - ---------------------------------------- - DATA_NAME = test1.txt - DATA_SIZE = 26 - ---------------------------------------- - -To see all searchable attributes, use:: - - $ iquest attrs - -Cheat sheet of basic iCommands ------------------------------- - -A list of the commands required for basic data operations is provided below. - -.. image:: iCommands/cheat_sheet.png - - - -.. 
include:: links.rst \ No newline at end of file diff --git a/source/data/introduction/irods_4competences.png b/source/data/introduction/irods_4competences.png deleted file mode 100644 index 4733fd88..00000000 Binary files a/source/data/introduction/irods_4competences.png and /dev/null differ diff --git a/source/data/introduction/irods_core.png b/source/data/introduction/irods_core.png deleted file mode 100644 index 409f1af7..00000000 Binary files a/source/data/introduction/irods_core.png and /dev/null differ diff --git a/source/data/introduction_to_irods.rst b/source/data/introduction_to_irods.rst deleted file mode 100644 index 3f3fd246..00000000 --- a/source/data/introduction_to_irods.rst +++ /dev/null @@ -1,26 +0,0 @@ -.. _introduction_to_irods: - -Introduction to iRODS -===================== - -The Integrated Rule-Oriented Data System (iRODS) is open source data management middleware that manages a highly controlled collection of distributed digital objects (various data), while enforcing user/institution-defined Management Policies across multiple storage locations. - -It enables users to access, manage and share data across different storage systems, and to exercise precise control over their data with rules, while remaining secure and user friendly. - -iRODS gives users four core competencies: - -- **Virtualization** - It provides a logical representation of files stored in distributed physical storage locations. Regardless of differences between storage assets, iRODS virtualization presents a unified namespace in the classical files-and-folders format. - -- **Data Discovery** - This is ensured through the use of descriptive, user-defined metadata in addition to traditional system metadata, such as filename, file size and creation date. - -- **Workflow Automation** - iRODS servers can execute event-triggered background processes (rules) that perform defined actions when a particular system activity happens.
iRODS event triggers are called Policy Enforcement Points (PEPs). The combination of PEPs and rules allows administrators and users to create powerful, customized workflows for automating procedures that help to save time and prevent human error. - -- **Secure Collaboration** - iRODS provides facilities to share data between users and user groups in a secure way. iRODS has a permissions model similar to Unix file system permissions. Other facilities offered are 'Tickets' and 'Federation', but these are not actively used on the Tier-1 Data Platform. - -.. image:: introduction/irods_4competences.png - -Other features that make iRODS a unique data management platform can be found on the next pages. \ No newline at end of file diff --git a/source/data/irods_clients_index.rst b/source/data/irods_clients_index.rst deleted file mode 100644 index 7255dcaf..00000000 --- a/source/data/irods_clients_index.rst +++ /dev/null @@ -1,10 +0,0 @@ -iRODS Clients -=================== - -.. toctree:: - :maxdepth: 3 - - iCommands - programming_clients_index - web_application_clients_index - network_file_system_clients_index \ No newline at end of file diff --git a/source/data/links.rst b/source/data/links.rst deleted file mode 100644 index b5dc3b0a..00000000 --- a/source/data/links.rst +++ /dev/null @@ -1,4 +0,0 @@ -.. _www.irods.org: https://irods.org/ -.. _iCommands: https://docs.irods.org/4.2.8/icommands/user/ -.. _Cyberduck: https://irods.org/2015/09/howtocyberduck/ -.. _here: .. _webdav_access_to_irods: \ No newline at end of file diff --git a/source/data/metalnx.rst b/source/data/metalnx.rst deleted file mode 100644 index 981f694d..00000000 --- a/source/data/metalnx.rst +++ /dev/null @@ -1,59 +0,0 @@ -.. _metalnx: - -Metalnx Portal -============== - -Metalnx is a graphical user interface and serves as a client to iRODS. It helps to simplify most administration, collection management, and metadata management tasks removing the need to memorize the long list of iCommands. 
It allows users to manage content and the metadata associated with it. - -You can reach the Metalnx portal here: ``_. - -You will need to authenticate with your institutional account; you are then automatically logged in to the interface with your VSC account. Users from some institutions might encounter a second login page, but the credentials should already be filled in. - -The Metalnx portal is mainly composed of two panes. The left pane shows tabs for the main functionalities and the right pane shows the functionality of the selected tab. - -.. image:: metalnx/metalnx_general.PNG - -**Collections**: Under this tab, we can perform all activities related to data objects and collections. This tab and its functionalities are among the most used in Metalnx. - -- Inspect and browse through the logical path -- Create collections, by clicking the folder icon -- View info, by clicking the 'i' icon -- Upload files, by clicking the cloud icon -- Under the Action dropdown menu, after ticking a check box: - - Move data objects/collections - - Copy data objects/collections - - Rename data objects/collections - - Apply metadata templates - - Download data objects - - Delete data objects and collections - -To the right of any collection or file, you can press 'View info' for the following: - -- Show the full logical path of the collection/file -- Under 'Action': rename/delete files/collections -- Add files/collections to your favorites, by clicking the star icon -- Download files, or collections as a zip, by clicking the download icon -- View system-generated properties, under the tab 'Details' -- Add metadata to files/collections, under the tab 'Metadata' -- Set permissions on files/collections, under the tab 'Permissions' -- Show file previews (txt and csv), under the tab 'Preview' - -.. image:: metalnx/metalnx_view_info.png - -All Access Control functionality provided by iCommands is also offered by Metalnx.
A user or group of users can be given Read/Write/Own rights on a specific file or collection. Rights can also be revoked by setting them to 'None'. To provide access rights to a certain file or collection for a certain user/group, one should apply the same or higher access rights to the parent collection as well. This follows standard Unix permission logic, but it is not always apparent when working with a GUI application such as Metalnx. When applying rights to a collection, it is possible to propagate them to child collections and files ('Inheritance' in iRODS) by selecting 'Apply to subcollections and files'. - -So this is one of the ways to share files/collections with other users. However, it may be hard for users to keep track of all shared files. For this reason, you can tag them as shared by ticking the 'Shared Link' checkbox when adding user/group rights to a file. They will then appear in the 'Shared Links' tab in the left pane. - -.. image:: metalnx/metalnx_permissions.png - -**AVU Search**: Under this tab you can search based on either AVU metadata or system-generated properties. However, this search functionality is currently not available due to a bug in the software that should be fixed soon. - -**Templates**: Here we can create our own metadata templates or import an external template in JSON format. A template provides any number of predefined AVUs. When you apply such a template to files or collections, you are prompted to complete or modify the predefined AVUs. When defining the template, only the attribute name is required, and this can be used to ensure that users use the same attribute to store the same type of information. The same attribute can occur multiple times in the schema and the AVUs. By adding a value and unit in the template, you can offer users a single default value or a list of options.
- -**Shared Links**: Here you can see the links to data objects and collections that other users have shared with you (as explained above for the 'Permissions' tab). - -**Favorites**: Here you can see your bookmarked collections and files. You can mark a favorite on the top right of the View info page. - -**Public**: Here you can reach the collections in the public area. These are stored in the /kuleuven_tier1_pilot/home/public/ Collection. However, this area should not be used. If you want to share data with colleagues, it is better to use your group collection for this. This collection can be found at //home/ - -**Trash**: Here you can see the files and collections moved to the trash bin after either a soft delete (without -f flag) with an iRODS iCommand/PRC function or via the delete functionality of Metalnx. \ No newline at end of file diff --git a/source/data/metalnx/Schermafbeelding 2022-09-21 155550.png:Zone.Identifier b/source/data/metalnx/Schermafbeelding 2022-09-21 155550.png:Zone.Identifier deleted file mode 100644 index a831910b..00000000 --- a/source/data/metalnx/Schermafbeelding 2022-09-21 155550.png:Zone.Identifier +++ /dev/null @@ -1,3 +0,0 @@ -[ZoneTransfer] -LastWriterPackageFamilyName=Microsoft.ScreenSketch_8wekyb3d8bbwe -ZoneId=3 diff --git a/source/data/metalnx/metalnx_general.PNG b/source/data/metalnx/metalnx_general.PNG deleted file mode 100644 index 0dba0cb4..00000000 Binary files a/source/data/metalnx/metalnx_general.PNG and /dev/null differ diff --git a/source/data/metalnx/metalnx_permissions.png b/source/data/metalnx/metalnx_permissions.png deleted file mode 100644 index d1e5b402..00000000 Binary files a/source/data/metalnx/metalnx_permissions.png and /dev/null differ diff --git a/source/data/metalnx/metalnx_view_info.png b/source/data/metalnx/metalnx_view_info.png deleted file mode 100644 index 03c0feee..00000000 Binary files a/source/data/metalnx/metalnx_view_info.png and /dev/null differ diff --git a/source/data/network_file_system_clients_index.rst
b/source/data/network_file_system_clients_index.rst deleted file mode 100644 index c7e90f70..00000000 --- a/source/data/network_file_system_clients_index.rst +++ /dev/null @@ -1,10 +0,0 @@ -Network File System interfaces -============================== - - -.. toctree:: - :maxdepth: 2 - - web_application_clients_file_index - desktop_clients_index - command_line_clients_index - windows_file_explorer \ No newline at end of file diff --git a/source/data/programming_clients_index.rst b/source/data/programming_clients_index.rst deleted file mode 100644 index ee8d3619..00000000 --- a/source/data/programming_clients_index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Programming Clients -=================== - - -.. toctree:: - :maxdepth: 2 - - python_client \ No newline at end of file diff --git a/source/data/python_client.rst b/source/data/python_client.rst deleted file mode 100644 index 385ce982..00000000 --- a/source/data/python_client.rst +++ /dev/null @@ -1,499 +0,0 @@ -.. _python_client: - -Python API Client - PRC -======================= - -The Python iRODS Client (PRC) is an API client implemented in Python to access iRODS. The main goal of the PRC is to offer researchers a means to manage their research data from Python. The supported operations are varied and range from “put/get data objects” to “execution of iRODS rules”. Here we cover the basic ones, as well as VSC-PRC (the Vlaams Supercomputer Centrum (VSC) extensions to the Python iRODS Client). - -Like the iCommands, this client is meant to be used on the VSC infrastructure rather than to reach the iRODS server from outside the VSC. - -Installing/Dependencies ------------------------ - -We recommend setting up PRC and VSC-PRC via the module system as follows: - -:: - - module use /apps/leuven/common/modules/all - module load vsc-python-irodsclient - -.. note:: PRC and VSC-PRC require Python 3 and hence cannot be used with Python 2 interpreters. - - -.. 
note:: There is also a module on the HPC clusters of KU Leuven with only the bare python-irodsclient (version 1.1.4). - You can use it as follows: - - :: - - module use /apps/leuven//2021a/modules/all - module load python-irodsclient/1.1.4-GCCcore-10.3.0 - - You should replace with the architecture of the (login) node you are on ('cascadelake', 'skylake' or 'broadwell'). - - However, the following examples use the module 'vsc-python-irodsclient'. - - -Secure Connection Settings ----------------------------- - -We will connect to iRODS using environment files, as shown below. - -:: - - import os - import ssl - from irods.session import iRODSSession - try: - env_file = os.environ['IRODS_ENVIRONMENT_FILE'] - except KeyError: - env_file = os.path.expanduser('~/.irods/irods_environment.json') - - ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None) - ssl_settings = {'ssl_context': ssl_context} - - with iRODSSession(irods_env_file=env_file, **ssl_settings) as session: - pass - -Working with Collections ------------------------- - -A user can connect to a specific iRODS collection and see some basic information about it. A user can also list its sub-collections and create new collections. - -To access a specific collection, we need to specify its absolute path. - -:: - - coll = session.collections.get("/kuleuven_tier1_pilot/home/vsc33586") - -Once we have retrieved a collection, we can check the id or path of this instance. - -:: - - coll.id - 11482 - -:: - - coll.path - '/kuleuven_tier1_pilot/home/vsc33586' - -To see the sub-collections of a collection: - -:: - - for i in coll.subcollections: - print(i) - - - - - - -We can also list the data objects (files) of a collection with the following code: - -:: - - for i in coll.data_objects: - print(i) - - - - - - - - - -It is possible to create a collection under a specific location.
- -:: - - new_coll = session.collections.create('/kuleuven_tier1_pilot/home/vsc33586/data_test') - new_coll.id - - 285438 - -.. note:: If the collection we want to create already exists, the PRC does nothing: it neither complains nor overwrites the existing collection. - -We can even create collections recursively: - -:: - - coll_rec = session.collections.create('/kuleuven_tier1_pilot/home/vsc33586/test_A/test_B') - coll_rec.name - - 'test_B' - -Working with data objects (files) ---------------------------------- - -To create a new data object, use the following code. - -:: - - obj_new = session.data_objects.create('/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj') - - obj_new.path - '/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj' - -To get an existing data object and see its details: - -:: - - obj = session.data_objects.get('/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj') - - obj.id - 285450 - - obj.name - 'data_obj' - - obj.collection - - -If we call Python's built-in ``vars`` function on obj, we can see all attributes of this data object as a dictionary.
- -:: - - vars(obj) - - {'manager': , - 'collection': , - 'id': 285450, - 'collection_id': 285438, - 'name': 'data_obj', - 'replica_number': 0, - 'version': None, - 'type': 'generic', - 'size': 0, - 'resource_name': 'tier1-p-irods-posix-3-b', - 'path': '/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj', - 'owner_name': 'vsc33586', - 'owner_zone': 'kuleuven_tier1_pilot', - 'replica_status': '1', - 'status': None, - 'checksum': None, - 'expiry': '00000000000', - 'map_id': 0, - 'comments': None, - 'create_time': datetime.datetime(2020, 6, 29, 7, 8, 26), - 'modify_time': datetime.datetime(2020, 6, 29, 7, 8, 26), - 'resc_hier': 'default;tier1-p-irods-2020-pilot;tier1-p-irods-2020-pilot-replication;tier1-p-irods-posix;tier1-p-irods-posix-1-4;tier1-p-irods-posix-3-b-2-b;tier1-p-irods-posix-3-b-weight;tier1-p-irods-posix-3-b', - 'resc_id': '10087', - 'replicas': [], - '_meta': None} - -We can also upload an existing local file to iRODS as a new data object, using the ``put`` method. -The first argument is the local file we want to upload and the second argument is the absolute iRODS path (collection + data object name) under which it will be stored. - -:: - - session.data_objects.put('test1.txt','/kuleuven_tier1_pilot/home/vsc33586/data_test/test1.txt') - -To see the result, we can get the uploaded data object. - -:: - - obj2 = session.data_objects.get('/kuleuven_tier1_pilot/home/vsc33586/data_test/test1.txt') - - obj2.id - 285684 - -To delete the data object, use the code below. Note that the force option is important: without it, the data object is only moved to the trash and still exists. - -:: - - obj.unlink(force=True) - -Reading and writing files -------------------------- - -The PRC provides file-like access to data objects:
- -:: - - obj = session.data_objects.get('/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj') - - with obj.open('r+') as f: - f.write("Hello iRODS\n".encode()) - f.write("This is a test file".encode()) - f.seek(0) - for line in f: - print(line) - - b'Hello iRODS\n' - b'This is a test file' - -Working with metadata --------------------- - -In order to work with metadata, we first import the relevant class. - -:: - - from irods.meta import iRODSMeta - -If we check a data object with no metadata attached, the result should be an empty list. - -:: - - obj = session.data_objects.get('/kuleuven_tier1_pilot/home/vsc33586/data_test/data_obj') - print(obj.metadata.items()) - - [] - -Let's now add some metadata. As with the ``imeta`` iCommand, we can add multiple AVUs with the same name field. - -:: - - obj.metadata.add('key1', 'value1', 'units1') - obj.metadata.add('key1', 'value2') - obj.metadata.add('key2', 'value3') - - print(obj.metadata.items()) - [, , ] - -We can update existing metadata with Python's item indexing syntax: assigning to an existing attribute name sets all AVUs with name field "key2" to a single value and unit. - -:: - - meta_update = iRODSMeta('key2', 'python_API_training', 'version1') - obj.metadata[meta_update.name] = meta_update - - print(obj.metadata.items()) - [, , ] - -If we know an AVU key is present only once, we can use the ``get_one`` method, as in the following example. This method returns the AVU for the given unique attribute. - -:: - - print(obj.metadata.get_one('key2')) - - -To remove a specific AVU from an object, use the following command. - -:: - - obj.metadata.remove('key1', 'value1', 'units1') - - print(obj.metadata.items()) - [, ] - -We can also use a for loop to remove all existing AVUs from a data object.
- -:: - - for avu in obj.metadata.items(): - obj.metadata.remove(avu) - - print(obj.metadata.items()) - [] - -General queries with PRC ------------------------- - -We can collect all Collection and DataObject objects of all projects that we are assigned to with the following general query. We can then use the result list for further lookups. - -:: - - import os - from irods.session import iRODSSession - from irods.models import Collection, DataObject - - env_file = os.path.expanduser('~/.irods/irods_environment.json') - with iRODSSession(irods_env_file=env_file) as session: - query = session.query(Collection.name, DataObject.id, DataObject.name, DataObject.size, DataObject.create_time) - - for result in query: - print('{}/{}, size={}, create_time={}'.format(result[Collection.name], result[DataObject.name], result[DataObject.size], result[DataObject.create_time])) - - /kuleuven_tier1_pilot/home/vsc33586/test_AA, size=0, create_time=2020-06-30 12:26:30 - /kuleuven_tier1_pilot/home/vsc33586/user.sh, size=67, create_time=2020-04-17 12:25:57 - /kuleuven_tier1_pilot/home/vsc33586/UserCreationScript_Bash_IRODS.txt, size=274, create_time=2020-05-15 14:15:34 - /kuleuven_tier1_pilot/home/vsc33586/dataExample/test1-restore.txt, size=35, create_time=2020-05-14 07:41:30 - /kuleuven_tier1_pilot/home/vsc33586/dataExample/test1.txt, size=26, create_time=2020-05-11 08:26:23 - /kuleuven_tier1_pilot/home/vsc33586/data_test/test2.txt, size=59, create_time=2020-06-29 08:58:51 - /kuleuven_tier1_pilot/home/vsc33586/KULeuven/alice1.txt, size=74703, create_time=2020-04-27 14:09:31 - -It's also possible to search for specific data records based on the general metadata query by filtering with AVU info. 
- -:: - - from irods.column import Criterion - from irods.models import DataObject, DataObjectMeta, Collection, CollectionMeta - from irods.session import iRODSSession - import os - env_file = os.path.expanduser('~/.irods/irods_environment.json') - with iRODSSession(irods_env_file=env_file) as session: - results = session.query(Collection, CollectionMeta).filter( Criterion('like', CollectionMeta.value, '%chem%')) - for r in results: - print(r[Collection.name], r[CollectionMeta.name], r[CollectionMeta.value], r[CollectionMeta.units]) - - /kuleuven_tier1_pilot/home/vsc33586/dataExample book chemistry KULeuven - -We can also query with aggregation functions (min, max, sum, avg, count), as in the following example: - -:: - - with iRODSSession(irods_env_file=env_file) as session: - query = session.query(DataObject.owner_zone).max(DataObject.size) - print(next(query.get_results())) - - {: 'kuleuven_tier1_pilot', : 18672491605} - - -Instantiating iRODS objects from query results ---------------------------------------------- - -In addition to the general query that gets information out of the ICAT, we can instantiate certain iRODS objects mirroring the persisted entities (instances of Collection, DataObject, User, or Resource, etc.) of the ICAT. - -:: - - user = session.users.get('vsc33586') - print(user) - - - -We can do the same with creation, removal and unlink. - -The example below retrieves a reference to an existing collection using *get*. - -:: - - col = session.collections.get('/kuleuven_tier1_pilot/home/vsc33586/dataExample') - print(col) - - - -So how can we find out which properties the variable *col*, a reference to an iRODS Collection object, has? -The following code gives us some useful information. - -:: - - [ x for x in dir(col) if not x.startswith('__') ] - - ['_meta', - 'data_objects', - 'id', - 'manager', - 'metadata', - 'move', - 'name', - 'path', - 'remove', - 'subcollections', - 'unregister', - 'walk'] - -Let's now check the metadata of this instance.
To see the result properly, we will use here the "pretty-print" module. - -:: - - from pprint import pprint - - pprint((col.metadata.items())) - - [, - ] - -We can see the sub-collections of a specific collection by using the walk method of this instance. - -:: - - col = session.collections.get('/kuleuven_tier1_pilot/home/vsc33586') - - for sub_coll in col.walk(): - pprint( sub_coll ) - - < series of Python data structures giving the complete tree structure of *col* instance under collection 'vsc33586'> - -If we wish to enumerate all collections in the iRODS catalog, we can use, as an alternative approach, general queries and the capabilities afforded by the PRC's object-relational mapping. - -:: - - from irods.collection import iRODSCollection - from irods.models import Collection - - for result in session.query(Collection): - print(iRODSCollection(session.collections,result)) - - < all collections assigned to the user and their sub-collections in the iRODS catalog. > - -If you would like to see more details and examples, you can have a look at the following link of original PRC documentation, ``_. - -VSC Python iRODS Client (VSC-PRC) ---------------------------------- - -VSC-PRC's main goal is to make it easier for researchers to manage their data using iRODS, in particular on VSC's high performance computing infrastructure. - -To this end, VSC-PRC offers a Python module and associated command line scripts: - -* The ``vsc_irods`` Python module contains a ``VSCiRODSSession`` class - which represents an extension of the corresponding ``iRODSSession`` class - in PRC. - - A main feature is the possibility of using wildcards ("*") and tildes - ("~") for specifying iRODS data objects and collections. 
For example, - the following code will copy all files ending in '.txt' inside a - 'my_irods_collection' collection in your irods_home to the local working - directory: - - :: - - with VSCiRODSSession() as session: - session.bulk.get('~/my_irods_collection/*.txt', local_path='.') - - Other 'bulk' operations are available for: - - - uploading files and folders - - removing data objects and collections - - adding and modifying metadata - - listing the disk usage - - More advanced search capabilities (i.e. beyond the above glob patterns) - are also provided. For example, the following can be used to list all - data objects in your irods_home ending in '.txt' that have a - metadata entry with Attribute='Author' and Value='Me': - - :: - - with VSCiRODSSession() as session: - for item in session.search.find('~', pattern='*.txt', types='f', object_avu=('Author', 'Me')): - print(item) - - This can be used in conjunction with the 'bulk' operations, e.g.: - - :: - - with VSCiRODSSession() as session: - iterator = session.search.find('~', pattern='*.txt', types='f', object_avu=('Author', 'Me')) - session.bulk.get(iterator, local_path='.') - - -* VSC-PRC also comes with a set of scripts which make it easy to use the - Python module from a Unix shell: - - - vsc-prc-find - - vsc-prc-iget - - vsc-prc-iput - - vsc-prc-imkdir - - vsc-prc-irm - - vsc-prc-size - - vsc-prc-imeta - - vsc-prc-add-job-metadata - - Typing e.g. ``vsc-prc-find --help`` will show a description of the - recognized arguments. The command-line equivalents of the three Python - snippets above, for example, would look like this: - - :: - - vsc-prc-iget '~/my_irods_collection/*.txt' -d - vsc-prc-find '~' -n '*.txt' --object_avu='Author;Me' - vsc-prc-find '~' -n '*.txt' --object_avu='Author;Me' | xargs -i vsc-prc-iget {} -d - -VSC-PRC is a complementary module created to support PRC operations on the VSC.
- - In order to get a general overview of VSC-PRC, we recommend that users have a look at the “Introduction to VSC-PRC” tutorial at the following link, ``_. - -You can also find an HPC-specific example where VSC-PRC is used in a jobscript at the following link, ``_. \ No newline at end of file diff --git a/source/data/tier1_data_architecture.rst b/source/data/tier1_data_architecture.rst deleted file mode 100644 index afcda525..00000000 --- a/source/data/tier1_data_architecture.rst +++ /dev/null @@ -1,17 +0,0 @@ -.. _tier1_data_architecture: - -Tier-1 Data Platform Architecture -================================= - - -The VSC Tier-1 Data component is based on the open source software iRODS. The following image shows the high-level architecture of the platform. - -.. image:: architecture/general_overview.png - -The current deployment consists of one iRODS zone (“kuleuven_tier1_pilot”) with a single iCAT database configured for high availability. There are three distributed storage resources: 2 POSIX-based systems and 1 Ceph Object Storage system. - -A user can access iRODS from a local computer and/or the VSC Tier-1 and Tier-2 systems using different types of user clients. The following clients are currently available: programming clients such as iCommands and a Python client; web applications such as MetaLnx and (soon) the KU Leuven Data Portal; and various clients implementing WebDAV. - -iCommands is a utility that gives users a command-line interface to operate on data in iRODS. PRC is a Python client API that establishes a secure connection to iRODS so that Python programs can interoperate with iRODS. - -With the aid of the WebDAV protocol, drag-and-drop access to iRODS is provided by tools that enable data transfer, such as a mapped WebDAV network drive, Cyberduck and WinSCP.
diff --git a/source/data/tier1_data_main_index.rst b/source/data/tier1_data_main_index.rst deleted file mode 100644 index 8ae4dc7c..00000000 --- a/source/data/tier1_data_main_index.rst +++ /dev/null @@ -1,14 +0,0 @@ -Tier-1 Data Service -=================== - -.. toctree:: - :maxdepth: 3 - - tier1_data_service - introduction_to_irods - tier1_data_architecture - data_discovery - user_access - irods_clients_index - workflow_automation - glossary \ No newline at end of file diff --git a/source/data/tier1_data_service.rst b/source/data/tier1_data_service.rst deleted file mode 100644 index d9e45487..00000000 --- a/source/data/tier1_data_service.rst +++ /dev/null @@ -1,25 +0,0 @@ -.. _tier1_data_service: - -Tier-1 Data Service -=================== - -Until 2018, the Tier-1 supercomputing infrastructure in Flanders was mainly targeted at users with large-scale computational needs (typical HPC/HTC workloads). Although this platform is already very successful in its current form, its focus on compute no longer meets all the needs of many researchers. There is also a strong demand for more data storage capacity, data processing solutions and customized user environments. - -Therefore, VSC decided in 2018 to offer a new Tier-1 model: Supercomputing as a Service. In this model, three infrastructure components (Compute, Data and Cloud) are tightly coupled to allow easy and fast data transfers between the three systems and to offer a higher level of service to VSC users. - -The Tier-1 Data component aims to offer VSC users the required tools and infrastructure, as well as human resources, to help them manage research data. - -Importance of Tier-1 Data Service ---------------------------------- - -More and more users have computational work that makes intensive use of large data sets. Because of the scale of these data sets, migrating them to and from the compute infrastructure every time they are needed for a calculation is very inefficient.
It is therefore necessary to add a data component where large data sets can be stored for a longer period of time and also be processed efficiently from there. - -The primary goal of this service is to offer users a platform to easily manage research data and to help them apply the **FAIR** principles to their research data from the very beginning of their projects. This should make it easier to transfer their research data and output at the end of the project to institutional or domain-specific repositories for publication and preservation and, when applicable, to ensure they are made publicly available (open access). - -.. image:: data_service/tier1_vsc_data.png - -The Tier-1 Data component provides a service that allows users to store research data during the active phase of the research data life cycle (that is, data that is being collected and analyzed and has not yet been published). For now, this service is restricted to data of research projects that use the VSC Tier-1 Compute infrastructure. - -This platform should also help researchers run their scientific workflows more efficiently by providing tools to automate data collection, data quality control, and the staging of data into and out of the Tier-1 Compute system. - -This Tier-1 Data service is based on the open source software iRODS (`http://www.irods.org`_). How to gain access to the Tier-1 Data iRODS instance is explained in the article ':ref:`user_access`'. \ No newline at end of file diff --git a/source/data/user_access.rst b/source/data/user_access.rst deleted file mode 100644 index d16d950a..00000000 --- a/source/data/user_access.rst +++ /dev/null @@ -1,40 +0,0 @@ -.. _user_access: - -User Access to iRODS -==================== - -The Tier-1 Data Service at KU Leuven is currently in a pilot phase, so access to the system is strictly by invitation.
If you have a use case that combines data and computing workflows and you are interested in testing the Tier-1 Data service, you can contact us by e-mail at data@vscentrum.be to discuss a possible collaboration. - -To log on to and use the Tier-1 Data platform, you need an active vsc-account and an approved Tier-1 Data project. During the pilot phase, Tier-1 Data projects are granted by invitation only. - -Users can connect to the iRODS platform using different clients (command line, WebDAV interface or web application), both from the VSC HPC systems (login and compute nodes) and from external systems (e.g. users' laptops). - -Before you can interact with iRODS, as a VSC user you will need to activate the service by executing one of the following commands: - -- If you are logged in to the login nodes of the Tier-1 or Tier-2 clusters of KU Leuven, you should use: - -:: - - $ irods-setup | bash - -- If you want to connect from a login node of another university's HPC cluster, you should execute: - -:: - - $ ssh login.hpc.kuleuven.be irods-setup | bash - -In either case, any attempt to log in to the Tier-1 or Tier-2 HPC clusters at KU Leuven will invite you to open the HPC Firewall URL, which in turn forwards you to your institutional login page. Please note that, as a consequence, you can't log in to iRODS in an automated fashion. More information can be found on the `HPC Firewall page `__. - -These commands will activate a temporary token for a period of 7 days. After the 7 days have passed, you will need to reactivate your access by re-executing one of these commands. - -The Tier-1 Data service has the following landing page: https://irods.hpc.kuleuven.be/. This provides the entry point to start working with iCommands, the Python iRODS client and the WebDAV client. - -It is also possible to launch iCommands directly from your local Linux computer (native, in a VM or in the Windows Subsystem for Linux) against the Tier-1 iRODS zone.
For this, you need to install iCommands and execute the snippet under 'iCommands Client on Linux' on the landing page. - -Once logged in, iRODS users will have access to the following iRODS collections: - -- Your personal area: /kuleuven_tier1_pilot/home/vscXXXXX (where XXXXX is the number of your vsc-account). By default, this area is only visible to your user account. - -- Your group area: /kuleuven_tier1_pilot/home/lt1_projectcode. This area is shared and visible to all the members of your group. - -- The public area: /kuleuven_tier1_pilot/home/public. This area is accessible by everyone in the system. If configured, it can even be accessed by anonymous users from external sources. Usage of this area is discouraged; your group directory under home should be used for shared storage instead. \ No newline at end of file diff --git a/source/data/web_application_clients_file_index.rst b/source/data/web_application_clients_file_index.rst deleted file mode 100644 index 7103ee8d..00000000 --- a/source/data/web_application_clients_file_index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Web application Clients -======================= - -.. toctree:: - :maxdepth: 2 - - webdav_access_to_irods \ No newline at end of file diff --git a/source/data/web_application_clients_index.rst b/source/data/web_application_clients_index.rst deleted file mode 100644 index 9b89b609..00000000 --- a/source/data/web_application_clients_index.rst +++ /dev/null @@ -1,7 +0,0 @@ -Web application Clients -======================= - -..
toctree:: - :maxdepth: 2 - - metalnx \ No newline at end of file diff --git a/source/data/webdav/davrods_access.png b/source/data/webdav/davrods_access.png deleted file mode 100644 index ebdd5ae0..00000000 Binary files a/source/data/webdav/davrods_access.png and /dev/null differ diff --git a/source/data/webdav/dir_index.png b/source/data/webdav/dir_index.png deleted file mode 100644 index fafcf11e..00000000 Binary files a/source/data/webdav/dir_index.png and /dev/null differ diff --git a/source/data/webdav/map1.png b/source/data/webdav/map1.png deleted file mode 100644 index 93d8b54b..00000000 Binary files a/source/data/webdav/map1.png and /dev/null differ diff --git a/source/data/webdav/map2.png b/source/data/webdav/map2.png deleted file mode 100644 index f3e3639a..00000000 Binary files a/source/data/webdav/map2.png and /dev/null differ diff --git a/source/data/webdav/map3.png b/source/data/webdav/map3.png deleted file mode 100644 index 0feb9d5c..00000000 Binary files a/source/data/webdav/map3.png and /dev/null differ diff --git a/source/data/webdav/map4.png b/source/data/webdav/map4.png deleted file mode 100644 index d0027a3e..00000000 Binary files a/source/data/webdav/map4.png and /dev/null differ diff --git a/source/data/webdav/map5.png b/source/data/webdav/map5.png deleted file mode 100644 index 18159fe7..00000000 Binary files a/source/data/webdav/map5.png and /dev/null differ diff --git a/source/data/webdav/pass_page.png b/source/data/webdav/pass_page.png deleted file mode 100644 index a49de348..00000000 Binary files a/source/data/webdav/pass_page.png and /dev/null differ diff --git a/source/data/webdav/pass_request.png b/source/data/webdav/pass_request.png deleted file mode 100644 index f4f9023f..00000000 Binary files a/source/data/webdav/pass_request.png and /dev/null differ diff --git a/source/data/webdav/password.png b/source/data/webdav/password.png deleted file mode 100644 index 85adc315..00000000 Binary files a/source/data/webdav/password.png and 
/dev/null differ diff --git a/source/data/webdav/vsc_authentication_page.png b/source/data/webdav/vsc_authentication_page.png deleted file mode 100644 index 8c584725..00000000 Binary files a/source/data/webdav/vsc_authentication_page.png and /dev/null differ diff --git a/source/data/webdav_access_to_irods.rst b/source/data/webdav_access_to_irods.rst deleted file mode 100644 index bdfddf3e..00000000 --- a/source/data/webdav_access_to_irods.rst +++ /dev/null @@ -1,37 +0,0 @@ -.. _webdav_access_to_irods: - -WebDAV Client Access (Davrods) -=============================== - -WebDAV stands for Web Distributed Authoring and Versioning, an extension to HTTP, the protocol that web browsers and web servers use to communicate with each other. -WebDAV can be used to manage files remotely over the internet. With WebDAV, we can access files stored in the VSC Tier-1 Data component using the same interface as we use for our local files. - -The module that we use for our iRODS connection is called `Davrods `__. Davrods is an Apache WebDAV interface to iRODS: it provides access to iRODS servers over the WebDAV protocol, acting as a bridge between WebDAV and iRODS. - -Different Graphical User Interface (GUI) clients can be used to access iRODS this way. In this documentation, we show how to map a network drive. - -In a web browser, a simple directory index is used as the interface. It is only intended for listing and viewing data, not for downloading or uploading it. - -Web Browser-Directory Index ---------------------------- - -A connection through Davrods is available at https://irods.hpc.kuleuven.be:8443/. After clicking the link, you can log in using your vsc-account and a password, which should be obtained at https://irods.hpc.kuleuven.be/. -To get the password, simply log in at that address by passing the VSC authentication layer. - -The first step is therefore to acquire a password.
To do so, simply click the link provided above, or copy and paste it into your favorite web browser. - -.. image:: webdav/vsc_authentication_page.png - -Once you reach the screen above, choose the login provider that applies to you. On the next page you will see a very long password, highlighted in the picture below. Copy this password exactly. - -.. image:: webdav/password.png - -Then go to the login link https://irods.hpc.kuleuven.be:8443/, where you will see the screen below. - -.. image:: webdav/davrods_access.png - -Once you enter your user name and the password you saved, you will see exactly the same directory structure as on the iRODS server. - -.. image:: webdav/dir_index.png - -You can now browse the collections and read data objects in your browser. \ No newline at end of file diff --git a/source/data/windows_file_explorer.rst b/source/data/windows_file_explorer.rst deleted file mode 100644 index 8f61e7b6..00000000 --- a/source/data/windows_file_explorer.rst +++ /dev/null @@ -1,35 +0,0 @@ -Mapping drive-WebDAV -==================== - -To perform more actions (drag and drop upload/download, rename, delete and modify), you can map your WebDAV share as a network drive. - -How to access iRODS using WebDAV on a Windows 10 PC: - -- Go to File Explorer and select This PC in the left-hand pane. -- Select Computer from the top ribbon. -- Click on Map Network Drive. - -.. image:: webdav/map1.png - -.. note:: Since the password you obtained is temporary, you need to repeat these steps once your password has expired. - -- Choose the drive letter you want to use. -- Type “https://irods.hpc.kuleuven.be:8443/home/” in the "Folder" text box. This is the path shown in the browser directory index. - -.. image:: webdav/map2.png - -- Click the Finish button. - -.. image:: webdav/map3.png - -- Enter your user name vscXXXXX. -- Paste the password you obtained and saved earlier. -- Click “Ok”. - -..
image:: webdav/map4.png - -- You should now see your connection under Network locations. - -.. image:: webdav/map5.png - -Once you're connected, the WebDAV directory is mounted on your local PC. When you open the WebDAV directory, you will see your iRODS collections and data objects. You can now start adding, editing and deleting files in this directory from the comfort of your computer. \ No newline at end of file diff --git a/source/data/winscp/winscp1.png b/source/data/winscp/winscp1.png deleted file mode 100644 index 2492d273..00000000 Binary files a/source/data/winscp/winscp1.png and /dev/null differ diff --git a/source/data/winscp/winscp2.png b/source/data/winscp/winscp2.png deleted file mode 100644 index 8189fe24..00000000 Binary files a/source/data/winscp/winscp2.png and /dev/null differ diff --git a/source/data/winscp/winscp2_old.png b/source/data/winscp/winscp2_old.png deleted file mode 100644 index 9e3e8267..00000000 Binary files a/source/data/winscp/winscp2_old.png and /dev/null differ diff --git a/source/data/winscp/winscp3.png b/source/data/winscp/winscp3.png deleted file mode 100644 index a84ece35..00000000 Binary files a/source/data/winscp/winscp3.png and /dev/null differ diff --git a/source/data/winscp_access_irods.rst b/source/data/winscp_access_irods.rst deleted file mode 100644 index a5a8ddce..00000000 --- a/source/data/winscp_access_irods.rst +++ /dev/null @@ -1,47 +0,0 @@ -.. _winscp_access_irods.rst: - -Using WinSCP to access iRODS -=================================== - -WinSCP (Windows Secure Copy) is another tool to upload and download data to and from iRODS through a Graphical User Interface (GUI). WinSCP is a free, open source SFTP, FTP, WebDAV, S3 and SCP client for Windows. Its main function is file transfer between a local and a remote computer.
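Like the mapped network drive, WinSCP can connect to Davrods over WebDAV. WebDAV clients typically list a collection with a `PROPFIND` request, as defined by the WebDAV standard (RFC 4918). As a minimal sketch using only the Python standard library, and assuming the Davrods endpoint and the `vscXXXXX` placeholder account used in these pages, such a request could be built as follows; `build_listing_request` is a hypothetical helper name.

```python
import base64
import urllib.request

# Davrods endpoint used throughout these pages
DAVRODS_URL = "https://irods.hpc.kuleuven.be:8443"


def build_listing_request(collection, user, password, base=DAVRODS_URL):
    """Build, but do not send, a WebDAV PROPFIND request asking Davrods
    for a collection and its direct members (Depth: 1)."""
    url = base.rstrip("/") + "/" + collection.strip("/") + "/"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        method="PROPFIND",
        headers={
            "Depth": "1",  # the collection itself plus one level below it
            "Authorization": f"Basic {token}",  # HTTP Basic auth with the temporary password
        },
    )
```

For example, `build_listing_request("home/vscXXXXX", "vscXXXXX", password)` targets https://irods.hpc.kuleuven.be:8443/home/vscXXXXX/. Sending it with `urllib.request.urlopen(...)` should return a `207 Multi-Status` XML response describing the collection members, provided the temporary password from https://irods.hpc.kuleuven.be/ has not yet expired.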
- - Installing and First Time Configuration of WinSCP -------------------------------------------------- - -- Visit the official site at https://winscp.net/eng/index.php. - -- Click the download icon. - -.. image:: winscp/winscp1.png - -- Open the WinSCP.exe file and follow the installation procedure described at https://winscp.net/eng/docs/guide_install. Complete your installation in accordance with your institution's application installation policy. - -- After you complete your installation, run the program. -- Click 'New session' to store a login. -- Choose the required options and fill in the blank fields with the corresponding information, as shown on the screen below. - -.. image:: winscp/winscp2.png - -- Write down the password that you get at https://irods.hpc.kuleuven.be/. Do not enter the password in the Password text box: as it is quite long, WinSCP will truncate it, and it is only valid for 4 hours anyway. - -.. note:: Since the password you obtained is temporary, you need to get a new one every time your password has expired. - -- The first time you make the connection, you will be prompted for the password and asked to ‘Continue connecting and add host key to the cache’; select ‘Yes’. - -- You can choose/set your remote directory for ease of use. - - -Upload/Download Data to/from iRODS using WinSCP ------------------------------------------------ - -- On the WinSCP screen, the right pane shows our connection to iRODS, and the left pane shows our local directories. - -- To upload data from local to iRODS, simply drag a file or folder from the left pane and drop it at the desired location in the right pane. - -- To download data from iRODS to our local directory, we drag data object(s) or collection(s) from the right pane and drop them at the desired location in the left pane. - -.. image:: winscp/winscp3.png - -- We can use WinSCP to create, delete or rename files and folders, both locally and in iRODS.
- - It is also possible to edit a file with a GUI editor to easily change its content. This is not possible with iCommands. \ No newline at end of file diff --git a/source/data/workflow_automation.rst b/source/data/workflow_automation.rst deleted file mode 100644 index ead9aba7..00000000 --- a/source/data/workflow_automation.rst +++ /dev/null @@ -1,231 +0,0 @@ -.. _workflow_automation.rst: - -Workflow automation -=================== - -iRODS provides a Rule System to automate data management tasks. Each organisation or project has its own policies and needs with regard to file housekeeping and metadata extraction. Examples of file operations are making regular backups, checking file integrity, cleaning up permissions and emptying the trash. Metadata can be extracted from data files or accompanying text files and stored in the iCAT catalog. This frees human users from repetitive actions and ensures that policies are applied consistently and without error. - -This is made possible by the Rule Engine Plugin Framework (REPF), which executes the rules and keeps track of the execution state of all active rules (rules can run immediately, after a delay or when a condition is met). The framework keeps track of both User-level rules and System-level rules. - -User-level rules are stored locally and invoked manually by any user with the ``irule`` command, which runs them on the iRODS server. They are meant for personal or group use (if the user is part of that group) and are typically simple 'one-shot' workflow operations. System-level rules are stored server-side by an iRODS developer or administrator and invoked automatically by the Rule Engine when a certain condition is met. This condition is called an 'event', or more formally a Policy Enforcement Point (PEP). System-level rules are meant for consistent data management of a whole zone or for complex group/project data management tasks.
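User-level rules can also be launched from a script, for example inside a jobscript. The following minimal sketch, assuming iCommands are installed and your iRODS access has been activated, composes a rule file in the format used by the examples in this article (a rule block followed by `input` and `output ruleExecOut` lines) and builds the matching `irule -F` command. The helper names are hypothetical; `delay_condition` simply concatenates the ET, PLUSET and EF timing tags described under Delayed execution rules.

```python
def delay_condition(at=None, plus=None, every=None):
    """Compose an iRODS delay condition from the ET / PLUSET / EF tags."""
    parts = []
    if at is not None:
        parts.append(f"<ET>{at}</ET>")  # absolute start time, e.g. '20:00'
    if plus is not None:
        parts.append(f"<PLUSET>{plus}</PLUSET>")  # start after a delay, e.g. '10s'
    if every is not None:
        parts.append(f"<EF>{every}</EF>")  # repeat frequency, e.g. '1h'
    return "".join(parts)


def render_rule(name, actions, inputs=None, delay=None):
    """Return the text of a User-level rule file in the iRODS Rule language.

    actions: rule-language statements without their trailing ';'
    inputs:  mapping of variable name (without '*') to a rule-language literal
    delay:   optional delay condition, e.g. the result of delay_condition()
    """
    indent = " " * (8 if delay else 4)
    body = "\n".join(f"{indent}{action};" for action in actions)
    if delay:
        # wrap the actions in a delay block inside the rule
        body = f'    delay("{delay}") {{\n{body}\n    }}'
    if inputs:
        input_line = "input " + ", ".join(f"*{k}={v}" for k, v in inputs.items())
    else:
        input_line = "input null"
    return f"{name} {{\n{body}\n}}\n\n{input_line}\noutput ruleExecOut\n"


def irule_command(rule_file):
    """Argument list for running a rule file with the `irule` iCommand."""
    return ["irule", "-F", str(rule_file)]
```

For example, writing `render_rule("helloWorldRule", ["writeLine('stdout', 'Hello World!')"])` to `hello.r` with `pathlib.Path.write_text` and then calling `subprocess.run(irule_command("hello.r"), check=True)` would run the Hello World rule. Generating the file only keeps the rule text in one place in your script; the rule itself is still executed by the iRODS server.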
- - Rules can be invoked in three ways: by directly calling the ``irule`` command on a rule file (only for User-level rules); by reaching a Policy Enforcement Point, which triggers a System-level rule; or periodically via delay rules (for both User-level and System-level rules). Resource-heavy, time-intensive processes are best executed as System-level delay rules. - -The Python iRODS Client (PRC) is executed client-side, making it somewhat less efficient than iRODS rule execution. Although both offer overlapping functionality, the delay mechanism used by iRODS rules is more graceful and these rules are stored centrally in the REPF. It is also possible to invoke a rule via the PRC. - -This article focuses only on User-level rules. If you have more complex data processing pipelines, the Tier-1 Data Team (FOZ-RDM) at KU Leuven can create System-level rules for you. Please contact us at data@vscentrum.be. - -Rule syntax ------------ - -Rules can be written in Python, C++ or the iRODS Rule language. However, normal users can (for the time being) only execute rules written in the iRODS Rule language (either User-level or System-level rules). The iRODS Rule language is a domain-specific scripting language composed of simple building blocks. It uses curly brackets for blocks and # for comments; variable names start with '*', and strings are enclosed in ' or ". The documentation for the language can be found here: https://docs.irods.org/4.2.10/plugins/irods_rule_language/. - -:: - - rulename{ # rulename, like 'extractMetaDataRule', followed by the body block - on(condition){ # condition is optional - delay(delay){ # delay is optional - list of actions - } - } # end of the optional condition block - } - - input INPUT # input is optional - output OUTPUT - - -A rule file can contain more than one rule block, although only the first one is executed directly. The other blocks can be called by the first one in its list of actions, and their output can be saved as a variable for the caller.
Some of the elements are optional. A rule without a condition or delay is executed once, immediately. 'Input' variables are optional (or can be set to null) and can be used to define a global variable for all rules inside the rule file or to prompt for user input. - -It is valid syntax to use a condition in a User-level rule, but such conditions can't be used to track system events the way a PEP does. - -Executing rules ---------------- - -This rule prints 'Hello World!' to the terminal: - -:: - - helloWorldRule{ - writeLine('stdout', 'Hello World!'); - } - - output ruleExecOut - - -``output ruleExecOut`` indicates that this rule is executable. Save this code in a file called myFirstRule.r and run it by issuing: - -:: - - $ irule -F myFirstRule.r - -As mentioned above, one rule can call another, just as if it were a function: - -:: - - firstRule{ - writeLine('stdout', 'This is my first rule'); - secondRule(); - } - - secondRule{ - writeLine('stdout', 'This is my second rule'); - } - - output ruleExecOut - -Variables and user input ------------------------- - -Variables can be assigned in the body block or as input variables. You can concatenate string variables with the '++' operator. - -:: - - advancedHelloWorldRule{ - #you can define a variable here - *var1="Hello "; - writeLine('stdout', *var1 ++ *var2); - } - - #you can also define a variable here - input *var2="World" - output ruleExecOut - - -We can also prompt the user for input by assigning '$' to a variable, with an optional default value. In the next example, the ``greeting1`` variable can either be set to what the user types or kept at its default value 'Hello' by pressing Enter. When typing your prompted value, enclose it in single or double quotes. Also note that variables are expanded in quoted strings.
- - :: - - evenMoreAdvancedHelloWorldRule{ - writeLine("stdout","User says '*greeting1 *greeting2'") - } - input *greeting1 = $'Hello', *greeting2 = $'World' - output ruleExecOut - - -There are also session state variables, for instance to retrieve the active user: - -:: - - veryAdvancedHelloWorldRule{ - writeLine("stdout","$userNameClient says '*greeting1 *greeting2'") - } - input *greeting1 = $'Hello', *greeting2 = $'World' - output ruleExecOut - - -Another useful session state variable for User-level rules is ``$rodsZoneClient`` for the zone name. There are other session variables (such as ``$collName``, ``$objPath``, ``$dataType``, ``$dataSize`` and ``$chksum``), but these are only useful in System-level rules, as they are out of scope in a User-level rule. - -Querying iRODS --------------- - -Just as with the ``iquest`` iCommand and the PRC, we can query the iCAT and retrieve matching fields for entities (data objects or collections). These fields are called 'Persistent State Information'. Rules can also access 'Session state information', such as the ``$userNameClient`` variable above. To see which persistent fields are available, use ``iquest attrs``. - -The following rule prints all data objects whose collection path contains the word 'test'. Note that COLL_NAME is the full logical path of the collection, ending in the collection name: - -:: - - queryRule{ - foreach(*i in SELECT COLL_NAME, DATA_NAME WHERE COLL_NAME like '%test%'){ - *coll = *i.COLL_NAME; - *data = *i.DATA_NAME; - writeLine("stdout", "*coll/*data"); - } - writeLine("stdout", "listing done"); - } - - output ruleExecOut - -Microservices and custom functions ----------------------------------- - -iRODS provides a whole library of functions, called microservices, for interacting with it via the Rule system. Microservices are written in C within the iRODS source code. They can be called in the rule body like any other action. - -You can find an overview of all available microservices in the `iRODS documentation `__ under the tab `Doxygen `__.
These pages also list their arguments and types. - -There are microservices for rule management, for manipulating data objects, collections and their metadata, for managing the iCAT database, and more. The library also includes basic functions such as email (``msiSendStdoutAsEmail``), string and key-value manipulation. The following example creates a new collection: - -:: - - createCollRule { - *path="/$rodsZoneClient/home/$userNameClient/newCollection"; - msiCollCreate(*path, 0, *Status); - writeLine("stdout", "Collection *path created"); - - } - output ruleExecOut - -You can also save files from a local disk as data objects with the ``msiDataObjPut`` microservice. As an input variable you should use the absolute path of a file. The second argument of ``msiDataObjPut`` is the iRODS resource where you want to save the file. A resource, or storage resource, is a software/hardware system that stores digital data. You can identify the available resources with the ``ilsresc`` command. - -:: - - createDORule { - *path="/$rodsZoneClient/home/$userNameClient/newCollection"; - *destName="test.txt"; - writeLine("stdout", "Saving file *path/*destName ..."); - msiDataObjPut("*path/*destName","default","localPath=*file++++forceFlag=",*Status); - writeLine("stdout", "File *path/*destName created"); - } - - input *file="/home/x/y/z/test.txt" - output ruleExecOut - - -In your rule file, you can define functions to contain frequently used functionality. Functions can be thought of as microservices written in the rule language and are called similarly. It's also possible to pass variables to a function and let it return its result. - -:: - - functionRule { - *c = sq(5); - writeLine('stdout', "*c"); - } - - sq(*a) = *a * *a - - input null - output ruleExecOut - - -Delayed execution rules ------------------------ - -A rule action can be executed (as a System-level rule or with ``irule``) at a certain point in the future by delaying it or scheduling it at a certain time.
To express this, a timing syntax based on XML tags is provided: - - - ET: Absolute time when something should be performed, for instance at 8:00 PM: 20:00. - - PLUSET: Delay execution for a certain amount of time from now, for instance 10s or 1m. - - EF: Perform execution every n time units, for a certain amount of time. The default is forever. For instance, 1d for daily. - -The full syntax is provided `here `__. - -:: - - backupRule{ - delay("<ET>00:00</ET><EF>1d</EF>"){ - msiTarFileCreate(*file,*coll,*resource,*flag); - writeLine("stdout","Created tar file *file for collection *coll on resource *resource"); - } - } - input *file="/$rodsZoneClient/home/$userNameClient/backup_newCollection.tar", *coll="/$rodsZoneClient/home/$userNameClient/newCollection", *resource="default", *flag="force" - output ruleExecOut - -This backs up the provided collection daily at midnight. You can test that this delay rule is executed by replacing the condition with '<ET>00:00</ET><EF>1m</EF>' and calling ``ils -l`` to see the timestamp change. - -The following example synchronizes two collections 10 seconds from now and then repeats this hourly, forever: - -:: - - syncRule{ - delay("<PLUSET>10s</PLUSET><EF>1h</EF>"){ - msiCollRsync(*srcColl,*destColl,*resource,"IRODS_TO_IRODS",*Status); - writeLine("stdout","Synchronized collection *srcColl with collection *destColl"); - } - } - - input *srcColl="/$rodsZoneClient/home/$userNameClient/newCollection", *destColl="/$rodsZoneClient/home/$userNameClient/newCollection_sync",*resource="default" - output ruleExecOut - -There are three useful iCommands to track active delayed rules: - -- ``iqstat``: show the queue status of delayed rules, and note the id -- ``iqmod``: modify certain values in existing delayed rules (owned by you) -- ``iqdel``: remove a delayed rule (owned by you) from the queue, by giving the id \ No newline at end of file diff --git a/source/index.rst b/source/index.rst index ebe4bafd..550c435f 100644 --- a/source/index.rst +++ b/source/index.rst @@ -16,7 +16,7 @@ Welcome to VSC documentation
software/software_development hardware globus/globus_main_index - data/tier1_data_main_index + tier1data/index.rst faq Feel free to contact :ref:`user support `. diff --git a/source/tier1data/basics.rst b/source/tier1data/basics.rst new file mode 100644 index 00000000..fded2d3f --- /dev/null +++ b/source/tier1data/basics.rst @@ -0,0 +1,9 @@ +Basics +============================= + +.. toctree:: + :maxdepth: 2 + + basics/introduction_to_tier1data + basics/architecture + basics/user_access diff --git a/source/tier1data/basics/architecture.rst b/source/tier1data/basics/architecture.rst new file mode 100644 index 00000000..6c3bbb78 --- /dev/null +++ b/source/tier1data/basics/architecture.rst @@ -0,0 +1,42 @@ +.. _architecture: + +Tier-1 Data Architecture +======================== + +The Tier-1 Data platform is based on the open source software iRODS. +The image below shows the high-level architecture of the platform: + +.. figure:: ../images/introduction/tier1data_architecture.png + :alt: A schematic representation of the architecture of Tier-1 Data with, from bottom to top: the storage layer, the layer of iRODS middleware, and the different clients users can use. + +A single installation of iRODS is called a 'zone'. +By default, new projects are created in our general zone 'VSC'. +For confidentiality or efficiency reasons, separate zones can be created for specific projects. + +Each zone has a Rule Engine and an iCAT +database, which contains metadata, permissions and an index of all file locations. +Inside a zone, each project has its own folder. + +The data itself is synchronized on two separate hardware storage +systems, each 27 PB large, located at Leuven and at Heverlee (ICTS KU Leuven). +The data is protected against calamities at either site by synchronizing it in real time at the hardware level. +However, one system does not function as a backup for the other, so this offers no protection against accidental delete instructions +(e.g. user mistakes).
+Snapshots are made at regular intervals (hourly, daily and monthly) in case data needs to be recovered. +For data recovery in case of emergency, please contact data@vscentrum.be. + +An iRODS zone can be accessed from different systems: + +- The HPC clusters of the Flemish Supercomputer Center (VSC) +- Your laptop +- Scientific instruments like scanners and microscopes +- ... + +From each of these systems, you can access Tier-1 Data with a variety of clients: + +- Programming clients + + :ref:`iCommands` is a package that gives users a command-line interface to operate on data in iRODS. + + The :ref:`Python-iRODSClient (PRC)` is a Python client API that can make a secure connection to iRODS, so that you can integrate your iRODS data interactions into your (existing) Python programs. +- The :ref:`ManGO Portal `: the web front-end for Tier-1 Data, which offers a user-friendly graphical interface. + +Finally, data can be exchanged between zones and with external parties via Globus. More information can be found in the :ref:`Globus ` documentation. diff --git a/source/tier1data/basics/introduction_to_tier1data.rst b/source/tier1data/basics/introduction_to_tier1data.rst new file mode 100644 index 00000000..3b36fe11 --- /dev/null +++ b/source/tier1data/basics/introduction_to_tier1data.rst @@ -0,0 +1,46 @@ +Introduction to Tier-1 Data +=========================== + + +More and more VSC users have computational tasks that make intensive use of large data sets. +The Tier-1 Data platform of the VSC enables researchers to store their data close to the computing infrastructure during the active phase of their research projects. +Additionally, it gives them tools to manage data in a `FAIR `_ and efficient way from start to finish. + +The Tier-1 Data platform is based on the open-source software iRODS. +Researchers can manage data via different clients, such as a command-line interface, a Python API or a web interface.
+ +Tier-1 Data provides four core capabilities: + +- **Storage** + + Data is stored securely in the data centers of KU Leuven. + Two copies of each file are stored: one in the data center in Heverlee and one in the data center in Leuven. + + +- **Metadata** + + Data can be described by adding metadata at either the file or the folder level. + This metadata can serve multiple purposes: + + - Provide context to the data + - Make data findable via the search interface + - Play a role in workflow automation + + Metadata can be added manually, but also via automation (e.g. extraction of metadata from file headers) or by using metadata schemas in the web interface. + +- **Automation** + + Tier-1 Data has a Python API and a command-line interface, which can easily be integrated into your existing code and job scripts. + This way, you can integrate data movement and data management actions into your existing HPC workflow. + + Tier-1 Data's servers also have event triggers called Policy Enforcement Points (PEPs), which fire every time a certain type of action is taken (e.g. a user uploads a file). + Administrators can define processes that run each time one of these PEPs is triggered. This allows us to work together with users and + create powerful, automated workflows that save time and prevent human errors. + +- **Secure Collaboration** + + In Tier-1 Data, you can share data with users and user groups via a system of permissions. + Permissions can be managed at the file and folder level, allowing for fine-grained access control. + If you have collaborators without a VSC account, data in Tier-1 Data can be shared via Globus.
+ + diff --git a/source/tier1data/basics/user_access.rst b/source/tier1data/basics/user_access.rst new file mode 100644 index 00000000..bd383bc6 --- /dev/null +++ b/source/tier1data/basics/user_access.rst @@ -0,0 +1,34 @@ +User Access to Tier-1 Data +========================== + +If your research group is interested in using Tier-1 Data, you need to request a Tier-1 Data project. +This is done by submitting a project application before one of the cut-off dates each year. +Importantly, each collaborator needs a VSC account. + +For more information on this procedure, see https://www.vscentrum.be/data. + +Users can connect to Tier-1 Data by using different clients from any computer after logging in with their institutional account. +For example, VSC users from UAntwerpen will be forwarded to the UAntwerpen login page. + +The landing page for all Tier-1 Data clients is our web front-end, the `ManGO portal `_. +After logging in, you will get an overview of all zones you have access to. +Clicking on 'Enter portal' takes you to the ManGO portal for that zone. +If you prefer to access Tier-1 Data via a different client, you can find the necessary credentials or code under 'How to Connect'. + +For :ref:`iCommands`, you need a Linux environment (a Linux distribution or :ref:`wsl`) +with iCommands installed. +iCommands is already installed on most of our HPC systems. + +For :ref:`the Python programming client (PRC) `, you only need a Python installation and the PRC package itself. +This suffices for a connection with the default password duration of 60 hours. +However, it is also possible to log in with a longer-lived password (7 days) if you also have a Linux environment +with iCommands installed. + + +For more information on how to install and use each client, see the `clients <../clients.html>`_ section. + +Once logged in, Tier-1 Data users can find their group folder at ``//home/``.
+This area is shared with and visible to all members of your group, but can be further subdivided into subfolders +with more specific permissions. + + diff --git a/source/tier1data/clients.rst b/source/tier1data/clients.rst new file mode 100644 index 00000000..0d47afd7 --- /dev/null +++ b/source/tier1data/clients.rst @@ -0,0 +1,10 @@ +Clients +============================= + +.. toctree:: + :maxdepth: 2 + + clients/icommands + clients/python_client + clients/mango_portal + diff --git a/source/tier1data/clients/icommands.rst b/source/tier1data/clients/icommands.rst new file mode 100644 index 00000000..b7b30286 --- /dev/null +++ b/source/tier1data/clients/icommands.rst @@ -0,0 +1,243 @@ +========= +iCommands +========= + + +iCommands is a command-line interface for iRODS, the open-source +software behind Tier-1 Data. For those who are familiar with Unix command-line +programs, it is a powerful and easy-to-use tool. + +Installation +============ + +iCommands is already installed on the following HPC clusters: + +- Genius +- wICE +- Hydra + +You can of course also install iCommands on your local system. +However, iCommands is only available for Linux environments. +Windows users can get one by installing the Windows Subsystem for Linux (WSL). + +You can install iCommands on different distributions as follows: + +CentOS 7: +--------- + +.. code:: sh + + # Installing prerequisites + yum update + yum install wget sudo + + # Add the iRODS repository to your package manager (if you haven't done so already) + sudo rpm --import https://packages.irods.org/irods-signing-key.asc + wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo + + # Installing iCommands + yum install irods-icommands + +AlmaLinux 8/Rocky Linux 8: +-------------------------- + +..
code:: sh + + # Installing prerequisites + yum update + yum install wget sudo + + # Add the iRODS repository to your package manager (if you haven't done so already) + sudo rpm --import https://packages.irods.org/irods-signing-key.asc + wget -qO - https://packages.irods.org/renci-irods.yum.repo | sudo tee /etc/yum.repos.d/renci-irods.yum.repo + + # irods runtime needs to be installed manually because of https://github.com/k3s-io/k3s/issues/5588 + yum install irods-runtime + + # Installing iCommands + yum install irods-icommands + +Debian 11: +---------- + +.. code:: sh + + # Installing prerequisites + apt-get update + apt-get install wget lsb-release sudo gnupg + + # Add the iRODS repository to your package manager (if you haven't done so already) + wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add - + echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/renci-irods.list + sudo apt-get update + + # Installing iCommands + apt-get install irods-icommands + +Ubuntu 18/20: +------------- + +.. code:: sh + + # Installing prerequisites + apt-get update + apt-get install wget lsb-core sudo + + # Add the iRODS repository to your package manager (if you haven't done so already) + wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add - + echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/renci-irods.list + sudo apt-get update + + # Installing iCommands + apt-get install irods-icommands + +Ubuntu 22: +---------- + +.. 
code:: sh + + # Installing prerequisites + apt-get update + apt-get install gnupg wget sudo + wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb + sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb + + # Add the iRODS repository to your package manager (if you haven't done so already) + wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add - + echo "deb [arch=amd64] https://packages.irods.org/apt/ focal main" | sudo tee /etc/apt/sources.list.d/renci-irods.list + sudo apt-get update + + # Installing iCommands + apt-get install irods-icommands + +Authenticating +============== + +To authenticate, go to the `ManGO portal `__ +and log in. Click on ‘How to connect’ next to your zone, copy the code +under ‘iCommands for Linux’ and paste it into your terminal. This should +authenticate you for 168 hours (7 days). + +Getting help +============ + +iCommands has built-in documentation, which you can access from the +command line. The command ``ihelp`` gives an overview of all commands, +with a brief description of each. + +To get the documentation for a specific command, you can either type +``ihelp `` or ``command -h``. + +Similarities with UNIX commands +=============================== + +For people who are used to working on a Linux command line, iCommands +will feel instantly familiar. Many Unix commands have a direct iCommands +counterpart. While Unix commands work on your local file system, +iCommands work on the data in Tier-1 Data.
+ ++-------------------------------+-----------------------+-------------+ +| Unix command | iCommand | Use | ++===============================+=======================+=============+ +| cd | icd | change | +| | | current | +| | | working | +| | | directory | +| | | /collection | ++-------------------------------+-----------------------+-------------+ +| pwd | ipwd | show the | +| | | current | +| | | working | +| | | directory | +| | | /collection | ++-------------------------------+-----------------------+-------------+ +| ls | ils | list the | +| | | current | +| | | working | +| | | directory | +| | | /collection | ++-------------------------------+-----------------------+-------------+ +| mkdir | imkdir | create | +| | | directory | +| | | /collection | ++-------------------------------+-----------------------+-------------+ +| cp | icp | copy a | +| | | file/data | +| | | object or | +| | | collection/ | +| | | directory | ++-------------------------------+-----------------------+-------------+ +| mv | imv | move a | +| | | file/data | +| | | object or | +| | | collection/ | +| | | directory | ++-------------------------------+-----------------------+-------------+ +| chmod | ichmod | change | +| | | permissions | ++-------------------------------+-----------------------+-------------+ +| … | … | … | ++-------------------------------+-----------------------+-------------+ + +Just like Unix commands, iCommands work with both absolute and relative +paths. For example, to go from ``//home/`` to +``//home//raw_data`` you can use either of the following +commands: + +.. code:: sh + + icd raw_data + + icd //home//raw_data + +As with Unix commands, you can use ``.`` to refer to the current +working collection, and ``..`` to refer to the parent of the current +collection. + +An important difference is that iCommands have no shell expansion.
If +you use tab completion or wildcards (*) with iCommands, your local shell +expands them based on the files in your local directory, not on the data in Tier-1 Data. This can +yield unexpected results. + +Uploading and downloading data +============================== + +To upload data from your local directory to Tier-1 Data, you can use the +command ``iput``. You can upload individual files, but also whole +directories by using the ``-r`` option, which stands for ‘recursive’. + +.. code:: sh + + iput my_file.txt + iput -r my_directory + +You can optionally specify a destination as a second argument. If you +omit the destination, ``iput`` uses the current working +collection as the destination. + +To download data objects or whole collections from Tier-1 Data to your local +directory, you can use the command ``iget``: + +.. code:: sh + + iget my_data_object.txt + iget -r my_directory + +``iget`` downloads data to your current working directory, unless you +specify another destination as a second argument. + +You can also synchronize a local directory and a +collection in Tier-1 Data with the command ``irsync``. This command compares +the data on both sides. Any data in the source +that is missing in the destination is transferred. If files are +present in both the source and the destination, ``irsync`` calculates +checksums to check whether the version in the destination is still up to +date. + +.. code:: sh + + # synchronizing data from a local directory to a Tier-1 Data collection + irsync -r local_directory i:collection + + # synchronizing data from a Tier-1 Data collection to a local directory + irsync -r i:collection local_directory diff --git a/source/tier1data/clients/mango_portal.rst b/source/tier1data/clients/mango_portal.rst new file mode 100644 index 00000000..6f069037 --- /dev/null +++ b/source/tier1data/clients/mango_portal.rst @@ -0,0 +1,175 @@ +..
_mango-portal: + +ManGO Portal +************* + +The `ManGO portal `_ is a graphical web interface for Tier-1 Data. +It allows users to manage their data in an intuitive way, without installing any software, +with a strong focus on managing :ref:`metadata`. + +When you log in, you are redirected to the login page of your institution. +Afterwards, you see an overview of one or more zones, each offering one or more of the following options: + +- Select 'Enter portal' to enter that zone via the ManGO portal. +- Select 'How to Connect' to get credentials for logging in to other clients, like :ref:`iCommands` or :ref:`the PRC`. +- Selecting the downward arrow opens an overview of all projects you are a member of in that zone. Clicking on a project name takes you to the project management page. + +If you select the first option, you are taken to the ManGO Portal home page: + +.. image:: ../images/mango_portal/mango_portal_main_page.png + :width: 1000 + :alt: The main page of the ManGO portal + + +This main page has four main tabs: + +- **Collections**, where you can manage the data in your collections. +- **Search**, where you can search for data objects and collections based on different criteria. +- **Trash**, where you can inspect and manage your :ref:`trash collection `. +- **Metadata schemas**, where you can view and manage metadata schemas. + +Managing collections and data objects +===================================== + +The Collections tab provides an overview of the collections you have access to and lets you browse through them. +Clicking on the name of a subcollection shows its contents, both subcollections and data objects, under the "Content" tab. +If you click on a data object instead, you go to its page, which does not have such a "Content" tab. + +Above the name of the current collection or data object, a breadcrumb menu allows you to go back to any of its parent collections.
+Next to the name, clicking on the pencil icon allows you to rename the active collection or data object. + +The page of a collection also includes buttons to create a new collection and upload files, +which are only available if you have the right :ref:`permissions`. +Depending on your permissions, you may also see a dropdown menu of bulk actions, allowing you to move, copy or delete one or more +collections or data objects at a time. +To do so, click on the checkbox next to the names of the collections or data objects, select your desired action and click on "apply". +When copying or moving, a list of collections will appear where you can browse to select a destination. +Notice that only data objects, not collections, can be copied. + +Data objects do not have a "Content" tab, as they cannot contain any objects. +Instead, they have a "System properties" tab, which contains some basic information about the object. +They also provide a "Preview" tab in which, depending on the file type, you may be able to see the contents of your data object. + +In addition to these specific tabs, both collections and data objects have a :ref:`"Metadata" ` and :ref:`"Permissions" ` tab +to inspect and manage metadata and permissions, respectively. + + +Uploading and downloading data +------------------------------ + +Clicking on 'Upload files...' opens a white box where you can add one or more files to upload to the current collection: + +- By dragging files from your computer into the white box. +- By clicking inside the white box, which opens your file explorer so you can select your files. + +.. image:: ../images/mango_portal/mango_portal_upload.png + :width: 400 + :alt: Uploading files via the ManGO portal + +If you made a mistake, you can click on 'Remove file' under the file in question. +When you are ready, click on 'Start uploading files' to upload your selection.
+ +To download a data object, open the page of that object and click on the download icon next to its name. +Alternatively, you can download it from the page of its parent collection by clicking on the download +icon at the end of its row. + +.. image:: ../images/mango_portal/mango_portal_download.png + :width: 400 + :alt: Downloading files via the ManGO portal + +Uploads and downloads via the ManGO portal are limited to 1 GB and 20 GB per file, respectively. +You can upload multiple files at the same time, but you can only download data objects individually. +It is currently not possible to upload or download entire folders at once. +If you want to transfer larger amounts of data via a graphical interface, you can use `Globus `_. + +.. _edit-permissions: + +Permissions +=========== + +To view the :ref:`permissions` on a collection or data object, click on it and then open the 'Permissions' tab. + +.. image:: ../images/mango_portal/mango_portal_permissions.png + :width: 800 + :alt: An overview of the permissions on a collection + + +If you have 'own' permissions yourself, you can add new permissions at the bottom of the page, remove permissions by clicking on the trash bin icon, +and switch inheritance on or off for collection permissions. + +You can give permissions to any group that you are a member of. +To do so, select the group and the rights you want to give from their respective dropdown menus and click on 'apply'. +If you are applying permissions to a collection, you can also indicate whether to apply the permissions recursively. + +.. _edit-metadata: + +Metadata +======== + +Every collection or data object has its own :ref:`metadata` tab. +When you click on this tab, you can see all metadata that has been added to the object. + +.. image:: ../images/mango_portal/mango_portal_metadata_overview.png + :alt: An overview of the metadata on an object + + +If the object only has manually added metadata, you get a single overview of all the metadata.
+However, if the metadata comes from multiple sources, the overview is split into tabs: + +- Metadata added via schemas can be found in the tab with the name of the respective schema +- Metadata added via automatic extraction can be found in the tab 'analysis' +- Manually added metadata can be found in the tab 'other' + + +On the right side of each AVU, you may see the icons of a blue pencil and a red trash bin. +Clicking on the former allows you to overwrite the AVU, while the latter allows you to delete it. +If you do not have the rights to edit or delete this metadata, these buttons may be absent. + + +Adding metadata manually +------------------------ + +To add metadata manually, click on 'Add metadata' under the list of existing AVUs. +This opens a window where you can freely add any AVU you want. + +.. image:: ../images/mango_portal/mango_portal_metadata_manual.png + :alt: Adding metadata manually + + +Adding metadata via schemas +---------------------------- + +If you or one of your colleagues has created and published a metadata schema, you can apply it to a collection or data object. +To do so, select the schema name from the dropdown under the metadata overview, and click on 'apply schema'. + + +For more information about creating schemas, see the section on :ref:`metadata schemas`. + +.. image:: ../images/mango_portal/mango_portal_metadata_schema.png + :alt: Adding metadata via a schema + :width: 500 + +This opens a form where you can fill in the metadata fields defined by the creator of the schema. + +.. image:: ../images/mango_portal/mango_portal_metadata_schema_2.png + :alt: Adding metadata via a schema (2) + :width: 500 + +Metadata extraction +------------------- + +To extract metadata from inside a data object, open the 'Metadata inspection and extraction' tab of that object. +Clicking on 'Inspect with Tika' shows an overview of all metadata that Apache Tika could find inside the file.
+ +To actually add this information as metadata to the object, click on the checkbox next to the elements you are interested in, and click on 'Add selected metadata items as regular metadata'. +This information then appears in the metadata overview and also becomes searchable. +Note that you cannot edit metadata added via metadata extraction: you can only delete it. + +Analysis by Apache Tika may also give an OCR (Optical Character Recognition) reading, i.e. the text recognized in, for example, an image. +This feature is a proof of concept, and this information cannot currently be added as metadata. + + +Searching +========= + +**Work in progress** diff --git a/source/tier1data/clients/python_client.rst b/source/tier1data/clients/python_client.rst new file mode 100644 index 00000000..1f10c002 --- /dev/null +++ b/source/tier1data/clients/python_client.rst @@ -0,0 +1,424 @@ +.. _python-client: + +Python-iRODSClient - PRC +======================== + +The Python-iRODSClient (PRC) is an API to iRODS, the software behind Tier-1 Data. +The goal of the PRC is to give researchers a way to manage their data in Tier-1 Data from Python. + + +Installation +------------ + +The Python-iRODSClient can be installed with pip as follows: + +:: + + pip install python-irodsclient + +On Genius and wICE, the Tier-2 HPC clusters of KU Leuven, the Python-iRODSClient (version 1.1.4) is already installed as a module. You can load this module as follows: + +:: + + module use /apps/leuven//2021a/modules/all + module load python-irodsclient/1.1.4-GCCcore-10.3.0 + +You should replace with the architecture of the +(login) node you are on ('cascadelake', 'skylake' or 'broadwell'). + +On Hortense and Stevin, the HPC clusters of UGent, you can load the module as follows: + +:: + + module load python-irodsclient/1.1.4-GCCcore-11.2.0 + + +Logging in +---------- + +There are three ways to authenticate with the Python-iRODSClient.
+ +1) Follow the instructions on the `KU Leuven ManGO portal `_ > 'How to Connect' > 'Python Client on Windows'. +Despite its title, this method should work on any operating system. + +2) Windows users can download `iinit.exe `_. +Double-click on the file and enter your zone name in the window that pops up. +You might need to put the file in a folder that doesn't require administrator rights. + +3) Linux users can first authenticate with :ref:`icommands`, +and then create a second configuration file as follows: + +:: + + cp ~/.irods/irods_environment.json ~/.irods/irods_environment_python.json + sed -i 's/pam_password/PAM/g' ~/.irods/irods_environment_python.json + +Methods 1 and 2 authenticate you for approximately 60 hours, and method 3 for approximately 7 days. + +Creating a session +------------------ + +To take actions in Tier-1 Data, you need to create an iRODSSession object. + +In a script, this can be done as follows: + +:: + + import os + import ssl + from irods.session import iRODSSession + try: + env_file = os.environ['IRODS_ENVIRONMENT_FILE'] + except KeyError: + env_file = os.path.expanduser('~/.irods/irods_environment.json') + + ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None) + ssl_settings = {'ssl_context': ssl_context} + + with iRODSSession(irods_env_file=env_file, **ssl_settings) as session: + pass # replace this line with your own code + +Note: If you used the third login method, replace ``~/.irods/irods_environment.json`` with ``~/.irods/irods_environment_python.json``. + +In an interactive session, you might want to replace the ``with`` statement above with: + +:: + + session = iRODSSession(irods_env_file=env_file, **ssl_settings) + +At the end of your session, you should clean up with: + +:: + + session.cleanup() + +Collections +----------- + +Via the PRC, you can retrieve any collection in Tier-1 Data as an iRODSCollection object.
+This can be done as follows: + +:: + + coll = session.collections.get("/path/to/existing/collection") + + +You can also create a collection with the PRC. +The ``create`` method returns an iRODSCollection object as well. + +:: + + coll = session.collections.create("/path/to/newCollection") + + +This iRODSCollection object contains several attributes with information about the collection: + +.. list-table:: + :header-rows: 1 + :widths: 20 40 40 + + * - Attribute + - Result + - Example + * - ``coll.id`` + - The ID of the collection + - ``10074`` + * - ``coll.name`` + - The name of the collection + - ``'biology'`` + * - ``coll.path`` + - The full path of the collection + - ``'/set/home/biology'`` + * - ``coll.subcollections`` + - List of subcollections inside the collection (non-recursive) + - ``[, ]`` + * - ``coll.data_objects`` + - List of data objects inside the collection (non-recursive) + - ``[]`` + * - ``coll.metadata.items()`` + - List of metadata items attached to the collection + - ``[, ]`` + +It also contains some useful methods: + +.. list-table:: + :header-rows: 1 + :widths: 20 80 + + * - Method + - Result + * - ``coll.walk()`` + - Returns a generator that traverses the collection recursively, yielding ``(collection, subcollections, data_objects)`` tuples, similar to ``os.walk()`` + * - ``coll.move(destination)`` + - Moves the collection to the destination given as argument + * - ``coll.remove()`` + - Moves the collection and its contents to your trash collection + + + + +Data objects +--------------------------------- + +Similar to collections, data objects can be retrieved as iRODSDataObject objects: + +:: + + obj = session.data_objects.get("/path/to/existing/data") + +Creating an empty data object returns an iRODSDataObject as well: + +:: + + new_obj = session.data_objects.create("/path/to/new/object") + +This iRODSDataObject object contains several attributes with information about the data object: + +..
list-table:: + :header-rows: 1 + :widths: 20 40 40 + + * - Attribute + - Result + - Example + * - ``obj.id`` + - The ID of the data object + - ``10074`` + * - ``obj.name`` + - The name of the data object + - ``'readme.md'`` + * - ``obj.path`` + - The full path of the data object + - ``'/set/home/biology/readme.md'`` + * - ``obj.size`` + - The size of the data object in bytes + - ``100`` + * - ``obj.metadata.items()`` + - List of metadata items attached to the data object + - ``[, ]`` + +It also contains some useful methods: + +.. list-table:: + :header-rows: 1 + :widths: 20 80 + + * - Method + - Result + * - ``obj.chksum()`` + - Calculates and stores the checksum of the object in the database + * - ``obj.open(mode)`` + - Opens the data object as a Python file object in read ('r'), write ('w') or append ('a') mode + * - ``obj.unlink()`` + - Moves the data object to your trash collection + +Please note that the 'open' method is not suited for heavy I/O. + + +Uploading and downloading +--------------------------------- + +In most cases, users do not create empty data objects, but upload files from their local file system instead. +This can be done as follows: + +:: + + session.data_objects.put("/path/to/local/file", "/path/to/collection") + +If the destination refers to an existing collection, the PRC automatically appends the filename of the local file to the path. +However, you can also choose the filename yourself by appending it to the end of the path. + +Earlier, we saw that the function ``session.data_objects.get()`` is used to retrieve a Python representation of a data object. +However, when you provide a path to a local destination as a second argument, it also downloads the data object to your local machine: + +:: + + session.data_objects.get('/path/to/existing/data_object', '/path/to/local/directory') + +Here as well, you can provide either the path to a directory or a full path including a filename.
+ +Permissions +----------- + +In the PRC, you can create iRODSAccess objects, which represent a permission on a certain collection or data object. +Each iRODSAccess object has an access type, a path it applies to, and the user or group that gets access. +These permissions can be applied with ``session.acls.set()``. +If the object in question is a collection, you can apply the permission recursively by adding the argument ``recursive=True``. + +:: + + from irods.access import iRODSAccess + access = iRODSAccess("read", "/path/to/collection/or/data/object", "John") + session.acls.set(access, recursive=True) + + +You can also turn inheritance on or off for a collection this way: + +:: + + # Turning inheritance on + access = iRODSAccess("inherit", "/path/to/collection") + session.acls.set(access) + + # Turning inheritance off + access = iRODSAccess("noinherit", "/path/to/collection") + session.acls.set(access) + + +You can retrieve the permissions on an object with ``session.acls.get(object)``. +This returns a list of iRODSAccess objects: + +:: + + coll = session.collections.get("/path/to/collection") + permissions = session.acls.get(coll) + + +Lastly, you can give someone 'null' permissions to revoke their permissions on an object: + +:: + + access = iRODSAccess("null", "/path/to/collection/or/data/object", "Chris") + session.acls.set(access) + +Note that ``session.acls.set()`` and ``session.acls.get()`` only work in recent releases of the Python-iRODSClient. +For older releases, replace ``acls`` with ``permissions``. + + +Metadata +--------------------- + +The following methods are available to work with metadata on collections and data objects: + + +.. list-table:: + :header-rows: 1 + :widths: 40 60 + + * - Method + - Result + * - ``obj.metadata.add(attribute, value, )`` + - Adds the AVU to the object. + * - ``obj.metadata.set(attribute, value, )`` + - Adds the AVU to the object.
 Overwrites previous AVUs with the same attribute name, if they exist. + * - ``obj.metadata.items()`` + - Returns a list of all AVUs on the object as iRODSMeta objects. + * - ``obj.metadata.remove(attribute, value, )`` + - Removes the AVU. + + +If you want to add a lot of metadata to the same object, adding the AVUs with one function call each can be slow. +To speed things up, the PRC offers a function that allows you to add or remove several AVUs from an object in one API call: + +:: + + from irods.meta import iRODSMeta, AVUOperation + obj.metadata.apply_atomic_operations(AVUOperation(operation='add', avu=iRODSMeta('attribute1','value1','unit1')), + AVUOperation(operation='add', avu=iRODSMeta('attribute2','value2','unit2')), + AVUOperation(operation='add', avu=iRODSMeta('attribute3','value3','unit3')) + ) + +The same function can remove several AVUs from an object in one call (with ``operation='remove'``). To remove all AVUs from an object, there is a simpler method: + +:: + + obj.metadata.remove_all() + + +Searching +------------------------ + +The PRC allows you to build queries, which are database searches for specific information about collections, data objects, metadata... +For example, to get the names and sizes of all the data objects you have access to, you can write the following query: + +:: + + from irods.models import DataObject + + query = session.query(DataObject) + for result in query: + print(result[DataObject.name], result[DataObject.size]) + + +Before you write your query, you should import the relevant classes from the ``irods.models`` module. +These are the most important classes, with some of their attributes: + + +.. list-table:: + :header-rows: 1 + :widths: 25 25 50 + + * - Class + - Represents + - Searchable attributes + * - ``irods.models.Collection`` + - A collection in iRODS + - id, name, parent_name, owner_name, inheritance, create_time, modify_time...
+ * - ``irods.models.DataObject`` + - A data object in iRODS + - id, collection_id, name, size, path, owner_name, status, checksum, create_time, modify_time... + * - ``irods.models.CollectionMeta`` + - A metadata AVU on a collection + - id, name, value, units, create_time, modify_time + * - ``irods.models.DataObjectMeta`` + - A metadata AVU on a data object + - id, name, value, units, create_time, modify_time + * - ``irods.models.User`` + - A user or group in iRODS + - id, name, type, zone, create_time, modify_time + +Some attributes of the classes in ``irods.models`` can be confusing: + +- ``Collection.name`` contains the full path of the collection. +- ``DataObject.name`` contains only the name of the data object. +- ``DataObject.path`` contains the physical path of the data object, i.e. the location where the file is physically stored in the data centers. +- ``CollectionMeta.name`` and ``DataObjectMeta.name`` contain the attribute of the AVU. + +You can find the logical path of a data object by joining its ``Collection.name`` and ``DataObject.name`` with a slash in between. + + +You can combine different classes in one query. +For example, you can search for data objects and their parent collections as follows: + +:: + + from irods.models import Collection, DataObject + + query = session.query(Collection, DataObject) + for result in query: + print(f"{result[DataObject.name]} is part of collection {result[Collection.name]}") + +Often, you will want to restrict the results of your query based on certain criteria. +This can be done with the ``filter()`` method, which returns the filtered query; +for example, the following query searches for data objects with the AVU 'type: organic'.
+ +:: + + from irods.column import Criterion + from irods.models import DataObject, Collection, DataObjectMeta + + # filter() returns the filtered query, so assign the result + query = session.query(DataObject, Collection) + query = query.filter(Criterion('=', DataObjectMeta.name, 'type')) + query = query.filter(Criterion('=', DataObjectMeta.value, 'organic')) + + for result in query: + print(result[Collection.name], result[DataObject.name]) + + +You can use the following comparison operators for filtering: + +- '=' for exact matches +- '!=' for excluding certain terms +- 'like' for partial matches +- 'not like' for excluding certain patterns + +With 'like' and 'not like', use '%' as the wildcard character. +For example, ``Criterion('like', Collection.name, '/set/home/biology%')`` will match the collection ``/set/home/biology`` and all its subcollections, but also any other collection whose path starts with that string, such as ``/set/home/biology_old``. +However, be aware that searching for partial matches has a higher performance cost than searching for exact matches. + +----- + +For more details and examples, see the official PRC documentation at +https://github.com/irods/python-irodsclient. + diff --git a/source/tier1data/favicon.ico b/source/tier1data/favicon.ico new file mode 100644 index 00000000..aa62c830 Binary files /dev/null and b/source/tier1data/favicon.ico differ diff --git a/source/tier1data/features.rst b/source/tier1data/features.rst new file mode 100644 index 00000000..1f359782 --- /dev/null +++ b/source/tier1data/features.rst @@ -0,0 +1,11 @@ +Features +============================= + +.. toctree:: + :maxdepth: 2 + + features/metadata + features/schemas + features/collaboration + features/automation + features/trash diff --git a/source/tier1data/features/automation.rst b/source/tier1data/features/automation.rst new file mode 100644 index 00000000..1c73b433 --- /dev/null +++ b/source/tier1data/features/automation.rst @@ -0,0 +1,82 @@ +=================== +Workflow automation +=================== + + +Why workflow automation?
+======================== + +Research data management covers a lot of tasks, some of which are +repetitive. For example, a workflow might include pre-processing of +files, adding certain metadata, or emptying a certain directory +regularly. + +Automating repetitive tasks that are usually done manually has two +goals. First of all, it can save users a lot of time. Secondly, it +reduces human error: when a user needs to apply the same action to a +lot of files, mistakes are bound to happen. + +By using scripts, we can automate a lot of these processes, keeping our +hands free for other tasks. + +How to automate workflows +========================= + + +Client-side automation +---------------------- + +Users can automate workflows themselves on the client side. +We have two clients which can easily be integrated into your HPC workflow: + +- :ref:`iCommands` are bash commands and can thus be integrated into your job scripts. +- Users who use Python in their HPC jobs can integrate the :ref:`Python client` in their scripts. + +This way, you can automatically retrieve data to be used in calculations, store results, add metadata and do other tasks when you submit a job. + +Of course, these clients can also be used outside of the HPC environment. + + + +Machine accounts +^^^^^^^^^^^^^^^^^ + +By default, authentication to Tier-1 Data is limited to 60 hours (via the Python client) or 168 hours (via iCommands). +However, some automated processes need to stay authenticated for longer. + +For these cases, we created machine accounts: these are accounts that authenticate with a token, which remains valid for 4 years. +To activate these machine accounts, go to the page of your project on the `ManGO portal `_, +either by clicking on the downward arrow next to your zone name and selecting your project, +or by browsing to https://mango.vscentrum.be/data-platform/project/.
+ +At the bottom of the page, you can activate two machine accounts for your project: + +- _ingress is meant for writing operations. By default, it has write access to the top collection of your project. +- _egress is meant for reading operations. By default, it has read access to the top collection of your project. + +Click on 'Retrieve API token' to get the token for one of the machine accounts. +The page that opens contains the token, as well as instructions to authenticate with it. +Be sure to store the token information safely, for example in a password manager. + +If you ever lose the token, or suspect it has been compromised, you can generate a new token by clicking on 'Retrieve API token' again. +This will also invalidate the previous token. + +Server-side automation +---------------------- + +Client-side automation suits workflows in which tasks are triggered deliberately or at specific times. +However, some workflows have tasks that should be executed whenever specific events happen: for example, every time a file is uploaded, every time metadata is added, or every time a user downloads a file. +In these cases, server-side automation can offer a solution. + +iRODS, the software behind Tier-1 Data, has a rule engine which can execute processes based on events happening in the system. +Every event, from a file being uploaded to a user logging in, corresponds to a Policy Enforcement Point (PEP). +Administrators from the RDM team can add code to these PEPs. + +If you have a workflow which you think requires server-side automation, send your request to data@vscentrum.be. +From there, we can discuss how to automate the proposed workflow.
+ +Whether a proposal is accepted will depend on: + +- Complexity +- Feasibility +- Potential to be generalized for use in other groups + + diff --git a/source/tier1data/features/collaboration.rst b/source/tier1data/features/collaboration.rst new file mode 100644 index 00000000..c721e59d --- /dev/null +++ b/source/tier1data/features/collaboration.rst @@ -0,0 +1,191 @@ +.. _collaboration: + +Sharing data and collaboration +============================== + +Depending on your research project, different people might need different levels of access to the data stored in Tier-1 Data. + +Some projects are very open: everyone can work on all data in the project. +Other projects, especially those with sensitive data, might need to be stricter about who can access which data, and in which manner. +Tier-1 Data allows you to manage this in detail by assigning different levels of access to *groups*, which can consist of any number of users. + + +Groups and users +---------------- +In Tier-1 Data, individual users can be put together in groups. +Currently, we distinguish between two types of groups: project groups and access groups. + +A **project group** is created when a research group applies for a project in Tier-1 Data. +It includes all collaborators on the project, some of whom are **responsibles**, and has a shared collection. + +**Access groups** are subgroups of a project group and are used to assign specific permissions to certain users based on their roles. +For example, data providers and data analysts may need different permissions on the datasets of the project. By creating an access group for data providers +that contains the users with this role, you can define which permissions all data providers have based on that role (e.g. writing data to a certain collection). +The same goes for data analysts, who would need read-only access to the datasets but write access to a collection for results. +A user that has both roles will have both sets of permissions.
This way, permissions are assigned based on a person's role in the project rather than on who they are. + +For example: + +- Chemistry (project group) + - Mary + - John + - Chris +- Chemistry_data_providers (access group) + - John +- Chemistry_data_analysts (access group) + - Mary + - Chris + +Permissions in Tier-1 Data can be given to either individual users or groups. + +In the ManGO portal, permissions can only be applied at the group level. + + +Permissions +----------- + +In Tier-1 Data, you can apply permissions to both data objects and collections. +A permission gives a group or user a certain type of access. + +There are 4 types of access: null (no permissions), read, write and own. As the table below shows, +these permissions are cumulative: write permissions imply read permissions, and own permissions imply write permissions. + +.. list-table:: + :header-rows: 1 + + * - Action + - Read + - Write + - Own + * - View + - ✓ + - ✓ + - ✓ + * - Download (data object) + - ✓ + - ✓ + - ✓ + * - Copy + - ✓ + - ✓ + - ✓ + * - Edit/overwrite (data object) + - + - ✓ + - ✓ + * - Create new files/collections (collection) + - + - ✓ + - ✓ + * - Metadata + - View + - Edit + - Edit + * - Rename + - + - + - ✓ + * - Move + - + - + - ✓ + * - Delete + - + - + - ✓ + * - Change permissions + - + - + - ✓ + + +'Null' access allows no actions. Assigning 'null' permissions is equivalent to removing existing permissions. + +Each permission relates to a specific collection or data object. +For example, you can give a group access to a specific collection, but not to the subcollections or data objects under it. + +- Chemistry (Mary - read) + - ExperimentA (Mary - write) + - result1.txt + - result2.txt + - result3.txt + - ExperimentB + - result1.txt + - result2.txt + - result3.txt + +In this example, Mary will see the collection Chemistry and its subcollection ExperimentA, but not ExperimentB. +Furthermore, she will not be able to see the data objects inside ExperimentA.
+However, she can upload new files to this collection herself. + +Note that a single object can have multiple permissions: + +- CollectionA + - GroupA: read access + - GroupB: read access + - GroupC: write access + - GroupD: own access + +A person can therefore derive access to an object from multiple permissions. +If Mary is part of both GroupA and GroupC, she will have write access to CollectionA (because this is the highest level of access given to her). + + +Inheritance and recursiveness +----------------------------- + +If we had to manage permissions individually for every new collection or data object, this would take a long time. +To solve this problem, we can make use of inheritance and recursiveness. + +**Inheritance** is an attribute of a collection. If it is turned on, all collections and data objects that are created or uploaded under that collection inherit its permissions automatically: + +- Chemistry + - ExperimentA (Chemistry_data_providers - own, inheritance - on) + - Newfile.txt (Chemistry_data_providers - own) + - Newcollection (Chemistry_data_providers - own, inheritance - on) + +If inheritance is turned off, permissions from the parent collection are not applied. +The person who creates or uploads new data objects or collections gets own access by default, but no other permissions are added: + +- Chemistry + - ExperimentA (Chemistry_data_providers - own, inheritance - off) + - Newfile.txt (John - own) + - Newcollection (John - own) + +Inheritance only has an effect on data added *after* inheritance has been enabled. +If you enable inheritance for a collection, existing subcollections and data objects are not affected. + +**Recursiveness** is an attribute of an action. When you apply permissions to a collection, you can do so recursively: +in that case, the permission will be applied to all the existing contents of the collection as well.
+Unlike inheritance, applying permissions recursively does not affect data which is added later. + +Access to parent collection +--------------------------- + +In ManGO, if you want to share data with someone, they need read access to all collections above it. Take the following example: + +- Chemistry + - ExperimentA + - Input + - Output + - results.csv + +If you want to share the data object results.csv with someone, they need read access to Chemistry, ExperimentA and Output in order to browse to your data object. +Without this read access, they cannot even see that Chemistry and its subcollections exist. + +Some clients (like :ref:`the PRC `) allow you to access data by providing the absolute path of the data object, instead of browsing. +In this case, the user you want to share "results.csv" with only needs access to the parent collection of the data object (in this case, Output). + + +Ownership +--------- + +Every collection or data object has an owner defined in the database. +This is the user who created the collection or uploaded the data object in question. +In some cases, the owner can also be a group. + +Although the terms sound similar, ownership and own permissions are unrelated. +However, for technical reasons, it is hard to deny the owner of an object access to it. + + + + diff --git a/source/tier1data/features/metadata.rst b/source/tier1data/features/metadata.rst new file mode 100644 index 00000000..de07ca16 --- /dev/null +++ b/source/tier1data/features/metadata.rst @@ -0,0 +1,111 @@ +.. _metadata: + +Metadata +============== + + +Tier-1 Data allows users to add descriptive information about data objects and collections in the form of metadata. +Contextualizing data with metadata has several advantages: + +- The context of the data is clearer for researchers who are not familiar with it (even yourself in the future!) +- Data can be found based on its metadata +- Metadata can be used to steer different processes, e.g.
a collection could be tagged as 'to be archived'. + +What is metadata? +----------------- + +Metadata is often called "data about data". It describes the data in some +way, such as providing information about the content, origin, usage, +quality, condition, and associations to other data and +objects. Metadata can describe a collection, a file or a component of a +file. Metadata can be embedded in a data object or stored in a database +and linked to the object it describes. Importantly, it can be used to facilitate +data discovery, improving search and retrieval. + +Metadata can be descriptive, structural, or administrative. +Descriptive metadata, such as title, abstract and keywords, is intended to support data discovery and identification. +Structural metadata describes the structure of an item, e.g. chapters, how +pages are ordered, number of pages, etc. +Administrative metadata is intended to facilitate the management and processing of the data. +Some examples are date and manner of creation, file type, resolution, +copyright information, licensing information and access privileges. + +Why metadata? +------------- + +Metadata serves a variety of purposes, with resource discovery being one of +the most common. Here, it can be compared to effective cataloging, which +includes identifying resources, defining them by criteria, and categorizing them +based on these criteria. +It also facilitates interoperability and the integration of resources. +Moreover, metadata can be read and understood by humans as well as machines. +In addition to supporting data discovery, metadata also organizes and provides contextual and +historical information about data objects, and identifies structural +relationships within and between data objects. + +Metadata in Tier-1 Data +----------------------- + + +In Tier-1 Data, metadata are stored as so-called AVUs (attribute-value-unit triples). +In many cases (when units are not used) they can be viewed as key-value pairs. + + +..
list-table:: + :header-rows: 1 + + * - Attribute name + - Value + - Units + * - Name + - Apple + - + * - Category + - Fruit + - + * - Colour + - Green + - + * - Size + - 10 + - cm + * - Sourness + - 3.5 + - pH + +The attribute name represents the 'label' or name of some characteristic or property, +while the value indicates the value that this property takes for the item in question. +The units field is optional and is meant for specifying measurement units. +For example, 'size 10' may not be informative enough; we may need to know in which units (meters, centimeters, parsecs...) this measurement is expressed. + +AVUs can be added to both collections and data objects. +All metadata in Tier-1 Data are stored in the database as case-sensitive strings. + +Adding metadata +--------------- + +Metadata can be added to data in Tier-1 Data in three ways: + +1) Manually adding an AVU to an object + +Adding AVUs to objects manually is the default method, and is possible via the ManGO portal, iCommands and the Python client. + +2) Automatic extraction of metadata from file headers + +Many types of files contain contextual information in their header. +In the ManGO portal, you can inspect which contextual information is embedded in your data objects, select the information you find relevant, and add it as AVUs. + +3) Adding metadata via schemas + +Adding metadata manually can be time-consuming and prone to human errors (like typos). +In the ManGO portal, you can create metadata schemas. +These schemas allow you to ask for specific fields and put restrictions on the answers users give. +Then, you and your colleagues can apply these schemas to add metadata to objects via an online form. + + + + + + + diff --git a/source/tier1data/features/trash.rst b/source/tier1data/features/trash.rst new file mode 100644 index 00000000..47570c30 --- /dev/null +++ b/source/tier1data/features/trash.rst @@ -0,0 +1,69 @@ +..
_dataremoval: + +Data removal +============================== + +Removing data +------------- +When a collection or data object no longer needs to be in Tier-1 Data, it can be removed. +This can only be done by a user with 'own' permission on the data object or collection in question. + +You can remove data in two ways: + +- Moving it to the :ref:`trash collection `, which is the default behavior. From there, you still have some time to recover the data, in case you made a mistake. + +- Some clients allow you to force the removal of the data: this means that it is removed permanently, without being moved to the trash collection, and it cannot be recovered. + +If you remove a collection, you need to do so recursively in order to remove all its contents as well. + +As mentioned :ref:`earlier `, Tier-1 Data keeps two copies of each data object (one in Heverlee and one in Leuven) as a protection against hardware failure. +However, if you remove a data object, all copies of it are removed. + +.. _trash: + +The trash collection +-------------------- + +When you remove data without force, they are moved to your personal trash collection. +This collection is located at ``//trash/home/``. + +Inside your trash, the path of the data object(s) or collection you removed is reproduced. +For example, if you remove ``//home/chemistry/results.csv``, you can find it in ``//trash/home//chemistry/results.csv``. + +If you remove a data object or collection and an object with the same name and path is already in your trash, +ManGO appends a dot and a random number to the filename of the most recently removed object. + +ManGO automatically removes any objects that remain in the trash for more than 14 days. +You can also clean out your trash collection yourself: + +- Most clients have a command to remove everything from your trash collection permanently. An example is the ``irmtrash`` command from iCommands.
+ +- You can also manually remove data from your trash collection as you would from any other collection. Such a removal is always permanent (forced). + +Restoring removed data +---------------------- + +If you want to retrieve data from your trash collection, you can move it from the trash to your destination of choice. +Both metadata and permissions will be preserved. + + +Troubleshooting +--------------- + +- To remove a collection, you need 'own' access to it and to all its contents. + If you lack 'own' access to any of the contents, the collection cannot be removed entirely. + However, the process will still try to remove all parts you have 'own' access to. + + In the following example, since Chris doesn't have 'own' access to results.csv, + only the data object 'data.csv' and the collection 'input' will be removed: + + - chemistry (Chris - own) + + - input (Chris - own) + + - data.csv (Chris - own) + + - output (Chris - own) + + - results.csv (Chris - read) + diff --git a/source/data/iCommands/cheat_sheet.png b/source/tier1data/images/iCommands/cheat_sheet.png similarity index 100% rename from source/data/iCommands/cheat_sheet.png rename to source/tier1data/images/iCommands/cheat_sheet.png diff --git a/source/data/iCommands/nano.png b/source/tier1data/images/iCommands/nano.png similarity index 100% rename from source/data/iCommands/nano.png rename to source/tier1data/images/iCommands/nano.png diff --git a/source/tier1data/images/introduction/tier1data_architecture.png b/source/tier1data/images/introduction/tier1data_architecture.png new file mode 100644 index 00000000..90fef90c Binary files /dev/null and b/source/tier1data/images/introduction/tier1data_architecture.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_download.png b/source/tier1data/images/mango_portal/mango_portal_download.png new file mode 100644 index 00000000..5b13c0ed Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_download.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_main_page.png b/source/tier1data/images/mango_portal/mango_portal_main_page.png new file mode 100644 index 00000000..079b9be3 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_main_page.png differ diff --git
a/source/tier1data/images/mango_portal/mango_portal_main_page.png b/source/tier1data/images/mango_portal/mango_portal_main_page.png new file mode 100644 index 00000000..079b9be3 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_main_page.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_metadata_manual.png b/source/tier1data/images/mango_portal/mango_portal_metadata_manual.png new file mode 100644 index 00000000..175c2076 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_metadata_manual.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_metadata_overview.png b/source/tier1data/images/mango_portal/mango_portal_metadata_overview.png new file mode 100644 index 00000000..d0cab6cd Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_metadata_overview.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_metadata_schema.png b/source/tier1data/images/mango_portal/mango_portal_metadata_schema.png new file mode 100644 index 00000000..6cf1fc97 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_metadata_schema.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_metadata_schema_2.png b/source/tier1data/images/mango_portal/mango_portal_metadata_schema_2.png new file mode 100644 index 00000000..0f4f0336 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_metadata_schema_2.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_permissions.png b/source/tier1data/images/mango_portal/mango_portal_permissions.png new file mode 100644 index 00000000..092b2985 Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_permissions.png differ diff --git a/source/tier1data/images/mango_portal/mango_portal_upload.png b/source/tier1data/images/mango_portal/mango_portal_upload.png new file mode 100644 index 00000000..bf2cfafe 
Binary files /dev/null and b/source/tier1data/images/mango_portal/mango_portal_upload.png differ diff --git a/source/tier1data/images/metadata-schemas/001-schema-start.png b/source/tier1data/images/metadata-schemas/001-schema-start.png new file mode 100644 index 00000000..2249e00c Binary files /dev/null and b/source/tier1data/images/metadata-schemas/001-schema-start.png differ diff --git a/source/tier1data/images/metadata-schemas/002-fields-long.png b/source/tier1data/images/metadata-schemas/002-fields-long.png new file mode 100644 index 00000000..baf938f9 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/002-fields-long.png differ diff --git a/source/tier1data/images/metadata-schemas/003-editors.png b/source/tier1data/images/metadata-schemas/003-editors.png new file mode 100644 index 00000000..e66a3139 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/003-editors.png differ diff --git a/source/tier1data/images/metadata-schemas/01-realms.png b/source/tier1data/images/metadata-schemas/01-realms.png new file mode 100644 index 00000000..e5a66915 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/01-realms.png differ diff --git a/source/tier1data/images/metadata-schemas/02-no-schemas.png b/source/tier1data/images/metadata-schemas/02-no-schemas.png new file mode 100644 index 00000000..c07d95c6 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/02-no-schemas.png differ diff --git a/source/tier1data/images/metadata-schemas/03-empty-schema.png b/source/tier1data/images/metadata-schemas/03-empty-schema.png new file mode 100644 index 00000000..d4a7c2cd Binary files /dev/null and b/source/tier1data/images/metadata-schemas/03-empty-schema.png differ diff --git a/source/tier1data/images/metadata-schemas/04-fields-1.png b/source/tier1data/images/metadata-schemas/04-fields-1.png new file mode 100644 index 00000000..5a6204d3 Binary files /dev/null and 
b/source/tier1data/images/metadata-schemas/04-fields-1.png differ diff --git a/source/tier1data/images/metadata-schemas/05-fields-2.png b/source/tier1data/images/metadata-schemas/05-fields-2.png new file mode 100644 index 00000000..f2398402 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/05-fields-2.png differ diff --git a/source/tier1data/images/metadata-schemas/06-add-simple-field.png b/source/tier1data/images/metadata-schemas/06-add-simple-field.png new file mode 100644 index 00000000..c12cad76 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/06-add-simple-field.png differ diff --git a/source/tier1data/images/metadata-schemas/08-title-simple-field.png b/source/tier1data/images/metadata-schemas/08-title-simple-field.png new file mode 100644 index 00000000..c53d4471 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/08-title-simple-field.png differ diff --git a/source/tier1data/images/metadata-schemas/08-title-simple-field.png.png b/source/tier1data/images/metadata-schemas/08-title-simple-field.png.png new file mode 100644 index 00000000..c53d4471 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/08-title-simple-field.png.png differ diff --git a/source/tier1data/images/metadata-schemas/09-after-adding-field1.png b/source/tier1data/images/metadata-schemas/09-after-adding-field1.png new file mode 100644 index 00000000..df1d3103 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/09-after-adding-field1.png differ diff --git a/source/tier1data/images/metadata-schemas/10-add-single-value-multiple.png b/source/tier1data/images/metadata-schemas/10-add-single-value-multiple.png new file mode 100644 index 00000000..4fa4b90a Binary files /dev/null and b/source/tier1data/images/metadata-schemas/10-add-single-value-multiple.png differ diff --git a/source/tier1data/images/metadata-schemas/11-add-new-option.png b/source/tier1data/images/metadata-schemas/11-add-new-option.png new 
file mode 100644 index 00000000..8b901d1a Binary files /dev/null and b/source/tier1data/images/metadata-schemas/11-add-new-option.png differ diff --git a/source/tier1data/images/metadata-schemas/15-publisher-svmc-field.png b/source/tier1data/images/metadata-schemas/15-publisher-svmc-field.png new file mode 100644 index 00000000..433e0b0f Binary files /dev/null and b/source/tier1data/images/metadata-schemas/15-publisher-svmc-field.png differ diff --git a/source/tier1data/images/metadata-schemas/16-after-adding-field2.png b/source/tier1data/images/metadata-schemas/16-after-adding-field2.png new file mode 100644 index 00000000..136602a2 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/16-after-adding-field2.png differ diff --git a/source/tier1data/images/metadata-schemas/17-add-multiple-value-multiple.png b/source/tier1data/images/metadata-schemas/17-add-multiple-value-multiple.png new file mode 100644 index 00000000..eef17457 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/17-add-multiple-value-multiple.png differ diff --git a/source/tier1data/images/metadata-schemas/22-save-draft1.png b/source/tier1data/images/metadata-schemas/22-save-draft1.png new file mode 100644 index 00000000..425ea094 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/22-save-draft1.png differ diff --git a/source/tier1data/images/metadata-schemas/24-add-composite-field.png b/source/tier1data/images/metadata-schemas/24-add-composite-field.png new file mode 100644 index 00000000..a5f36f90 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/24-add-composite-field.png differ diff --git a/source/tier1data/images/metadata-schemas/26-author-composite-field.png b/source/tier1data/images/metadata-schemas/26-author-composite-field.png new file mode 100644 index 00000000..68fab3b8 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/26-author-composite-field.png differ diff --git 
a/source/tier1data/images/metadata-schemas/27-view-composite.png b/source/tier1data/images/metadata-schemas/27-view-composite.png new file mode 100644 index 00000000..e3a4b5b5 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/27-view-composite.png differ diff --git a/source/tier1data/images/metadata-schemas/34-view-published.png b/source/tier1data/images/metadata-schemas/34-view-published.png new file mode 100644 index 00000000..6640617c Binary files /dev/null and b/source/tier1data/images/metadata-schemas/34-view-published.png differ diff --git a/source/tier1data/images/metadata-schemas/38-view-published2.png b/source/tier1data/images/metadata-schemas/38-view-published2.png new file mode 100644 index 00000000..c2c7a71b Binary files /dev/null and b/source/tier1data/images/metadata-schemas/38-view-published2.png differ diff --git a/source/tier1data/images/metadata-schemas/39-clone-published.png b/source/tier1data/images/metadata-schemas/39-clone-published.png new file mode 100644 index 00000000..1ca258b7 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/39-clone-published.png differ diff --git a/source/tier1data/images/metadata-schemas/40-save-clone.png b/source/tier1data/images/metadata-schemas/40-save-clone.png new file mode 100644 index 00000000..c14f4d0b Binary files /dev/null and b/source/tier1data/images/metadata-schemas/40-save-clone.png differ diff --git a/source/tier1data/images/metadata-schemas/41-apply.png b/source/tier1data/images/metadata-schemas/41-apply.png new file mode 100644 index 00000000..cb2fc95a Binary files /dev/null and b/source/tier1data/images/metadata-schemas/41-apply.png differ diff --git a/source/tier1data/images/metadata-schemas/42-apply-form.png b/source/tier1data/images/metadata-schemas/42-apply-form.png new file mode 100644 index 00000000..3fdf40f9 Binary files /dev/null and b/source/tier1data/images/metadata-schemas/42-apply-form.png differ diff --git 
a/source/tier1data/images/metadata-schemas/43-validation.png b/source/tier1data/images/metadata-schemas/43-validation.png new file mode 100644 index 00000000..0a15af8e Binary files /dev/null and b/source/tier1data/images/metadata-schemas/43-validation.png differ diff --git a/source/tier1data/images/metadata-schemas/44-repeatable-field.png b/source/tier1data/images/metadata-schemas/44-repeatable-field.png new file mode 100644 index 00000000..a351f27e Binary files /dev/null and b/source/tier1data/images/metadata-schemas/44-repeatable-field.png differ diff --git a/source/tier1data/images/metadata-schemas/45-view-annotation.png b/source/tier1data/images/metadata-schemas/45-view-annotation.png new file mode 100644 index 00000000..394e609e Binary files /dev/null and b/source/tier1data/images/metadata-schemas/45-view-annotation.png differ diff --git a/source/tier1data/index.rst b/source/tier1data/index.rst new file mode 100644 index 00000000..c3fb8aa9 --- /dev/null +++ b/source/tier1data/index.rst @@ -0,0 +1,10 @@ +Tier-1 Data platform +============================= + +.. toctree:: + :maxdepth: 2 + + basics + features + clients + schemas diff --git a/source/tier1data/schemas.rst b/source/tier1data/schemas.rst new file mode 100644 index 00000000..3cd82b14 --- /dev/null +++ b/source/tier1data/schemas.rst @@ -0,0 +1,11 @@ +Metadata schemas +============================= + +.. _schemas: + +.. 
toctree:: + :maxdepth: 2 + + schemas/metadata-schemas.rst + schemas/metadata-schemas-tech.rst + diff --git a/source/tier1data/schemas/metadata-schemas-tech.rst b/source/tier1data/schemas/metadata-schemas-tech.rst new file mode 100644 index 00000000..49f40f93 --- /dev/null +++ b/source/tier1data/schemas/metadata-schemas-tech.rst @@ -0,0 +1,531 @@ +Metadata schemas: technical specifications +################## + + +This article describes how the metadata schemas used in the ManGO portal +are stored and represented, from the folder structure that supports the +schemas lifecycle to the JSON format that codes the different fields and +their characteristics. + +`Section 1 <#sec-lifecycle>`__ gives a brief overview of the lifecycle +of a schema and how that is coded into the folder structure and filename +system. `Section 2 <#sec-json>`__ follows with a technical description +of the JSON file that represents each version of a specific schema, +followed by `Section 3 <#sec-items>`__, in which the different kinds of +fields are described. Finally, `Section 4 <#sec-full>`__ shows an +example of the JSON file for a draft version. + + If you would like to design your own metadata schema for Tier-1 Data + without using the Metadata Schema Manager, you should focus on + `Section 3 <#sec-items>`__ and create a JSON file that matches the + value of ``properties`` in the main JSON. On upload to Tier-1 Data, you + will be able to provide the name and title of your schema, and the + versioning will be taken care of in the backend. + +Before we go into these sections, here is some useful vocabulary: + +(metadata) schema + A set of rules to apply metadata in a systematic way; a collection of + *fields* with format instructions for a specific AVU. + +schema version + A specific version of a schema, with a given status. Multiple + versions of a given schema may co-exist, only one can be in a + ‘published’ status, meaning that it can be applied to data. 
+ +field + A component of a schema with instructions for a specific AVU or for + multiple AVUs that have the same name (or prefix, in the case of a + composite field). + +to apply a schema / annotate with a schema + The action of adding or editing metadata of a data object or + collection based on a given schema. + +.. _sec-lifecycle: + +1. Lifecycle and folder structure +================================= + +In the Tier-1 Data infrastructure, schemas belong to “realms”, such as +projects (other implementations of this infrastructure could extend this +to personal collections). When designing a schema, one must first +select a realm; at the moment, a schema designed within a certain realm +can only be used to apply metadata to data of that realm. + +For each realm, there is a directory “schemas” that contains all the +schemas designed within it. Each schema has its own subdirectory, which +contains one JSON file for each existing version of the schema. In the +example used for illustration in this article, the directory would be +called “book”, which is also the ``schema_name`` attribute in the JSON +file of each version. + +There can be any number of versions of a schema, following `semantic +versioning `__ (although for now only major versions are supported), and each version can have +one of three states: “draft”, “published” or “archived”. + +The **draft** status is the first, default state of a schema, although +it is possible to publish a schema directly without saving the draft +first. It can be edited, viewed and deleted, but it cannot be applied. + +Once a draft is **published**, it cannot be edited or deleted anymore, +but it can be viewed and, more importantly, it can be applied. Attempts +to edit the published version of a schema will result in the creation of +a new draft with a higher version number. If this draft is then +published, the current published version is **archived**. 
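
The lifecycle rules above can be sketched in a few lines of Python. This is a minimal, hypothetical model for illustration only. The ``Schema`` and ``SchemaVersion`` classes and their methods are not part of ManGO or Tier-1 Data. The sketch assumes that only major versions change, as stated above, and that at most one draft exists at a time, which is an assumption on our part.

.. code:: python

   from dataclasses import dataclass, field

   # Hypothetical model of the lifecycle described above; these names are
   # illustrative and not part of the ManGO code base.
   DRAFT, PUBLISHED, ARCHIVED = "draft", "published", "archived"


   @dataclass
   class SchemaVersion:
       major: int
       status: str = DRAFT

       @property
       def version(self) -> str:
           # Assumption: only the major version changes (e.g. "2.0.0").
           return f"{self.major}.0.0"


   @dataclass
   class Schema:
       name: str
       versions: list = field(default_factory=list)

       def _find(self, status):
           return next((v for v in self.versions if v.status == status), None)

       def new_draft(self) -> SchemaVersion:
           # Assumption: at most one draft exists at a time.
           if self._find(DRAFT) is not None:
               raise ValueError("a draft already exists")
           # A new draft always gets a higher version number.
           latest = max((v.major for v in self.versions), default=0)
           draft = SchemaVersion(major=latest + 1)
           self.versions.append(draft)
           return draft

       def publish(self) -> SchemaVersion:
           draft = self._find(DRAFT)
           if draft is None:
               raise ValueError("there is no draft to publish")
           current = self._find(PUBLISHED)
           if current is not None:
               # Only one version can be published; the old one is archived.
               current.status = ARCHIVED
           draft.status = PUBLISHED
           return draft

       def delete_draft(self) -> None:
           # Published and archived versions cannot be deleted.
           draft = self._find(DRAFT)
           if draft is None:
               raise ValueError("only a draft can be deleted")
           self.versions.remove(draft)

       def can_apply(self, version: SchemaVersion) -> bool:
           # Only the published version can be used for annotation.
           return version.status == PUBLISHED


   book = Schema("book")
   book.new_draft()
   book.publish()  # 1.0.0 is published
   book.new_draft()
   book.publish()  # 2.0.0 is published and 1.0.0 is archived
   print([(v.version, v.status) for v in book.versions])
   # [('1.0.0', 'archived'), ('2.0.0', 'published')]

The sketch stores the status on each version rather than on the schema. This mirrors the JSON format described in Section 2, where ``status`` is recorded next to ``version`` in every file.
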
Moreover, the metadata schema manager also allows you to clone (or “copy”) a
published schema into a draft of a whole new schema with a
different name and version 1.0.0.

Archived versions cannot be edited, deleted or used. At the moment they
cannot be viewed either, but this will be addressed in the future. A
published version can also be purposefully archived, without having to
publish a draft that replaces it. You can still have data with metadata
based on an archived version of a schema, but if you try to reapply the
metadata schema you will only be able to use the current published
version, overriding any differences between the version originally used
for annotation and the current version.

`Table 1 <#tbl-lifecycle>`__ summarizes what can be done with a metadata
schema version depending on its status.

.. list-table:: Table 1: Summary of metadata schema version states in Tier-1 Data.
   :name: tbl-lifecycle
   :header-rows: 1

   * -  
     - draft
     - published
     - archived
   * - when:
     - on creation
     - on publication
     - by archiving
   * - can be edited
     - ✔
     - ❌
     - ❌
   * - can be viewed
     - ✔
     - ✔
     - ✔
   * - can be applied
     - ❌
     - ✔
     - ❌
   * - can be deleted
     - ✔
     - ❌
     - ❌


The name of a file corresponding to a version of a schema includes the
name, the version and (unless archived) the status, according to the
following convention: ``{schema_name}-v{version}(-{status}).json``, where
the optional ``status`` suffix is either “draft” or “published”. For
example, when we first create the “book” schema used for illustration in
this document, a file called “book-v1.0.0-draft.json” is created and
stored inside the “book” subdirectory of the “schemas” directory. As
shown in `Section 2 <#sec-json>`__, the version and status are also
registered as attributes inside the JSON file.
Once we are ready to make
the draft available for annotation, we can publish the version in the metadata
schema manager, which will update the status inside the JSON file and
rename the file itself as “book-v1.0.0-published.json”. If we want to
create a new version, this will generate a new file
“book-v2.0.0-draft.json”, which will have the same name, title and realm
as the previous version but a different version number and status.
Publishing this new version will change its status and rename the file
as “book-v2.0.0-published.json”, but it will also archive the first
version. This means that the older file will be renamed to “book-v1.0.0.json”
(without a suffix indicating the status) and its ``status``
inside the JSON file will change to “archived”.

As mentioned above, if we have already annotated data using version
1.0.0 of the “book” schema, that metadata will remain unchanged unless
we try to update it. In that case, metadata of fields that have not changed between
versions will be untouched, whereas metadata of fields that were removed in version
2.0.0 will be permanently deleted, and fields that were added will become
available.

.. _sec-json:

2. JSON format
==============

A specific version of a metadata schema is stored in a JSON file
as a series of key-value pairs.

.. code:: json

   {
     "schema_name" : "book",
     "version" : "1.0.0",
     "status" : "draft",
     "properties" : {...},
     "title" : "Book schema as an example",
     "edited_by" : "username",
     "realm" : "project_collection",
     "parent" : ""
   }

The ``schema_name`` attribute indicates the name or ID of the schema,
i.e. the namespace of the AVUs assigned via this schema. In this
example, all the attribute names generated with this schema will be
prefixed with ``mgs.book.``, where ``mgs`` refers to “ManGO schema”.
The
``status`` attribute refers to the state in the lifecycle as described
in `Section 1 <#sec-lifecycle>`__; together with ``version``, it is what
distinguishes one version of a schema from another.

The ``title`` of a schema is used in the UI of the schema manager and,
when applying schemas, as the user-facing label. The ``edited_by``
attribute is self-explanatory. As introduced above, ``realm`` refers to
the space (such as a project) to which the schema belongs and in which
it can be used. The ``parent`` attribute is relevant when a schema has
been initialized as a clone of an existing schema; in that case, it
records the name and version of the schema it originated from.

The value of the ``properties`` element is itself a series of key-value
pairs indicating fields of the metadata schema. The key is the ID of the
field (how it is defined in the namespace of the schema) and the value
is itself a series of key-value pairs describing the field. The format
of these objects is documented in `Section 3 <#sec-items>`__.

The order of the attributes is not important, but the order of the
*fields* inside ``properties`` defines the order they take
when rendering the form used to assign metadata from the schema.

.. _sec-items:

3. Schema fields
================

There are three main kinds of fields that can be included in a metadata
schema: simple fields, multiple-choice fields and composite fields.
Simple fields, described in `Section 3.2 <#sec-simple>`__, include any
form of text or numeric input for which a pattern or range may be
defined but not, strictly speaking, the possible values. This category also
includes single (boolean) checkboxes. Multiple-choice fields
(`Section 3.3 <#sec-multiple>`__) include any field that provides a
specific, limited selection of possible values.
Finally, the composite +fields, described in `Section 3.4 <#sec-object>`__, are mini-schemas: +collections of fields of other kinds related to each other. + +Each field is represented by a key-value pair in the ``properties`` +element of the schema JSON. Before going through the specific +characteristics of each kind of field, `Section 3.1 <#sec-attr>`__ +offers an overview of their common attributes. + +.. _sec-attr: + +3.1 General Attributes +---------------------- + +The following attributes are used in at least two different kinds of +fields. + +title + All fields in a metadata schema must include the ``title`` attribute, + which provides a user-facing, human-readable label. While the ID or + name of the field is used in the AVU itself, the title is used in the + schema manager, during annotation and when we inspect the existing + metadata in the ManGO portal. + +type + All fields need a ``type`` attribute indicating the kind of field + they represent. The possible values are discussed in the sections + dedicated to each type of field. + +required + Simple fields and single-value multiple-choice fields may contain an + optional boolean ``required`` attribute indicating whether the field + is required when assigning metadata from the schema. A required field + needs to be filled for the metadata form to be submitted. If this + attribute is missing, it is read as “false”. + +default + Simple fields and single-value multiple-choice fields, *if required + is true*, may also contain a ``default`` attribute providing a + default value for the field. + +In the metadata schema manager, the ``title``, id and (if relevant) +``default`` attributes are provided via text input fields and +``required`` via a switch button. In contrast, ``type`` is defined by +the choice of field in the metadata schema manager, except for simple +fields, in which there is an additional dropdown to select among its +various subtypes. + +.. 
_sec-simple:

3.2 Simple fields
-----------------

The prototypical example of a simple field is a text field, such as the
example below. The key “title” indicates that, when assigning metadata
via this field, the attribute name will be ``mgs.book.title``.

.. code:: json

   "title" : {
     "type" : "text",
     "title" : "Book title",
     "required" : true
   }

The ``type`` attribute can have one of several different values, to be
selected from a dropdown menu when designing an instance of this field.
Next to the basic “text” value, other standard inputs are available that
provide minimal validation: “date”, “time”, “email”, or “url”. For a
longer-form, non-restricted text input, the “textarea” value is also an
option; in that case, it is no longer possible to provide a default
value.

For numeric inputs, the possible types are “integer” or “float”. Fields
with these types also have two optional key-value pairs indicating the
range of allowed values:

.. code:: json

   "copies_published": {
     "type": "integer",
     "title": "Number of copies published",
     "minimum": "100"
   },
   "market_price": {
     "type": "float",
     "title": "Market price (in euros)",
     "minimum": "0.99",
     "maximum": "999.99"
   }

Finally, it is also possible to create an individual checkbox (with
``type`` “checkbox”), which takes the value “true” when checked and no
value when unchecked.

Except for the “checkbox”, all the other simple field types can
additionally have a ``repeatable`` attribute. If “true”, the field can
be copied when assigning the metadata to a collection or data object, in
order to generate multiple AVUs with the same attribute name and
different values.

In the metadata schema manager, minimum and maximum values for numeric
types can be provided via numeric input fields, whereas the
``repeatable`` attribute is selected via a switch button.

.. 
_sec-multiple:

3.3 Multiple-choice
-------------------

Multiple-choice fields are indicated by providing the “select” value to
the ``type`` attribute. They are characterized by a restricted selection
of possible values for the metadata field they define. These values are
indicated as a list in the ``values`` attribute:

.. code:: json

   "ebook": {
     "type": "select",
     "multiple": false,
     "ui": "radio",
     "values": [
       "Available",
       "Unavailable"
     ],
     "title": "Is there an e-book?",
     "required": true
   }

The metadata schema manager offers two types of multiple-choice fields:
single-value and multiple-value. The former corresponds to radio buttons and
classic dropdowns, in which the user can choose at most one of the
possible options. The latter, in contrast, corresponds to checkboxes and
dropdowns in which the user may choose more than one of the possible
options. This choice is coded in the ``multiple`` attribute, which takes
the “false” value in the first case and “true” in the second.

In addition, the ``ui`` attribute indicates what the field will look
like in the form used to apply the schema. Its value can be “dropdown”,
“checkbox” (if ``multiple`` is “true”) or “radio” (if ``multiple`` is
“false”). This choice is made via a switch button in the metadata schema
manager.

In the metadata schema manager, each value of the list of options must
be provided manually and can then be edited, deleted or reordered. It is
not yet possible to import a list of values from an external source.

.. _sec-object:

3.4 Composite field
-------------------

Composite fields are miniature schemas nested inside schemas (or other
composite fields) and are meant to group multiple fields that
conceptually belong together. They take the ``type`` “object”, which is
assigned when the composite field is selected in the metadata schema
manager. Like schemas, they have a ``properties`` attribute
describing the fields they are composed of.

.. code:: json

   "author": {
     "type": "object",
     "title": "Author",
     "properties": {
       "name": {
         "type": "text",
         "title": "Name and Surname",
         "required": true
       },
       "age": {
         "type": "integer",
         "title": "Age",
         "minimum": "12",
         "maximum": "99"
       },
       "email": {
         "type": "email",
         "title": "Email address",
         "required": true,
         "repeatable": true
       }
     }
   }

Composite fields cannot be required: this is a property of their
components. Currently, they cannot be repeatable either, but that might
change in the future.

In practical terms, composite fields generate a nested namespace for the
AVUs they contain. As an example, the fields shown in
`Section 3.2 <#sec-simple>`__ would be coded with the names
``mgs.book.title``, ``mgs.book.copies_published`` and
``mgs.book.market_price``, and the one shown in
`Section 3.3 <#sec-multiple>`__ as ``mgs.book.ebook``. In contrast, the
composite field shown above results in AVUs with attribute names
``mgs.book.author.name``, ``mgs.book.author.age`` and
``mgs.book.author.email``.

.. _sec-full:

4. Full example
===============

This section contains the full example of a JSON file that represents a
schema draft.

.. 
code:: json + + { + "schema_name": "book", + "version" : "1.0.0", + "status" : "draft", + "properties": { + "title": { + "type": "text", + "title": "Book title", + "required": true + }, + "cover_colors": { + "type": "select", + "multiple": true, + "ui": "checkbox", + "title": "Colors in the cover", + "values": [ + "red", + "blue", + "green", + "yellow" + ] + }, + "publisher": { + "type": "select", + "multiple": false, + "ui": "dropdown", + "values": [ + "Penguin House", + "Tor", + "Corgi", + "Nightshade books" + ], + "title": "Publishing house", + "required": true + }, + "author": { + "type": "object", + "title": "Author", + "properties": { + "name": { + "type": "text", + "title": "Name and Surname", + "required": true + }, + "age": { + "type": "integer", + "title": "Age", + "minimum": "12", + "maximum": "99" + }, + "email": { + "type": "email", + "title": "Email address", + "required": true, + "repeatable": true + } + } + }, + "ebook": { + "type": "select", + "multiple": false, + "ui": "radio", + "values": [ + "Available", + "Unavailable" + ], + "title": "Is there an e-book?", + "required": true + }, + "genre": { + "type": "select", + "multiple": true, + "ui": "dropdown", + "values": [ + "Speculative fiction", + "Mystery", + "Non-fiction", + "Encyclopaedia", + "Memoir", + "Literary fiction" + ], + "title": "Genre" + }, + "publishing_date": { + "type": "date", + "title": "Publishing date", + "required": true, + "repeatable": true + }, + "copies_published": { + "type": "integer", + "title": "Number of copies published", + "minimum": "100" + }, + "market_price": { + "type": "float", + "title": "Market price (in euros)", + "minimum": "0.99", + "maximum": "999.99" + }, + "website": { + "type": "url", + "title": "Website" + }, + "synopsis": { + "type": "textarea", + "title": "Synopsis" + } + }, + "title": "Book schema as an example", + "edited_by" : "username", + "realm" : "project_collection", + "parent" : "" + } diff --git 
a/source/tier1data/schemas/metadata-schemas.rst b/source/tier1data/schemas/metadata-schemas.rst new file mode 100644 index 00000000..3bbc8f8d --- /dev/null +++ b/source/tier1data/schemas/metadata-schemas.rst @@ -0,0 +1,451 @@ +.. _metadata-schemas: + +Metadata schemas in the ManGO portal +#################################### + + + +This article describes the ManGO portal functionalities related to +metadata schemas: how to design them and how to apply them. Users who +might want to design their own schemas independently and load them via +JSON, as well as developers interested in implemented this feature +outside the portal, are directed to `the technical +specifications `__. + +One crucial principle of the metadata schema functionality in the ManGO +portal is that schemas that can be used to apply metadata cannot be +modified. In other words, for a schema to be used in metadata annotation +it must be fixed and stable; a schema that is undergoing changes cannot +reliably be used for annotation. However, during the course of a +research project it might be necessary to update a schema, and it would +be impractical to create whole new schemas with different names for +every such change. In order to tackle this issue, the ManGO portal +implements a lifecycle via versioning and tags. In short, a schema can +evolve and its evolutions can be registered as new versions of the same +schema. Each version can have one of three statuses: (1) ‘draft’, while +it is being designed and edited; (2) ‘published’, when it is ready to be +implemented; and (3) ‘archived’, when it should not be used anymore, +maybe because a new version has been published. + +In this context, the rest of this document will walk you through the +process of creating, managing and applying metadata schemas. First, +`Section 1 <#sec-draft>`__ will illustrate how to design a new schema +from scratch and save a draft. 
Then, `Section 2 <#sec-lifecycle>`__ will
briefly discuss the stages of a schema in more detail, including how
they can be managed in the Metadata Schema Manager (the “Metadata
schemas” tab in the ManGO portal). Finally,
`Section 3 <#sec-application>`__ will show how a published schema can be
used to annotate metadata.

.. _sec-draft:

1. Schema design
================

In order to design a new schema, we need to go to the “Metadata schemas”
section via the left sidebar menu. This will first show a selection of
projects (“realms”) to which your schemas may belong. After choosing
one, the schema manager itself is shown (see `Figure 1 <#fig-start>`__):
it has a button to create a new schema, under which any existing schemas
are listed. Clicking on the button opens up a form in which the schema
can be designed. As shown in `Figure 1 (b) <#fig-schemaeditor>`__, it
includes a field to provide a name, one for a user-facing title and a
button to add a field.

.. container::
   :name: fig-start

   .. figure:: ../images/metadata-schemas/001-schema-start.png
      :alt: Accordion with a list of realms, one of which is enframed in a blue square with a cursor icon on top. Underneath and connected via an arrow, an empty metadata schema manager that just has the title and a button that reads 'new schema'.
      :width: 400
      :name: fig-start-a

      \(a) Choose a realm.

   .. figure:: ../images/metadata-schemas/03-empty-schema.png
      :alt: Initial form for an empty schema under the 'New schema' button, with fields to insert the schema ID and label, a button to add a new element and buttons to submit the form.
      :width: 400
      :name: fig-schemaeditor

      \(b) An empty schema to start with.

   Figure 1: A new schema from scratch.

The “Schema ID” field is meant for the unique ID or name of a schema
and, like the “Schema label”, is shared by all the versions of the same
schema.
Given an ID ``book``, the attribute names of all the metadata
fields belonging to this schema will be prefixed with ``mgs.book.``.
This is why some restrictions apply to the format of the ID: it can only
contain lowercase letters, numbers, hyphens and underscores. In
contrast, the label can be any type of name or title and will be used to
represent the schema in user-facing interactions, as will be shown
later. In this documentation, we will use as an example a schema with the ID
``book`` and the label “Book schema as an example”.

The “Add element” button opens a modal that offers, with examples, the
different kinds of fields that can be added to a schema
(`Figure 2 <#fig-fields>`__):

-  Simple fields result in input fields for texts, numbers, dates and
   similar formats or a single checkbox.

-  Single-value multiple-choice fields result in a dropdown or radio buttons
   and are used when the metadata value must be one of a selection of
   possible values.

-  Multiple-value multiple-choice fields result in a dropdown or checkboxes and
   are used when the metadata value may consist of several values from a selection of
   possible values. As a result, the same attribute name is repeated
   with different attribute values.

-  Composite fields are like nested schemas: groups of other kinds of
   fields that describe the same concept.

.. figure:: ../images/metadata-schemas/002-fields-long.png
   :alt: Four cards with examples of the different kinds of fields and buttons that open editors to instantiate each field.
   :name: fig-fields
   :width: 600

   Figure 2: Options to select a type of input field.

The blue buttons with the names of the types of fields open new modals
with forms that can be used to design an instance of this field (see
`Figure 3 <#fig-editors>`__). All these forms start with two text fields
to define an ID and a label for the field and end with a button to add
the new field to the schema.
In between there are more specific input +fields used to refine the characteristics of the field you want to +design as well as up to two switch buttons to implement optional +properties. + +.. container:: + :name: fig-editors + + .. figure:: ../images/metadata-schemas/06-add-simple-field.png + :alt: Form to create a simple field. + :width: 400 + :name: fig-simple + + \(a) Design a simple field. + + .. figure:: ../images/metadata-schemas/10-add-single-value-multiple.png + :alt: Form to create a single-value multiple-choice field. + :width: 400 + :name: fig-radio + + \(b) Design a single-value multiple-choice field. + + .. figure:: ../images/metadata-schemas/17-add-multiple-value-multiple.png + :alt: Form to create a multiple-value multiple-choice field. + :width: 400 + :name: fig-checkbox + + \(c) Design a multiple-value multiple-choice field. + + .. figure:: ../images/metadata-schemas/24-add-composite-field.png + :alt: Form to create a composite field, which looks like an empty schema. + :width: 400 + :name: fig-composite + + \(d) Design a composite field. + + Figure 3: Forms to design a new field. + +For example, clicking on “Simple field” will open the form in `Figure 3 +(a) <#fig-simple>`__. After the common fields for ID and label, we see a +dropdown menu that offers different kinds of simple fields: text, +textbox, email, url, integer, float, date, time, datetime… If “integer” +or “float” are chosen, two new fields show up to provide minimum and +maximum thresholds for the value of the field. Via the switches at the +bottom the field can be made required (that is, when filling the +metadata, this field *has* to be provided) and/or repeatable (when +filling the metadata, we can create multiple copies with the same +attribute name and different values). If it is required, we can also +provide a default value. + +`Figure 4 (a) <#fig-simplefull>`__ shows the same form as in `Figure 3 +(a) <#fig-simple>`__ after filling in some choices. 
The ID is now
``title``, which means that when applying the schema this will create an
attribute name ``mgs.book.title``. The label is “Book title”, so that
the form to apply the metadata schema and the table used to inspect the
existing metadata will show this label. The field is of type “text” and
is required, but has no default value.

Once we add this new field to the schema, a box for it is added to the
schema editor, as shown in `Figure 4 (b) <#fig-simpleview>`__. The title
(but not the ID) is shown, followed by an asterisk to indicate that the
field is required. Underneath we see the input field as it would look
in the final form, with a short note on the type of input
field it is. In the top right corner of the box, some options are
provided to further manipulate the field and its position in the form.
The arrows allow us to move the field up and down, but they are disabled
at the moment because there are no other fields. The third button
creates a quick copy of the field as an aid to create a similar one. The
pencil reopens the editing modal if you want to change anything in the
field, and the trash bin can be used to delete the field altogether.

.. container::
   :name: fig-new-simple

   .. figure:: ../images/metadata-schemas/08-title-simple-field.png
      :alt: Filled form to create a new simple field.
      :width: 400
      :name: fig-simplefull

      \(a) A filled form for a simple field.

   .. figure:: ../images/metadata-schemas/09-after-adding-field1.png
      :alt: View of an editing form for a schema to which the book title field has been added.
      :width: 400
      :name: fig-simpleview

      \(b) View of a designed simple field.

   Figure 4: Designing a simple field.

You can also see that in `Figure 4 (b) <#fig-simpleview>`__ the box
representing the new field now has two “Add element” buttons: one to add
a field right before, and one to add a field right after.
Clicking
one of these buttons opens the modal shown in
`Figure 2 <#fig-fields>`__ again, so that we can choose the type of field we
want to add.

`Figure 5 <#fig-new-radio>`__ shows how we can edit a multiple-choice
field. As seen in `Figure 3 <#fig-editors>`__, the only differences
between the editors for single-value and multiple-value multiple-choice
fields are the title of the modal and the possibility of defining a
default value for the former type. However, the results are also
different. If the “As dropdown” switch is activated (as shown in
`Figure 5 (a) <#fig-radiofull>`__), the input field will look like a
dropdown, but the number of options that can be selected from it depends
on whether it is a single-value or multiple-value field. If it is not
activated, single-value fields will be rendered as radio buttons,
whereas multiple-value ones will be rendered as checkboxes. In any case,
the middle part of the editors works in the same way in `Figure 3
(b) <#fig-radio>`__ and `Figure 3 (c) <#fig-checkbox>`__: we start with
two empty fields labeled “Select option” with three buttons to their
right: two arrows and a trash bin. The arrows allow us to reorder the
options, whereas the trash bin lets us remove one of the fields (but
there cannot be fewer than two). The big “Add option” button creates a
new input field for a new option, which must be either filled or
deleted.

`Figure 5 (b) <#fig-radioview>`__ shows how the dropdown created in
`Figure 5 (a) <#fig-radiofull>`__ is rendered alongside the other field in
the schema editor. Again, it is labeled as “Publishing house”, although
metadata assigned via this field will have the name
``mgs.book.publisher``.

.. container::
   :name: fig-new-radio

   .. figure:: ../images/metadata-schemas/15-publisher-svmc-field.png
      :alt: Filled form to design a single-value multiple-choice field.
      :width: 400
      :name: fig-radiofull

      \(a) A filled form for a single-value multiple-choice field.
+ + .. figure:: ../images/metadata-schemas/16-after-adding-field2.png + :alt: View of an editing form for a schema to which a field with a dropdown has been added. + :width: 400 + :name: fig-radioview + + \(b) View of a designed single-value multiple-choice field. + + Figure 5: Designing a single-value multiple-choice field. + +`Figure 4 (b) <#fig-simpleview>`__ and `Figure 5 (b) <#fig-radioview>`__ +also show, at the bottom, two buttons: a green one labeled “Save draft” +and a yellow one that reads “Publish”. These buttons are also present in +`Figure 1 <#fig-start>`__, although in this case the “Publish” button is +disabled. This is because it is possible to create a draft that has no +fields yet, but not to publish it. Once we save a draft, a new accordion +item is created for the new schema in the page, with a tab for the draft +version. `Figure 6 <#fig-saved>`__ shows this tab after also adding a +non-required checkbox field between “Book title” and “Publishing house” +and saving the draft. The tab itself shows the version number and status +of this version and contains three buttons: one to view the form as it +will be shown when applying metadata, one to edit it, which opens a tab +that looks just like the editor we were working on, and one to discard +the draft. By clicking on “Discard” a modal pops up to ask for +confirmation: if we accept, all traces of this schema will be removed, +because the draft is its only existing version. + +.. figure:: ../images/metadata-schemas/22-save-draft1.png + :alt: View of the draft of a schema as only tab in the accordion item of that schema. + :name: fig-saved + + Figure 6: New draft. + +While the draft has not been published, we can still edit it: add new +fields, change them, reorder them, remove them… It is also possible to +change the title or label of the schema itself, but not to change the +ID. 
If we want to add a composite field, `Figure 3 +(d) <#fig-composite>`__ shows that the editor starts like the editors of +other fields, but then only has an “Add element” button, which behaves +exactly like the “Add element” button of a schema: it opens the modal +in `Figure 2 <#fig-fields>`__, which in turn opens the modal of the +chosen field type. `Figure 7 (a) <#fig-compositefull>`__ shows an editor +for a composite field to which we have added three simple fields: a +required “Name and surname” of type text, an “Age” of type integer with +a minimum value of 12 and a maximum value of 99, and a required, +repeatable “Email address” of type email. Once we add the composite +field to the schema, its editing box (`Figure 7 +(b) <#fig-compositeview>`__) shows its components as they will appear in +the final form; in order to edit them, we first need to edit the +composite field itself. + +.. container:: + :name: fig-new-composite + + .. figure:: ../images/metadata-schemas/26-author-composite-field.png + :alt: Filled form to edit a composite field, which looks like an editor for a schema, after adding three subfields. + :width: 400 + :name: fig-compositefull + + \(a) A filled form for a composite field. + + .. figure:: ../images/metadata-schemas/27-view-composite.png + :alt: View of the box corresponding to a composite field after adding it to a draft schema. + :width: 400 + :name: fig-compositeview + + \(b) View of a designed composite field. + + Figure 7: Designing a composite field. + +.. _sec-lifecycle: + +2. Versioning and lifecycle +=========================== + +Once we are satisfied that the draft is ready to be applied, we can +publish it. This will update the tab so that the orange “draft” badge is +replaced with a green one labeled “published”, and change the options +provided in the top-right buttons (`Figure 8 <#fig-published>`__).
The +“View” tab, which shows the form as it will appear when applying the +metadata schema, is the same as for a draft version, but the other +buttons have changed. + +.. figure:: ../images/metadata-schemas/34-view-published.png + :alt: View of a published version of a schema as the only tab in the accordion item of a schema. + :name: fig-published + + Figure 8: A published version of a schema. + +“New (draft) version” and “Copy to new schema” open editors like the one +opened by “Edit” for draft schemas. The former creates a new draft of +the same schema, whereas the latter starts a whole new schema with the +same contents. Saving a new draft will create a new version (in this +case 2.0.0) and show it in a second tab next to the published version, +as shown in `Figure 9 (a) <#fig-published2>`__. While a draft version +exists, the “New (draft) version” button is absent. When creating this +draft, the Schema ID and label are fixed and cannot be edited. In +contrast, in the “Copy to new schema” editor (shown in `Figure 9 +(b) <#fig-clone>`__) these fields are empty and, in fact, it is not +possible to reuse the Schema ID we had before. This feature is meant for +derived schemas, i.e. schemas that share many fields with another schema +but represent something different. The name and version of the +published schema a copy originated from are recorded, but this +information is not used yet. + +.. container:: + :name: fig-from-published + + .. figure:: ../images/metadata-schemas/38-view-published2.png + :alt: View of a published version of a schema as first tab in the accordion item, with a draft as the second tab. + :width: 400 + :name: fig-published2 + + \(a) A published version of a schema when a draft exists. + + .. figure:: ../images/metadata-schemas/39-clone-published.png + :alt: Editor of a schema to create a copy or clone from a published schema.
+ :width: 400 + :name: fig-clone + + \(b) Draft a new schema from a published version of a schema. + + Figure 9: Create new drafts from a published version. + +When this copy is saved, a new schema is created, just as when we design +one from scratch with “New schema”. This generates a new accordion item +with its own “draft” tab containing the version that was just created. +Note that it is also possible to publish a version of a schema, even a +copy of a published schema, without saving it as a draft first. In +`Figure 10 <#fig-clone2>`__, we can choose to view and edit either this +new schema or the previous one by clicking on its name, which expands +the corresponding tabs. If we click on “Book schema as an example”, +we’ll see that the “Copy to new schema” section has been reset to the +original contents of the published version of this schema. + +.. figure:: ../images/metadata-schemas/40-save-clone.png + :alt: View of the draft tab in the accordion of the new schema copied from a published schema, under the closed accordion item of the previous schema. + :name: fig-clone2 + :width: 700 + + Figure 10: Saved draft of copied schema. + +Archiving a published version of a schema prevents it from being +applied, but does not delete it. In the current version of the Metadata +Schema Manager, archived versions are not visible either. However, they +still exist, so it is not possible to create a new schema with the same +ID. + +.. _sec-application: + +3. Apply metadata with a schema +=============================== + +In order to apply a metadata schema, we first have to go to the +“Collections” tab of the ManGO portal and select the collection or data +object to which we want to add metadata. In the “Metadata” tab, a +dropdown lists the published schemas, as shown in +`Figure 11 (a) <#fig-selectschema>`__.
Clicking “Edit” opens a +page with the form shown in `Figure 11 (b) <#fig-apply>`__, which is +very similar to what we saw in the “View” tab of the published +schema (`Figure 8 <#fig-published>`__). Required fields have an asterisk +next to their name, simple fields have a short description under their +input fields, and repeatable fields have a button that can be used to +duplicate them. + +.. container:: + :name: fig-annotationform + + .. figure:: ../images/metadata-schemas/41-apply.png + :alt: View of the metadata tab of a data object frankenstein.txt with focus on the dropdown showing the published metadata schemas. + :width: 400 + :name: fig-selectschema + + \(a) Select a published metadata schema. + + .. figure:: ../images/metadata-schemas/42-apply-form.png + :alt: Form with empty fields corresponding to the metadata schema, including the name of the object and the name and version of the schema. + :width: 400 + :name: fig-apply + + \(b) Empty form to apply a metadata schema. + + Figure 11: Apply a metadata schema. + +If a required field is not filled in, it is not possible to save the +metadata. Once we do save it, we can see the results in a tab inside the +“Metadata” tab of the object. `Figure 12 <#fig-viewann>`__ shows that +the user-facing label of the schema, not its name, is used to name the +tab, and that the labels of the different fields are used in the table +that shows the current annotation. Hovering over a label shows a +small popover with the name of the corresponding AVU in Tier-1 Data, +e.g. ``mgs.book.title`` for the book title, ``mgs.book.author.email`` +for the email address inside the Author composite field, etc. Moreover, +fields for which no values have been provided are still shown, empty, +to indicate that the schema has not been completely filled in. + +..
figure:: ../images/metadata-schemas/45-view-annotation.png + :alt: View of the metadata tab of the data-object frankenstein.txt with the filled-in data of the metadata schema. + :name: fig-viewann + :width: 600 + + Figure 12: All metadata fields are shown, with or without values. + + diff --git a/source/tier1data/wsl.rst b/source/tier1data/wsl.rst new file mode 100644 index 00000000..c47e4bac --- /dev/null +++ b/source/tier1data/wsl.rst @@ -0,0 +1,45 @@ +.. _wsl: + +Installing WSL2 on Windows +========================== + +If you are a Windows user and do not already use a virtualisation system to run Linux, you can install the Windows Subsystem for Linux (WSL 2). + +To install WSL 2 on Windows 10, you need the following: + +- Windows 10 version 1903 (May 2019 Update), 1909 (November 2019 Update), 2004 (May 2020 Update) or later +- Hyper-V virtualization support + +Systems managed by KU Leuven should fulfil these requirements. + +To check your Windows version, type ``winver`` in the search bar; a popup with information about your Windows version appears. If you do not see version 2004 or later, consult https://support.microsoft.com/en-us/topic/c75c6a43-9c87-e412-9a9e-10a0dabac4d5. + +The installation of WSL 2 consists of the following steps: + +#. Enable WSL. +#. Enable the Virtual Machine Platform. +#. Set WSL 2 as the default version. +#. Install a Linux distribution. + +We will complete all steps using Windows PowerShell, although some of them can also be done through the graphical interface.
The individual steps are as follows. + +Run Windows PowerShell as administrator and type the following command to enable WSL:: + + dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart + +To enable the Virtual Machine Platform on Windows 10 (2004), execute the following command:: + + dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart + +To set WSL 2 as the default version, execute the command below (you might need to restart your PC before doing so):: + + wsl --set-default-version 2 + +To install your Linux distribution of choice on Windows 10, open the Microsoft Store app, search for it, and click the “Get” button. Ubuntu 18.04 LTS is recommended, because the iRODS packages discussed next are easy to install on it. + +The first time you launch a newly installed Linux distribution, a console window will open and you will be asked to wait for a minute or two. You will then need to create a user account and password for your new Linux distribution; this password gives you ``sudo`` rights when asked. If you see ``WSLRegisterDistribution Failed with Error:``, or if things do not work as intended, restart your system at this point. + +After all these steps, typing ``wsl`` in Windows PowerShell opens a shell in your Ubuntu distribution, from which you can execute all Linux commands. This shell starts in your current Windows directory, as your Windows drives are mounted under ``/mnt`` (e.g. ``/mnt/c`` for the C: drive). It is advised to work in your Linux home directory instead of on your Windows drives; typing ``cd`` takes you there. + +Optionally, you can also install the Windows Terminal app, which offers multiple tabs, search and custom themes, among other features. \ No newline at end of file