Data Transfers

In the following examples we assume that local directories and remote collections have already been created. Otherwise the operations will fail with an error message. To create local directories, use pathlib.Path.mkdir(parents=True, exist_ok=True). For remote collections, IrodsPath.create_collection() can be used.

Note

By default, no data will be overwritten. If you want to overwrite data, you can set overwrite=True. Beware that you can also overwrite newer data with older data this way. If a file and a dataobject are exactly the same, iBridges will skip the transfer and print a warning, thereby saving time.

For all operations, iBridges will check that the transfer has been completed without error. If a local file is different from a remote file, you will get an error message. If this occurs, you can transfer the file again. If the problem persists, you should contact your local iRODS administrator.

Upload

To upload files or folders from your local file system to iRODS use the upload() function.

In the example below we transfer a file or a folder to a new collection new_coll. If the transfer concerned a folder, a new collection with the folder name will be created.

from ibridges import upload
from ibridges import IrodsPath
from pathlib import Path

local_path = Path("/path/to/the/data/to/upload")
irods_path = IrodsPath(session, '~', 'new_coll')
upload(local_path, irods_path)

Note

All of the data transfer functions return an Operations object, which can be used to execute all operations. With the option dry_run=True you can retrieve these operations before executing them. This enables you to check what will be transferred before the actual transfer using the Operations.print_summary() method.

Download

The download() function works similar to the upload() function. Simply define your iRODS path you would like to download and a local destination path.

from ibridges import download
from ibridges import IrodsPath
from pathlib import Path

local_path = Path("/destination/location/for/the/data")
irods_path = IrodsPath(session, '~', 'new_coll')
download(irods_path, local_path)

Synchronisation

The iBridges function sync synchronises data between your local file system and the iRODS server.

The function works in both directions: synchronisation of data from the client’s local file system to iRODS, or from iRODS to the local file system. The direction is given by the type of path and the order. This is a case where remote paths have to be encoded using ibridges.path.IrodsPath, since iBridges otherwise has no way of knowing which of the two paths is remote and which is local.

Synchronise from local to remote

The code below shows how to synchronise from your local file system to iRODS. The data in iRODS will be updated.

from pathlib import Path
from ibridges.path import IrodsPath
from ibridges.data_operations import sync

target = IrodsPath(session, "~", "<irods path>")
source = Path.home() / "<local path>"

# Synchronise the data
sync(source=source, target=target)

Synchronize from remote to local

The code below shows how to synchronise from your iRODS instance to your local file system. Your local data will be updated.

from pathlib import Path
from ibridges.path import IrodsPath
from ibridges.data_operations import sync
target = Path.home() / "<local path>"
source = IrodsPath(session, "~", "<irods path>")

# call the synchronisation
sync(source=source, target=target)

Streaming data objects

With the python-irodsclient which iBridges is built on, we can open the file inside of a data object as a stream and process the content without downloading the data. That works without any problems for textual data.

from ibridges import IrodsPath

obj_path = IrodsPath(session, "path", "to", "object")
with obj_path.open('r') as stream:
    content = stream.read().decode()

Some python libraries allow to be instantiated directly from such a stream. This is supported by e.g. pandas, polars and whisper.

import pandas as pd

with obj_path.open('r') as stream:
    df = pd.read_csv(stream)

print(df)