Datasets
NomadDataset
Represents a dataset within the NOMAD system.
This class defines the structure of a dataset object used in the NOMAD application, including its metadata and associated user information. It is designed to be immutable, with all attributes set at the time of instantiation.
Attributes:
Name | Type | Description |
---|---|---|
dataset_id | str | Unique identifier for the dataset. |
dataset_create_time | datetime | The creation time of the dataset. |
dataset_name | str | The name of the dataset. |
dataset_type | Optional[str] | The type of the dataset, if specified. |
dataset_modified_time | Optional[datetime] | The last modification time of the dataset, if any. |
user | Optional[NomadUser] | The user associated with the dataset, if any. |
doi | Optional[str] | The Digital Object Identifier (DOI) of the dataset, if any. |
pid | Optional[int] | The persistent identifier (PID) of the dataset, if any. |
m_annotations | Optional[dict] | A dictionary of metadata annotations associated with the dataset, if any. |
Source code in martignac/nomad/datasets.py
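The description above says the class is immutable but does not say how that is enforced. As a minimal sketch, assuming a frozen dataclass is used (one common approach, not confirmed by the source), the attribute table could map onto something like the following. The name DatasetSketch and the helper try_rename are hypothetical, and the user field is left out because NomadUser is defined elsewhere.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical stand-in mirroring the attribute table; not the real class."""

    dataset_id: str
    dataset_create_time: datetime
    dataset_name: str
    dataset_type: Optional[str] = None
    dataset_modified_time: Optional[datetime] = None
    doi: Optional[str] = None
    pid: Optional[int] = None
    m_annotations: Optional[dict] = None


def try_rename(dataset: DatasetSketch, new_name: str) -> bool:
    """Return True if the rename succeeded, False if the instance is frozen."""
    try:
        dataset.dataset_name = new_name
    except FrozenInstanceError:
        return False
    return True


example = DatasetSketch(
    dataset_id="abc123",
    dataset_create_time=datetime(2024, 1, 1),
    dataset_name="demo",
)
```

With a frozen dataclass, attribute assignment after construction raises FrozenInstanceError, so a changed copy has to be built with dataclasses.replace rather than by mutating the original.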
create_dataset(dataset_name, use_prod=False, timeout_in_sec=10)
Creates a new dataset in the NOMAD system with the specified name.
This function sends a POST request to the NOMAD system to create a new dataset. The request includes the dataset name and is sent to either the production or test environment, depending on the use_prod flag. The function waits up to timeout_in_sec seconds for a response.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_name | str | The name of the dataset to be created. | required |
use_prod | bool | Whether to use the production environment. Defaults to False, so the test environment is used. | False |
timeout_in_sec | int | The maximum time in seconds to wait for a response from the server. Defaults to 10 seconds. | 10 |
Returns:
Type | Description |
---|---|
str | The unique identifier of the newly created dataset, as returned by the NOMAD system. |
Raises:
Type | Description |
---|---|
HTTPError | If the request fails or the NOMAD system returns an error response. |
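As a usage sketch, the helper below wraps a create_dataset-like callable so that each call gets a unique dataset name and always targets the test environment. The helper name create_scratch_dataset, the prefix argument, and the 30-second timeout are illustrative assumptions. The callable is passed in rather than imported so the sketch runs without the package installed.

```python
import uuid
from typing import Callable, Tuple


def create_scratch_dataset(
    create_fn: Callable[..., str], prefix: str = "scratch"
) -> Tuple[str, str]:
    """Create a uniquely named dataset on the test environment.

    create_fn is expected to accept the parameters documented for
    create_dataset. Returns (dataset_name, dataset_id).
    """
    # A short random suffix avoids name clashes between repeated runs.
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    # use_prod=False is passed explicitly so production is never touched.
    dataset_id = create_fn(name, use_prod=False, timeout_in_sec=30)
    return name, dataset_id
```

In real use, create_fn would presumably be the create_dataset function from martignac/nomad/datasets.py. Any HTTPError it raises propagates unchanged to the caller.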
delete_dataset(dataset_id, use_prod=False, timeout_in_sec=10)
Deletes a dataset from the NOMAD system by its dataset ID.
This function sends a DELETE request to the NOMAD system to remove the dataset identified by its unique ID. The request goes to either the production or test environment, as specified by the use_prod flag, and waits up to timeout_in_sec seconds for a response.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_id | str | The unique identifier of the dataset to be deleted. | required |
use_prod | bool | Whether to use the production environment. Defaults to False, so the test environment is used. | False |
timeout_in_sec | int | The maximum time in seconds to wait for a response from the server. Defaults to 10 seconds. | 10 |
Note
This function logs the outcome of the deletion operation, reporting success or failure through the logging system.
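Because creation and deletion are separate calls, a dataset created for a short-lived task can be left behind if the task fails. The sketch below pairs the two in a context manager so deletion always runs. The name temporary_dataset is hypothetical. create_fn and delete_fn stand for callables with the signatures documented for create_dataset and delete_dataset.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_dataset(
    create_fn: Callable[..., str],
    delete_fn: Callable[..., None],
    dataset_name: str,
) -> Iterator[str]:
    """Yield the ID of a test-environment dataset and delete it on exit."""
    dataset_id = create_fn(dataset_name, use_prod=False)
    try:
        yield dataset_id
    finally:
        # Runs on normal exit and when the body raises an exception.
        delete_fn(dataset_id, use_prod=False)
```

The body of the with block receives the dataset ID. Both calls pass use_prod=False explicitly so the sketch only ever touches the test environment.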
get_dataset_by_id(dataset_id, use_prod=True)
Retrieves a single NomadDataset object by its dataset ID.
This function queries the NOMAD system for a dataset with the specified ID. It uses the retrieve_datasets function to perform the query and requires that exactly one dataset is returned. If the query returns zero datasets or more than one, it raises a ValueError.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_id | str | The unique identifier of the dataset to retrieve. | required |
use_prod | bool | Flag to use the production environment. Defaults to True. | True |
Returns:
Type | Description |
---|---|
NomadDataset | The dataset object corresponding to the provided ID. |
Raises:
Type | Description |
---|---|
ValueError | If no dataset is found with the provided ID, or if multiple datasets are returned. |
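The "exactly one result" rule described above can be sketched as a small check applied to the list a query returns. This is an illustration of the documented behaviour, not the library's code. The function name exactly_one and the error message wording are assumptions.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def exactly_one(matches: Sequence[T], dataset_id: str) -> T:
    """Return the single match, or raise ValueError for zero or many."""
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one dataset with ID {dataset_id!r}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

Callers that treat a missing dataset as a normal case can catch ValueError, although that does not distinguish "not found" from "ambiguous".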
retrieve_datasets(dataset_id=None, dataset_name=None, user_id=None, page_size=10, max_datasets=50, use_prod=True)
Retrieves a list of NomadDataset objects based on the provided filters.
This function queries the NOMAD system for datasets, optionally filtering by dataset ID, dataset name, or user ID. It supports pagination through the page_size parameter and allows limiting the total number of datasets returned with max_datasets. The use_prod flag determines whether to query the production or test environment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_id | str | The unique identifier of the dataset to retrieve. Defaults to None. | None |
dataset_name | str | The name of the dataset to filter by. Defaults to None. | None |
user_id | str | The user ID to filter datasets by. Defaults to None. | None |
page_size | int | The number of datasets to return per page. Defaults to 10. | 10 |
max_datasets | int | The maximum number of datasets to retrieve. Defaults to 50. | 50 |
use_prod | bool | Flag to use the production environment. Defaults to True. | True |
Returns:
Type | Description |
---|---|
list[NomadDataset] | A list of NomadDataset objects matching the query. |
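The source does not describe how page_size and max_datasets interact internally. A generic sketch of one plausible scheme is shown below: pages of page_size items are fetched until either max_items results are collected or the server reports no further page. The names collect_paginated and fetch_page, and the string page token, are hypothetical and do not reflect the NOMAD API.

```python
from typing import Any, Callable, List, Optional, Tuple

# A page is (items on this page, token for the next page or None if last).
FetchPage = Callable[[int, Optional[str]], Tuple[List[Any], Optional[str]]]


def collect_paginated(
    fetch_page: FetchPage, page_size: int = 10, max_items: int = 50
) -> List[Any]:
    """Collect up to max_items results by requesting page_size items at a time."""
    collected: List[Any] = []
    token: Optional[str] = None
    while len(collected) < max_items:
        items, token = fetch_page(page_size, token)
        collected.extend(items)
        # Stop when the server has no more pages or returned nothing.
        if token is None or not items:
            break
    # The last page may overshoot the cap, so trim before returning.
    return collected[:max_items]
```

Under this scheme, the defaults of page_size=10 and max_datasets=50 would mean at most five requests per call. A larger page_size reduces round trips at the cost of larger responses.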