Added: Exporter documentation

David Juarez
2025-06-23 14:19:33 +02:00
parent a689a2f808
commit bb267e179c

@@ -1,15 +1,268 @@
# Exporter and Reporter
---
## Exporter
The exporter is a REST API that exports the data of the different databases of the bridgehead as a set of tables.
It accepts different output formats such as CSV, Excel, JSON, or XML. It can also export data into Opal.
**GitHub:** [https://github.com/samply/exporter](https://github.com/samply/exporter)
The Exporter is a **REST API** that allows exporting data from various **databases of the bridgehead** as **structured tables**. It supports multiple output formats including **CSV, Excel, JSON, and XML**. Additionally, it can export data directly into **Opal (DataSHIELD)**.
### How it works
The **user** submits a **query** and specifies the desired **export template** and **output format**. The **query** acts like the `WHERE` clause in SQL, filtering data, while the **template** defines what data to select and how to format it, similar to the `SELECT` clause. The Exporter then processes this to generate the export files.
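As an analogy only, the split between query and template can be sketched in Python over in-memory rows. The names `rows`, `query`, `template`, and `export` are hypothetical and do not reflect the Exporter's actual API or internals.

```python
# Hypothetical in-memory data standing in for a bridgehead database.
rows = [
    {"id": "p1", "gender": "female", "birth_year": 1980},
    {"id": "p2", "gender": "male", "birth_year": 1975},
    {"id": "p3", "gender": "female", "birth_year": 1992},
]


def query(row):
    """Plays the role of the WHERE clause: decides which rows are kept."""
    return row["gender"] == "female"


# Plays the role of the SELECT clause: output column name -> source field.
template = {"PatientID": "id", "BirthYear": "birth_year"}


def export(rows, query, template):
    """Filter rows with the query, then project them through the template."""
    return [
        {column: row[field] for column, field in template.items()}
        for row in rows
        if query(row)
    ]


table = export(rows, query, template)
```

Changing `query` changes which rows appear; changing `template` changes which columns appear and how they are named.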
### Environment Variables
Below is a list of configurable environment variables used by the Exporter:
| Variable | Default | Description |
| --------------------------------------------------------- | ------------------------------------------- | ---------------------------------------------------------- |
| APPLICATION\_PORT | 8092 | Port on which the application runs. |
| ARCHIVE\_EXPIRED\_QUERIES\_CRON\_EXPRESSION | `0 0 2 * * *` | Cron expression for archiving expired queries. |
| CLEAN\_TEMP\_FILES\_CRON\_EXPRESSION | `0 0 1 * * *` | Cron expression for cleaning temporary files. |
| CLEAN\_WRITE\_FILES\_CRON\_EXPRESSION | `0 0 2 * * *` | Cron expression for cleaning written files. |
| CONVERTER\_TEMPLATE\_DIRECTORY | | Directory containing conversion templates. |
| CONVERTER\_XML\_APPLICATION\_CONTEXT\_PATH | | Path to the XML application context used by the converter. |
| CROSS\_ORIGINS | | Allowed CORS origins (comma-separated). |
| CSV\_SEPARATOR\_REPLACEMENT | | Character to replace CSV separators within values. |
| EXCEL\_WORKBOOK\_WINDOW | 30000000 | Memory window size for Excel workbook processing. |
| EXPORTER\_API\_KEY | | API key for authenticating access to the exporter. |
| EXPORTER\_DB\_FLYWAY\_MIGRATION\_ENABLED | true | Enable Flyway DB migrations on startup. |
| EXPORTER\_DB\_PASSWORD | | Password for exporter database. |
| EXPORTER\_DB\_URL | `jdbc:postgresql://localhost:5432/exporter` | JDBC URL for exporter DB. |
| EXPORTER\_DB\_USER | | Username for exporter DB. |
| FHIR\_PACKAGES\_DIRECTORY | | Directory where FHIR packages are stored. |
| HAPI\_FHIR\_CLIENT\_LOG\_LEVEL | OFF | Log level for HAPI FHIR client. |
| HIBERNATE\_LOG | false | Enable Hibernate SQL logging. |
| HTTP\_RELATIVE\_PATH | | Relative base path for HTTP endpoints. |
| HTTP\_SERVLET\_REQUEST\_SCHEME | http | Default HTTP scheme. |
| LOG\_FHIR\_VALIDATION | | Enable logging of FHIR validation results. |
| LOG\_LEVEL | INFO | Application log level. |
| MAX\_NUMBER\_OF\_EXCEL\_ROWS\_IN\_A\_SHEET | 100000 | Max rows per Excel sheet. |
| MAX\_NUMBER\_OF\_RETRIES | 10 | Max retry attempts. |
| MERGE\_FILENAME | | Name of merged output file. |
| SITE | | Site identifier for filenames/logs. |
| TEMP\_FILES\_LIFETIME\_IN\_DAYS | 1 | Lifetime of temporary files (days). |
| TEMPORAL\_FILE\_DIRECTORY | | Directory for temporary files. |
| TIMEOUT\_IN\_SECONDS | 10 | Default timeout (seconds). |
| TIMESTAMP\_FORMAT | | Timestamp format string. |
| WEBCLIENT\_BUFFER\_SIZE\_IN\_BYTES | 8192 | Buffer size for web client. |
| WEBCLIENT\_CONNECTION\_TIMEOUT\_IN\_SECONDS | 5 | Connection timeout (seconds). |
| WEBCLIENT\_MAX\_NUMBER\_OF\_RETRIES | 10 | Max retries for web client. |
| WEBCLIENT\_REQUEST\_TIMEOUT\_IN\_SECONDS | 10 | Request timeout (seconds). |
| WEBCLIENT\_TCP\_KEEP\_CONNECTION\_NUMBER\_OF\_TRIES | 3 | TCP keepalive retry attempts. |
| WEBCLIENT\_TCP\_KEEP\_IDLE\_IN\_SECONDS | 30 | TCP keepalive idle time (seconds). |
| WEBCLIENT\_TCP\_KEEP\_INTERVAL\_IN\_SECONDS | 10 | TCP keepalive probe interval (seconds). |
| WEBCLIENT\_TIME\_IN\_SECONDS\_AFTER\_RETRY\_WITH\_FAILURE | 1 | Wait time after failed retry (seconds). |
| WRITE\_FILE\_DIRECTORY | | Directory for final output files. |
| WRITE\_FILES\_LIFETIME\_IN\_DAYS | 30 | Lifetime of written files (days). |
| XML\_FILE\_MERGER\_ROOT\_ELEMENT | Containers | Root element for XML file merging. |
| ZIP\_FILENAME | `exporter-files-${SITE}-${TIMESTAMP}.zip` | Pattern for ZIP archive naming. |
---
### About Cron Expressions in Spring
Cron expressions configure scheduled tasks and consist of six space-separated fields representing second, minute, hour, day of month, month, and day of week. For example, the default `0 0 2 * * *` means “at 2:00 AM every day.” These expressions allow precise scheduling for maintenance tasks such as cleaning files or archiving data.
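To make the field order concrete, the following minimal Python sketch splits a six-field expression into named parts. It only illustrates the layout described above; it is not how Spring parses cron expressions, and the function name `split_cron` is made up for this example.

```python
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map each of the six space-separated fields to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# Default value of ARCHIVE_EXPIRED_QUERIES_CRON_EXPRESSION.
fields = split_cron("0 0 2 * * *")
```

Here `fields["hour"]` is `"2"` while second and minute are `"0"`, which is why the task fires at 2:00 AM.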
---
## Exporter-DB
It is a database that stores queries for execution in the exporter.
The exporter also manages the different executions of the same query through the database.
**GitHub:** [https://github.com/samply/exporter-db](https://github.com/samply/exporter-db) (If exists; if not, just remove or adjust accordingly)
The Exporter-DB stores queries for execution by the Exporter and tracks multiple executions of the same query, managing versioning and scheduling.
---
## Reporter
This component is a plugin of the exporter that allows creating more complex Excel reports described in templates.
It is compatible with different template engines such as Groovy, Thymeleaf, ...
It is well suited to generating documents such as our traditional CCP quality report.
**GitHub:** [https://github.com/samply/reporter](https://github.com/samply/reporter)
The Reporter is a **plugin for the Exporter** designed for generating **complex Excel reports** based on **customizable templates**. It supports various template engines like **Groovy** and **Thymeleaf**, making it ideal for producing detailed documents such as the traditional CCP **data quality report**.
---
## Exporter Templates
An exporter template describes the **structure** and **content** of the **export output**.
### Main Elements
* **converter**: Defines the export job, specifying output files and data sources.
* **container**: Represents a logical grouping of data rows (like a table).
* **attribute**: Defines individual data fields/columns extracted from the data source.
### Other Elements
* **cql**: Contains Clinical Quality Language queries used to enrich or filter data.
* **fhir-rev-include**: Defines FHIR reverse includes to fetch related resources.
* **fhir-package**: (To be detailed)
* **fhir-terminology-server**: (To be detailed)
### Example Snippet
```xml
<converter id="ccp" excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx" source-id="blaze-store">
<container id="Patient" csv-filename="Patient-${SITE}-${TIMESTAMP}.csv" excel-sheet="Patient" ...>
<attribute id="Patient-ID" default-name="PatientID" val-fhir-path="Patient.id.value" anonym="Pat" op="EXTRACT_RELATIVE_ID"/>
...
</container>
...
</converter>
```
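The converter, container, and attribute nesting can be read as "export job, table, column". The Python sketch below parses a simplified variant of the snippet above (the elided `...` parts are left out, not guessed) and lists the columns of each table. The `describe` helper is hypothetical and only illustrates the structure; it is not part of the Exporter.

```python
import xml.etree.ElementTree as ET

# Simplified variant of the template snippet; elided parts are omitted.
TEMPLATE = """
<converter id="ccp" excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx" source-id="blaze-store">
  <container id="Patient" csv-filename="Patient-${SITE}-${TIMESTAMP}.csv" excel-sheet="Patient">
    <attribute id="Patient-ID" default-name="PatientID" val-fhir-path="Patient.id.value" anonym="Pat" op="EXTRACT_RELATIVE_ID"/>
  </container>
</converter>
"""


def describe(template_xml):
    """Return {container id: [attribute default names]} for one template."""
    converter = ET.fromstring(template_xml.strip())
    return {
        container.get("id"): [
            attribute.get("default-name")
            for attribute in container.findall("attribute")
        ]
        for container in converter.findall("container")
    }


tables = describe(TEMPLATE)
```

For this template the result contains one table, `Patient`, with the single column `PatientID`.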
---
### 1. **Converter**
Main tag of an exporter template. It groups converters so that the best chain for data conversion can be found.
| Tag | Description |
| ------------- | --------------------------------------------------------------------------------------------- |
| `<converter>` | Main tag for exporter template containing sources, metadata, and additional query information |
| Attribute | Description | Example | Default |
| ------------------------ | --------------------------------------------------------------------------------------- | --------------------------------------------------- | ------- |
| id | ID to reference a template | `id="ccp-opal"` | — |
| default-name | Default name when output is in a single file format (no extension; added automatically) | — | — |
| ignore | Deactivate template but keep accessible | `ignore="true"` | false |
| excel-filename | Name of the Excel output file (supports variables `${SITE}`, `${TIMESTAMP}`) | `excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx"` | — |
| csv-separator | CSV separator character | — | `"\t"` |
| source-id | ID of the data source | `source-id="blaze-store"` | — |
| target-id | ID of a target server for file transfer (e.g., Opal for DataSHIELD) | `target-id="opal"` | — |
| opal-project | Opal-specific: name of project | — | — |
| opal-permission-type | Opal permission type (`user` or `group`) | — | — |
| opal-permission-subjects | Opal permission subjects | — | — |
| opal-permission | Opal permission (`administrate` or `use`) | — | — |
**Notes:**
* Variables like `${SITE}`, `${TIMESTAMP}`, and environment variables (base64 encoded) can be used inside tags.
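As a rough illustration of how such placeholders expand, the Python sketch below uses `string.Template`, which happens to share the `${NAME}` syntax. The site name and timestamp layout are made-up values; the Exporter's real substitution logic and its `TIMESTAMP_FORMAT` handling may differ.

```python
from string import Template


def expand(pattern, site, timestamp):
    """Replace ${SITE} and ${TIMESTAMP} in a filename pattern."""
    return Template(pattern).substitute(SITE=site, TIMESTAMP=timestamp)


# "example-site" and the timestamp layout are assumptions for illustration.
filename = expand("Export-${SITE}-${TIMESTAMP}.xlsx", "example-site", "20250623-1419")
```

With these inputs, `filename` becomes `Export-example-site-20250623-1419.xlsx`.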
**Allowed child elements:**
* `<container>`, `<cql>`, `<fhir-rev-include>`, `<fhir-package>`, `<fhir-terminology-server>`
---
### 2. **Container**
Represents a data table with columns (attributes).
| Tag | Description |
| ------------- | --------------------------------------------------- |
| `<container>` | Defines a container/table with attributes (columns) |
| Attribute | Description | Example | Default |
| ---------------- | ------------------------------------------------------------ | --------------------------------------------- | ------- |
| id | Container ID to reference | — | — |
| default-name | Name of Excel sheet/file (no extension, added automatically) | — | — |
| csv-filename | Name of CSV file | `csv-filename="Diagnosis-${TIMESTAMP}.csv"` | — |
| json-filename | Name of JSON file | `json-filename="diagnosis-${TIMESTAMP}.json"` | — |
| xml-filename | Name of XML file | `xml-filename="diagnosis-${TIMESTAMP}.xml"` | — |
| xml-root-element | Root element name in XML | `xml-root-element="diagnoses"` | — |
| xml-element | Element name for each entry in XML | `xml-element="diagnosis"` | — |
| excel-sheet | Excel sheet name | `excel-sheet="diagnosis-${TIMESTAMP}.xlsx"` | — |
| opal-table       | Opal table name                                              | `opal-table="Diagnosis"`                      | —       |
| opal-entity-type | Opal entity type | — | — |
---
### 3. **Attribute**
Represents a column in a container/table.
| Tag | Description |
| ------------- | --------------------------- |
| `<attribute>` | Defines an attribute/column |
| Attribute | Description | Example | Default |
| ------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | ------- |
| id | Attribute ID | `id="Patient-ID"` | — |
| default-name | Default name of the attribute (used if no output-specific name provided) | — | — |
| link | Reference to an attribute of another container (format: `<container-name>.<attribute-id>`) | `link="Patient.Patient-ID"` | — |
| csv-column | Name of the CSV column | — | — |
| excel-column | Name of the Excel column | — | — |
| json-key | JSON key | — | — |
| xml-element | XML element name | — | — |
| opal-value-type | Opal-specific value type | — | — |
| opal-script | Script to be applied to the field in Opal | — | — |
| primary-key | Marks attribute as primary key | `primary-key="true"` | false |
| validation | Marks attribute as syntactic validation field (ends with `-Validation` in DKTK/BBMRI reporter) | `validation="true"` | false |
| val-fhir-path | FHIR path to extract value (if source is a FHIR server) | `val-fhir-path="Patient.gender.value"` | — |
| join-fhir-path | FHIR path for joining secondary resources to main resource | `join-fhir-path="/AdverseEvent.suspectEntity.instance.reference.where(value.startsWith('Procedure')).value"` | — |
| condition-value-fhir-path | Condition filtering for complex value extraction (FHIR path syntax) | `condition-value-fhir-path="Patient.birthDate <= today() - 18 'years'"` | — |
| anonym | Anonymization prefix; replaces real value with `anonym` + number | `anonym="Pat"` | — |
| mdr | Metadata repository ID in DKTK context | `mdr="dktk:dataelement:20:3"` | — |
| op | Operation applied on value (e.g., `EXTRACT_RELATIVE_ID`) | `op="EXTRACT_RELATIVE_ID"` | — |
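The `anonym` attribute replaces a real value with the prefix plus a number. A minimal Python sketch of that idea follows, assuming that the same real value always maps to the same pseudonym and that numbering starts at 1; neither detail is confirmed by this document, and the `Anonymizer` class is hypothetical.

```python
class Anonymizer:
    """Replace each distinct real value with prefix + running number."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._known = {}

    def anonymize(self, value):
        # Assumption: repeated values reuse their pseudonym; numbering starts at 1.
        if value not in self._known:
            self._known[value] = f"{self.prefix}{len(self._known) + 1}"
        return self._known[value]


anonymizer = Anonymizer("Pat")  # corresponds to anonym="Pat"
pseudonyms = [anonymizer.anonymize(v) for v in ["a1b2", "c3d4", "a1b2"]]
```

A stable mapping of this kind would keep references between containers (see the `link` attribute) consistent after anonymization.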
---
### Notes on **join-fhir-path**
* Used to join resources in FHIR queries when a container references multiple resources.
* Two join types:
* **Direct:** main resource points to secondary resource.
* **Indirect:** secondary resource points back to main resource (path begins with `/`).
* Joins can chain multiple resources, e.g., `R1 -> R2 -> R3`, with commas separating joins.
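The rules above can be sketched as a small Python helper that splits a `join-fhir-path` value into steps and labels each one. This is a simplification under a stated assumption: it splits naively on commas, so a FHIRPath expression that itself contains a comma would need a real parser. The `parse_joins` function is hypothetical.

```python
def parse_joins(join_fhir_path):
    """Split a join-fhir-path into steps labelled direct or indirect."""
    steps = []
    # Assumption: naive comma split; commas inside an expression are not handled.
    for raw in join_fhir_path.split(","):
        path = raw.strip()
        if not path:
            continue
        # A leading "/" marks an indirect join (secondary points back to main).
        kind = "indirect" if path.startswith("/") else "direct"
        steps.append((kind, path.lstrip("/")))
    return steps


example = "/AdverseEvent.suspectEntity.instance.reference.where(value.startsWith('Procedure')).value"
steps = parse_joins(example)
```

For the example from the attribute table, the result is a single indirect step.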
---
### 4. **CQL**
Contains metadata and details important for handling CQL queries.
| Tag | Description |
| ------- | ---------------------------------------------------------------- |
| `<cql>` | Container for CQL query metadata including tokens and parameters |
---
### 5. **Token (CQL)**
Replaces keys in CQL queries with specific values (commonly used for stratifiers).
| Tag | Description |
| --------- | ------------------------------------- |
| `<token>` | Contains `key` and `value` attributes |
| Attribute | Description | Example |
| --------- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| key | Key to replace in CQL | `key="DKTK_STRAT_MEDICATION_STRATIFIER"` |
| value | CQL code snippet that replaces key | `value="define MedicationStatement: if InInitialPopulation then [MedicationStatement] else {} as List <MedicationStatement>"` |
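Token replacement amounts to textual substitution in the CQL source. The Python sketch below shows the idea with the key and value from the table above; the surrounding CQL text and the `apply_tokens` helper are made up for illustration and are not the Exporter's implementation.

```python
def apply_tokens(cql, tokens):
    """Replace every token key found in the CQL text with its value."""
    for key, value in tokens.items():
        cql = cql.replace(key, value)
    return cql


# Hypothetical CQL skeleton containing one placeholder key.
cql_template = "library Example\nDKTK_STRAT_MEDICATION_STRATIFIER\n"
tokens = {
    "DKTK_STRAT_MEDICATION_STRATIFIER": (
        "define MedicationStatement: if InInitialPopulation "
        "then [MedicationStatement] else {} as List <MedicationStatement>"
    ),
}
cql = apply_tokens(cql_template, tokens)
```

After substitution the placeholder key no longer appears in `cql`, and the `define MedicationStatement` snippet stands in its place.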
---
### 6. **Measure Parameters (CQL)**
Parameters for a CQL measure query, typically in JSON format.
| Tag | Description |
| ---------------------- | ----------------------------------------------------------- |
| `<measure-parameters>` | Parameters such as `periodStart`, `periodEnd`, `reportType` |
---
### 7. **Default FHIR Search Query (CQL)**
FHIR search query applied after obtaining measure reports from CQL.
| Tag | Description | Example |
| ----------------------------- | ----------------------------------------------------- | --------- |
| `<default-fhir-search-query>` | Defines a FHIR resource type to query (e.g., Patient) | `Patient` |
---
### 8. **FHIR Reverse Include**
Defines which resources should be reverse-included when using FHIR search as input or CQL\_DATA.
| Tag | Description |
| -------------------- | ------------------------------------------------------------ |
| `<fhir-rev-include>` | Specifies reverse include resources to simplify FHIR queries |
---