mirror of
https://github.com/samply/bridgehead.git
synced 2025-06-23 23:20:21 +02:00
Added: Exporter documentation
# Exporter and Reporter

---
## Exporter

The Exporter is a REST API that exports data from the bridgehead's databases as a set of tables.
It supports several output formats, such as CSV, Excel, JSON, and XML, and can also export data into Opal.

**GitHub:** [https://github.com/samply/exporter](https://github.com/samply/exporter)

The Exporter is a **REST API** that allows exporting data from various **databases of the bridgehead** as **structured tables**. It supports multiple output formats including **CSV, Excel, JSON, and XML**. Additionally, it can export data directly into **Opal (DataSHIELD)**.
### How it works

The **user** submits a **query** and specifies the desired **export template** and **output format**. The **query** acts like the `WHERE` clause in SQL, filtering data, while the **template** defines what data to select and how to format it, similar to the `SELECT` clause. The Exporter then applies both to generate the export files.
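
To make the `WHERE`/`SELECT` analogy concrete, here is a minimal Python sketch. It is not Exporter code: the records, the `query` predicate, and the `template` column list are hypothetical stand-ins chosen only to show the division of labour between query and template.

```python
# Hypothetical in-memory records standing in for data held by a bridgehead.
rows = [
    {"id": "p1", "gender": "female", "birth_year": 1980},
    {"id": "p2", "gender": "male", "birth_year": 2010},
    {"id": "p3", "gender": "female", "birth_year": 1995},
]


def query(row):
    """Plays the role of the query: decides WHICH rows are exported (like WHERE)."""
    return row["gender"] == "female"


# Plays the role of the template: decides WHAT is exported (like SELECT).
template = ["id", "birth_year"]


def export(rows, query, template):
    """Filter rows with the query, then keep only the template's columns."""
    return [{column: row[column] for column in template} for row in rows if query(row)]


result = export(rows, query, template)
print(result)
```

In the real Exporter the query and the template are far richer than a predicate and a column list, but the split is the same: changing the query changes which records appear, while changing the template changes their shape.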
### Environment Variables

Below is a list of configurable environment variables used by the Exporter:

| Variable | Default | Description |
| --- | --- | --- |
| `APPLICATION_PORT` | 8092 | Port on which the application runs. |
| `ARCHIVE_EXPIRED_QUERIES_CRON_EXPRESSION` | `0 0 2 * * *` | Cron expression for archiving expired queries. |
| `CLEAN_TEMP_FILES_CRON_EXPRESSION` | `0 0 1 * * *` | Cron expression for cleaning temporary files. |
| `CLEAN_WRITE_FILES_CRON_EXPRESSION` | `0 0 2 * * *` | Cron expression for cleaning written files. |
| `CONVERTER_TEMPLATE_DIRECTORY` | | Directory containing conversion templates. |
| `CONVERTER_XML_APPLICATION_CONTEXT_PATH` | | Path to the XML application context used by the converter. |
| `CROSS_ORIGINS` | | Allowed CORS origins (comma-separated). |
| `CSV_SEPARATOR_REPLACEMENT` | | Character to replace CSV separators within values. |
| `EXCEL_WORKBOOK_WINDOW` | 30000000 | Memory window size for Excel workbook processing. |
| `EXPORTER_API_KEY` | | API key for authenticating access to the exporter. |
| `EXPORTER_DB_FLYWAY_MIGRATION_ENABLED` | true | Enable Flyway DB migrations on startup. |
| `EXPORTER_DB_PASSWORD` | | Password for exporter database. |
| `EXPORTER_DB_URL` | `jdbc:postgresql://localhost:5432/exporter` | JDBC URL for exporter DB. |
| `EXPORTER_DB_USER` | | Username for exporter DB. |
| `FHIR_PACKAGES_DIRECTORY` | | Directory where FHIR packages are stored. |
| `HAPI_FHIR_CLIENT_LOG_LEVEL` | OFF | Log level for HAPI FHIR client. |
| `HIBERNATE_LOG` | false | Enable Hibernate SQL logging. |
| `HTTP_RELATIVE_PATH` | | Relative base path for HTTP endpoints. |
| `HTTP_SERVLET_REQUEST_SCHEME` | http | Default HTTP scheme. |
| `LOG_FHIR_VALIDATION` | | Enable logging of FHIR validation results. |
| `LOG_LEVEL` | INFO | Application log level. |
| `MAX_NUMBER_OF_EXCEL_ROWS_IN_A_SHEET` | 100000 | Max rows per Excel sheet. |
| `MAX_NUMBER_OF_RETRIES` | 10 | Max retry attempts. |
| `MERGE_FILENAME` | | Name of merged output file. |
| `SITE` | | Site identifier for filenames/logs. |
| `TEMP_FILES_LIFETIME_IN_DAYS` | 1 | Lifetime of temporary files (days). |
| `TEMPORAL_FILE_DIRECTORY` | | Directory for temporary files. |
| `TIMEOUT_IN_SECONDS` | 10 | Default timeout (seconds). |
| `TIMESTAMP_FORMAT` | | Timestamp format string. |
| `WEBCLIENT_BUFFER_SIZE_IN_BYTES` | 8192 | Buffer size for web client. |
| `WEBCLIENT_CONNECTION_TIMEOUT_IN_SECONDS` | 5 | Connection timeout (seconds). |
| `WEBCLIENT_MAX_NUMBER_OF_RETRIES` | 10 | Max retries for web client. |
| `WEBCLIENT_REQUEST_TIMEOUT_IN_SECONDS` | 10 | Request timeout (seconds). |
| `WEBCLIENT_TCP_KEEP_CONNECTION_NUMBER_OF_TRIES` | 3 | TCP keepalive retry attempts. |
| `WEBCLIENT_TCP_KEEP_IDLE_IN_SECONDS` | 30 | TCP keepalive idle time (seconds). |
| `WEBCLIENT_TCP_KEEP_INTERVAL_IN_SECONDS` | 10 | TCP keepalive probe interval (seconds). |
| `WEBCLIENT_TIME_IN_SECONDS_AFTER_RETRY_WITH_FAILURE` | 1 | Wait time after failed retry (seconds). |
| `WRITE_FILE_DIRECTORY` | | Directory for final output files. |
| `WRITE_FILES_LIFETIME_IN_DAYS` | 30 | Lifetime of written files (days). |
| `XML_FILE_MERGER_ROOT_ELEMENT` | Containers | Root element for XML file merging. |
| `ZIP_FILENAME` | `exporter-files-${SITE}-${TIMESTAMP}.zip` | Pattern for ZIP archive naming. |

---
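
The `ZIP_FILENAME` default in the table above contains the placeholders `${SITE}` and `${TIMESTAMP}`. The following Python sketch shows one way such a pattern could be expanded. It only illustrates the idea and is not how the Exporter itself does it; the site name and the timestamp format used here are assumptions (the table gives no default for `TIMESTAMP_FORMAT`).

```python
from datetime import datetime
from string import Template


def expand_filename(pattern, site, now, timestamp_format="%Y%m%d-%H%M%S"):
    """Fill ${SITE} and ${TIMESTAMP} in a filename pattern.

    The timestamp format is an assumption made for this sketch; the Exporter
    reads its own format from TIMESTAMP_FORMAT.
    """
    values = {"SITE": site, "TIMESTAMP": now.strftime(timestamp_format)}
    return Template(pattern).substitute(values)


zip_name = expand_filename(
    "exporter-files-${SITE}-${TIMESTAMP}.zip",
    site="my-site",  # hypothetical site identifier
    now=datetime(2025, 6, 23, 23, 20, 21),
)
print(zip_name)  # exporter-files-my-site-20250623-232021.zip
```

`string.Template` happens to use the same `${NAME}` syntax as the patterns in this document, which keeps the sketch short; `substitute` raises `KeyError` for a placeholder that has no value, which makes typos in a pattern easy to spot.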
### About Cron Expressions in Spring

Cron expressions configure scheduled tasks and consist of six space-separated fields representing second, minute, hour, day of month, month, and day of week. For example, the default `0 0 2 * * *` means “at 2:00 AM every day.” These expressions allow precise scheduling for maintenance tasks such as cleaning files or archiving data.

---
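
The six-field layout of the cron expressions described above can be made explicit with a small Python sketch. It only names the fields in the order given in this document and rejects expressions with the wrong number of fields; it does not evaluate schedules, and `describe_cron` is a helper invented for this illustration.

```python
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a six-field cron expression into named fields."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {expression!r}")
    return dict(zip(FIELD_NAMES, fields))


schedule = describe_cron("0 0 2 * * *")
print(schedule)
```

A classic five-field Unix cron line such as `0 2 * * *` lacks the leading seconds field, so this check rejects it. Forgetting the seconds field is an easy mistake to make when copying schedules from a crontab.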
## Exporter-DB

The Exporter-DB is a database that stores queries so that the Exporter can execute them.
The Exporter also uses this database to manage the different executions of the same query.

**GitHub:** [https://github.com/samply/exporter-db](https://github.com/samply/exporter-db) (unverified; this repository may not exist)

The Exporter-DB stores queries for execution by the Exporter and tracks multiple executions of the same query, managing versioning and scheduling.

---
## Reporter

The Reporter is a plugin for the Exporter that creates more complex Excel reports described in templates.
It is compatible with different template engines, such as Groovy and Thymeleaf.
It is well suited to generating documents such as our traditional CCP quality report.

**GitHub:** [https://github.com/samply/reporter](https://github.com/samply/reporter)

The Reporter is a **plugin for the Exporter** designed for generating **complex Excel reports** based on **customizable templates**. It supports various template engines like **Groovy** and **Thymeleaf**, making it ideal for producing detailed documents such as the traditional CCP **data quality report**.

---
## Exporter Templates

An exporter template describes the **structure** and **content** of the **export output**.

### Main Elements

* **converter**: Defines the export job, specifying output files and data sources.
* **container**: Represents a logical grouping of data rows (like a table).
* **attribute**: Defines individual data fields/columns extracted from the data source.

### Other Elements

* **cql**: Contains Clinical Quality Language queries used to enrich or filter data.
* **fhir-rev-include**: Defines FHIR reverse includes to fetch related resources.
* **fhir-package**: (To be detailed)
* **fhir-terminology-server**: (To be detailed)
### Example Snippet

```xml
<converter id="ccp" excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx" source-id="blaze-store">
  <container id="Patient" csv-filename="Patient-${SITE}-${TIMESTAMP}.csv" excel-sheet="Patient" ...>
    <attribute id="Patient-ID" default-name="PatientID" val-fhir-path="Patient.id.value" anonym="Pat" op="EXTRACT_RELATIVE_ID"/>
    ...
  </container>
  ...
</converter>
```

---
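
To see how the converter, container, and attribute elements nest, the following Python sketch parses a copy of the example snippet and lists each container with its attribute IDs. The `...` elisions are removed so that the text is well-formed XML, which means this is a reduced illustration and not a complete real template. `summarize` is a helper written for this sketch, not part of the Exporter.

```python
import xml.etree.ElementTree as ET

# The example snippet with its "..." elisions removed so that it parses as XML.
TEMPLATE = """
<converter id="ccp" excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx" source-id="blaze-store">
  <container id="Patient" csv-filename="Patient-${SITE}-${TIMESTAMP}.csv" excel-sheet="Patient">
    <attribute id="Patient-ID" default-name="PatientID" val-fhir-path="Patient.id.value" anonym="Pat" op="EXTRACT_RELATIVE_ID"/>
  </container>
</converter>
"""


def summarize(template_xml):
    """Return {container id: [attribute ids]} for a template string."""
    converter = ET.fromstring(template_xml)
    return {
        container.get("id"): [
            attribute.get("id") for attribute in container.findall("attribute")
        ]
        for container in converter.findall("container")
    }


summary = summarize(TEMPLATE)
print(summary)  # {'Patient': ['Patient-ID']}
```

A listing like this can serve as a quick sanity check when editing a template by hand, for example to confirm that every container still has the attributes you expect.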
### 1. **Converter**

The main tag of an exporter template, grouping converters to find the best chain for data conversion.

| Tag | Description |
| --- | --- |
| `<converter>` | Main tag for exporter template containing sources, metadata, and additional query information |

| Attribute | Description | Example | Default |
| --- | --- | --- | --- |
| id | ID to reference a template | `id="ccp-opal"` | — |
| default-name | Default name when the output is a single file (without extension; it is added automatically) | — | — |
| ignore | Deactivate template but keep accessible | `ignore="true"` | false |
| excel-filename | Name of the Excel output file (supports variables `${SITE}`, `${TIMESTAMP}`) | `excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx"` | — |
| csv-separator | CSV separator character | — | `"\t"` |
| source-id | ID of the data source | `source-id="blaze-store"` | — |
| target-id | ID of a target server for file transfer (e.g., Opal for DataSHIELD) | `target-id="opal"` | — |
| opal-project | Opal-specific: name of project | — | — |
| opal-permission-type | Opal permission type (`user` or `group`) | — | — |
| opal-permission-subjects | Opal permission subjects | — | — |
| opal-permission | Opal permission (`administrate` or `use`) | — | — |

**Notes:**

* Variables like `${SITE}`, `${TIMESTAMP}`, and environment variables (base64 encoded) can be used inside tags.

**Allowed child elements:**

* `<container>`, `<cql>`, `<fhir-rev-include>`, `<fhir-package>`, `<fhir-terminology-server>`

---
### 2. **Container**

Represents a data table with columns (attributes).

| Tag | Description |
| --- | --- |
| `<container>` | Defines a container/table with attributes (columns) |

| Attribute | Description | Example | Default |
| --- | --- | --- | --- |
| id | Container ID to reference | — | — |
| default-name | Name of Excel sheet/file (no extension, added automatically) | — | — |
| csv-filename | Name of CSV file | `csv-filename="Diagnosis-${TIMESTAMP}.csv"` | — |
| json-filename | Name of JSON file | `json-filename="diagnosis-${TIMESTAMP}.json"` | — |
| xml-filename | Name of XML file | `xml-filename="diagnosis-${TIMESTAMP}.xml"` | — |
| xml-root-element | Root element name in XML | `xml-root-element="diagnoses"` | — |
| xml-element | Element name for each entry in XML | `xml-element="diagnosis"` | — |
| excel-sheet | Excel sheet name | `excel-sheet="diagnosis-${TIMESTAMP}.xlsx"` | — |
| opal-table | Opal table name | `opal-table="Diagnosis"` | — |
| opal-entity-type | Opal entity type | — | — |

---
### 3. **Attribute**

Represents a column in a container/table.

| Tag | Description |
| --- | --- |
| `<attribute>` | Defines an attribute/column |

| Attribute | Description | Example | Default |
| --- | --- | --- | --- |
| id | Attribute ID | `id="Patient-ID"` | — |
| default-name | Default name of the attribute (used if no output-specific name provided) | — | — |
| link | Reference to an attribute of another container (format: `<container-name>.<attribute-id>`) | `link="Patient.Patient-ID"` | — |
| csv-column | Name of the CSV column | — | — |
| excel-column | Name of the Excel column | — | — |
| json-key | JSON key | — | — |
| xml-element | XML element name | — | — |
| opal-value-type | Opal-specific value type | — | — |
| opal-script | Script to be applied to the field in Opal | — | — |
| primary-key | Marks attribute as primary key | `primary-key="true"` | false |
| validation | Marks attribute as syntactic validation field (ends with `-Validation` in DKTK/BBMRI reporter) | `validation="true"` | false |
| val-fhir-path | FHIR path to extract value (if source is a FHIR server) | `val-fhir-path="Patient.gender.value"` | — |
| join-fhir-path | FHIR path for joining secondary resources to main resource | `join-fhir-path="/AdverseEvent.suspectEntity.instance.reference.where(value.startsWith('Procedure')).value"` | — |
| condition-value-fhir-path | Condition filtering for complex value extraction (FHIR path syntax) | `condition-value-fhir-path="Patient.birthDate <= today() - 18 'years'"` | — |
| anonym | Anonymization prefix; replaces real value with `anonym` + number | `anonym="Pat"` | — |
| mdr | Metadata repository ID in DKTK context | `mdr="dktk:dataelement:20:3"` | — |
| op | Operation applied on value (e.g., `EXTRACT_RELATIVE_ID`) | `op="EXTRACT_RELATIVE_ID"` | — |

---
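
The `anonym` attribute in the table above replaces a real value with the prefix followed by a number. A minimal Python sketch of that idea follows. The details are assumptions made for illustration: numbering starts at 1, and a repeated real value receives the same pseudonym. The Exporter's actual numbering scheme may differ, and `Anonymizer` is a class written for this sketch.

```python
class Anonymizer:
    """Replace real values with '<prefix><number>' pseudonyms."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._pseudonyms = {}  # real value -> pseudonym

    def anonymize(self, value):
        # Assumption: numbering starts at 1 and a repeated value keeps its pseudonym.
        if value not in self._pseudonyms:
            self._pseudonyms[value] = f"{self.prefix}{len(self._pseudonyms) + 1}"
        return self._pseudonyms[value]


patients = Anonymizer("Pat")  # corresponds to anonym="Pat"
pseudonyms = [patients.anonymize(value) for value in ["id-a", "id-b", "id-a"]]
print(pseudonyms)  # ['Pat1', 'Pat2', 'Pat1']
```

Keeping the mapping stable within one export matters when another container refers to the same value through `link`: rows can then still be matched across tables without exposing the real identifier.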
### Notes on **join-fhir-path**

* Used to join resources in FHIR queries when a container references multiple resources.
* Two join types:
  * **Direct:** main resource points to secondary resource.
  * **Indirect:** secondary resource points back to main resource (path begins with `/`).
* Joins can chain multiple resources, e.g., `R1 -> R2 -> R3`, with commas separating joins.

---
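
The rules above (a leading `/` marks an indirect join, commas separate chained joins) can be sketched in Python as follows. This is a naive illustration, not the Exporter's parser: it splits on every comma, so it would mishandle a FHIRPath expression that itself contains a comma. The two-step chain is a made-up example; only the single indirect path is taken from the attribute table.

```python
def parse_joins(join_fhir_path):
    """Split a join-fhir-path value into (kind, path) steps.

    Naive sketch: splits on every comma, so a FHIRPath expression that
    contains a comma of its own would be split incorrectly.
    """
    steps = []
    for raw in join_fhir_path.split(","):
        path = raw.strip()
        if path.startswith("/"):
            # Indirect join: the secondary resource points back to the main one.
            steps.append(("indirect", path[1:]))
        else:
            # Direct join: the main resource points to the secondary one.
            steps.append(("direct", path))
    return steps


single = parse_joins(
    "/AdverseEvent.suspectEntity.instance.reference.where(value.startsWith('Procedure')).value"
)
# Hypothetical two-step chain, made up for illustration only.
chain = parse_joins("Specimen.subject.reference.value, /Observation.specimen.reference.value")
print(single)
print(chain)
```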
### 4. **CQL**

Contains metadata and details important for handling CQL queries.

| Tag | Description |
| --- | --- |
| `<cql>` | Container for CQL query metadata including tokens and parameters |

---
### 5. **Token (CQL)**

Replaces keys in CQL queries with specific values (commonly used for stratifiers).

| Tag | Description |
| --- | --- |
| `<token>` | Contains `key` and `value` attributes |

| Attribute | Description | Example |
| --- | --- | --- |
| key | Key to replace in CQL | `key="DKTK_STRAT_MEDICATION_STRATIFIER"` |
| value | CQL code snippet that replaces key | `value="define MedicationStatement: if InInitialPopulation then [MedicationStatement] else {} as List <MedicationStatement>"` |

---
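
Token replacement as described above amounts to substituting each key in the CQL text with its value. The Python sketch below shows this with plain string replacement. The CQL skeleton is hypothetical, and only the key and value are taken from the table; how the Exporter performs the substitution internally is not documented here, so treat this as an assumption.

```python
def apply_tokens(cql, tokens):
    """Replace every token key found in the CQL text with its value."""
    for key, value in tokens.items():
        cql = cql.replace(key, value)
    return cql


tokens = {
    "DKTK_STRAT_MEDICATION_STRATIFIER": (
        "define MedicationStatement: if InInitialPopulation "
        "then [MedicationStatement] else {} as List <MedicationStatement>"
    ),
}

# Hypothetical CQL skeleton; only the key and value come from the table above.
cql_skeleton = "library Example\n\nDKTK_STRAT_MEDICATION_STRATIFIER\n"
cql = apply_tokens(cql_skeleton, tokens)
print(cql)
```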
### 6. **Measure Parameters (CQL)**

Parameters for a CQL measure query, typically in JSON format.

| Tag | Description |
| --- | --- |
| `<measure-parameters>` | Parameters such as `periodStart`, `periodEnd`, `reportType` |

---
### 7. **Default FHIR Search Query (CQL)**

FHIR search query applied after obtaining measure reports from CQL.

| Tag | Description | Example |
| --- | --- | --- |
| `<default-fhir-search-query>` | Defines a FHIR resource type to query (e.g., Patient) | `Patient` |

---
### 8. **FHIR Reverse Include**

Defines which resources should be reverse-included when using FHIR search as input or `CQL_DATA`.

| Tag | Description |
| --- | --- |
| `<fhir-rev-include>` | Specifies reverse include resources to simplify FHIR queries |

---