diff --git a/ccp/modules/exporter.md b/ccp/modules/exporter.md index 24e81b0..c4bd4cd 100644 --- a/ccp/modules/exporter.md +++ b/ccp/modules/exporter.md @@ -1,15 +1,268 @@ # Exporter and Reporter +--- ## Exporter -The exporter is a REST API that exports the data of the different databases of the bridgehead in a set of tables. -It can accept different output formats as CSV, Excel, JSON or XML. It can also export data into Opal. + +**GitHub:** [https://github.com/samply/exporter](https://github.com/samply/exporter) + +The Exporter is a **REST API** that allows exporting data from various **databases of the bridgehead** as **structured tables**. It supports multiple output formats including **CSV, Excel, JSON, and XML**. Additionally, it can export data directly into **Opal (DataSHIELD)**. + +### How it works + +The **user** submits a **query** and specifies the desired **export template** and **output format**. The **query** acts like the `WHERE` clause in SQL, filtering data, while the **template** defines what data to select and how to format it, similar to the `SELECT` clause. The Exporter then processes this to generate the export files. + +### Environment Variables + +Below is a list of configurable environment variables used by the Exporter: + +| Variable | Default | Description | +| --------------------------------------------------------- | ------------------------------------------- | ---------------------------------------------------------- | +| APPLICATION\_PORT | 8092 | Port on which the application runs. | +| ARCHIVE\_EXPIRED\_QUERIES\_CRON\_EXPRESSION | `0 0 2 * * *` | Cron expression for archiving expired queries. | +| CLEAN\_TEMP\_FILES\_CRON\_EXPRESSION | `0 0 1 * * *` | Cron expression for cleaning temporary files. | +| CLEAN\_WRITE\_FILES\_CRON\_EXPRESSION | `0 0 2 * * *` | Cron expression for cleaning written files. | +| CONVERTER\_TEMPLATE\_DIRECTORY | | Directory containing conversion templates. | +| CONVERTER\_XML\_APPLICATION\_CONTEXT\_PATH | | Path to the XML application context used by the converter. | +| CROSS\_ORIGINS | | Allowed CORS origins (comma-separated). | +| CSV\_SEPARATOR\_REPLACEMENT | | Character to replace CSV separators within values. | +| EXCEL\_WORKBOOK\_WINDOW | 30000000 | Memory window size for Excel workbook processing. | +| EXPORTER\_API\_KEY | | API key for authenticating access to the exporter. | +| EXPORTER\_DB\_FLYWAY\_MIGRATION\_ENABLED | true | Enable Flyway DB migrations on startup. | +| EXPORTER\_DB\_PASSWORD | | Password for exporter database. | +| EXPORTER\_DB\_URL | `jdbc:postgresql://localhost:5432/exporter` | JDBC URL for exporter DB. | +| EXPORTER\_DB\_USER | | Username for exporter DB. | +| FHIR\_PACKAGES\_DIRECTORY | | Directory where FHIR packages are stored. | +| HAPI\_FHIR\_CLIENT\_LOG\_LEVEL | OFF | Log level for HAPI FHIR client. | +| HIBERNATE\_LOG | false | Enable Hibernate SQL logging. | +| HTTP\_RELATIVE\_PATH | | Relative base path for HTTP endpoints. | +| HTTP\_SERVLET\_REQUEST\_SCHEME | http | Default HTTP scheme. | +| LOG\_FHIR\_VALIDATION | | Enable logging of FHIR validation results. | +| LOG\_LEVEL | INFO | Application log level. | +| MAX\_NUMBER\_OF\_EXCEL\_ROWS\_IN\_A\_SHEET | 100000 | Max rows per Excel sheet. | +| MAX\_NUMBER\_OF\_RETRIES | 10 | Max retry attempts. | +| MERGE\_FILENAME | | Name of merged output file. | +| SITE | | Site identifier for filenames/logs. | +| TEMP\_FILES\_LIFETIME\_IN\_DAYS | 1 | Lifetime of temporary files (days). | +| TEMPORAL\_FILE\_DIRECTORY | | Directory for temporary files. | +| TIMEOUT\_IN\_SECONDS | 10 | Default timeout (seconds). | +| TIMESTAMP\_FORMAT | | Timestamp format string. | +| WEBCLIENT\_BUFFER\_SIZE\_IN\_BYTES | 8192 | Buffer size for web client. | +| WEBCLIENT\_CONNECTION\_TIMEOUT\_IN\_SECONDS | 5 | Connection timeout (seconds). | +| WEBCLIENT\_MAX\_NUMBER\_OF\_RETRIES | 10 | Max retries for web client. | +| WEBCLIENT\_REQUEST\_TIMEOUT\_IN\_SECONDS | 10 | Request timeout (seconds). | +| WEBCLIENT\_TCP\_KEEP\_CONNECTION\_NUMBER\_OF\_TRIES | 3 | TCP keepalive retry attempts. | +| WEBCLIENT\_TCP\_KEEP\_IDLE\_IN\_SECONDS | 30 | TCP keepalive idle time (seconds). | +| WEBCLIENT\_TCP\_KEEP\_INTERVAL\_IN\_SECONDS | 10 | TCP keepalive probe interval (seconds). | +| WEBCLIENT\_TIME\_IN\_SECONDS\_AFTER\_RETRY\_WITH\_FAILURE | 1 | Wait time after failed retry (seconds). | +| WRITE\_FILE\_DIRECTORY | | Directory for final output files. | +| WRITE\_FILES\_LIFETIME\_IN\_DAYS | 30 | Lifetime of written files (days). | +| XML\_FILE\_MERGER\_ROOT\_ELEMENT | Containers | Root element for XML file merging. | +| ZIP\_FILENAME | `exporter-files-${SITE}-${TIMESTAMP}.zip` | Pattern for ZIP archive naming. | + +--- + +### About Cron Expressions in Spring + +Cron expressions configure scheduled tasks and consist of six space-separated fields representing second, minute, hour, day of month, month, and day of week. For example, the default `0 0 2 * * *` means “at 2:00 AM every day.” These expressions allow precise scheduling for maintenance tasks such as cleaning files or archiving data. + +--- ## Exporter-DB -It is a database to save queries for its execution in the exporter. -The exporter manages also the different executions of the same query in through the database. + +**GitHub:** [https://github.com/samply/exporter-db](https://github.com/samply/exporter-db) (If exists; if not, just remove or adjust accordingly) + +The Exporter-DB stores queries for execution by the Exporter and tracks multiple executions of the same query, managing versioning and scheduling. + +--- ## Reporter -This component is a plugin of the exporter that allows to create more complex Excel reports described in templates. -It is compatible with different template engines as Groovy, Thymeleaf,... -It is perfect to generate a document as our traditional CCP quality report. + +**GitHub:** [https://github.com/samply/reporter](https://github.com/samply/reporter) + +The Reporter is a **plugin for the Exporter** designed for generating **complex Excel reports** based on **customizable templates**. It supports various template engines like **Groovy** and **Thymeleaf**, making it ideal for producing detailed documents such as the traditional CCP **data quality report**. + +--- + +## Exporter Templates + +An exporter template describes the **structure** and **content** of the **export output**. + +### Main Elements + +* **converter**: Defines the export job, specifying output files and data sources. +* **container**: Represents a logical grouping of data rows (like a table). +* **attribute**: Defines individual data fields/columns extracted from the data source. + +### Other Elements + +* **cql**: Contains Clinical Quality Language queries used to enrich or filter data. +* **fhir-rev-include**: Defines FHIR reverse includes to fetch related resources. +* **fhir-package**: (To be detailed) +* **fhir-terminology-server**: (To be detailed) + +### Example Snippet + +```xml + + + + ... + + ... + +``` + +--- + +### 1. **Converter** + +Main tag of an exporter template grouping converters to find the best chain for data conversion. + +| Tag | Description | +| ------------- | --------------------------------------------------------------------------------------------- | +| `` | Main tag for exporter template containing sources, metadata, and additional query information | + +| Attribute | Description | Example | Default | +| ------------------------ | --------------------------------------------------------------------------------------- | --------------------------------------------------- | ------- | +| id | ID to reference a template | `id="ccp-opal"` | — | +| default-name | Default name when output is in a single file format (no extension; added automatically) | — | — | +| ignore | Deactivate template but keep accessible | `ignore="true"` | false | +| excel-filename | Name of the Excel output file (supports variables `${SITE}`, `${TIMESTAMP}`) | `excel-filename="Export-${SITE}-${TIMESTAMP}.xlsx"` | — | +| csv-separator | CSV separator character | — | `"\t"` | +| source-id | ID of the data source | `source-id="blaze-store"` | — | +| target-id | ID of a target server for file transfer (e.g., Opal for DataSHIELD) | `target-id="opal"` | — | +| opal-project | Opal-specific: name of project | — | — | +| opal-permission-type | Opal permission type (`user` or `group`) | — | — | +| opal-permission-subjects | Opal permission subjects | — | — | +| opal-permission | Opal permission (`administrate` or `use`) | — | — | + +**Notes:** + +* Variables like `${SITE}`, `${TIMESTAMP}`, and environment variables (base64 encoded) can be used inside tags. + +**Allowed child elements:** + +* ``, ``, ``, ``, `` + +--- + +### 2. **Container** + +Represents a data table with columns (attributes). + +| Tag | Description | +| ------------- | --------------------------------------------------- | +| `` | Defines a container/table with attributes (columns) | + +| Attribute | Description | Example | Default | +| ---------------- | ------------------------------------------------------------ | --------------------------------------------- | ------- | +| id | Container ID to reference | — | — | +| default-name | Name of Excel sheet/file (no extension, added automatically) | — | — | +| csv-filename | Name of CSV file | `csv-filename="Diagnosis-${TIMESTAMP}.csv"` | — | +| json-filename | Name of JSON file | `json-filename="diagnosis-${TIMESTAMP}.json"` | — | +| xml-filename | Name of XML file | `xml-filename="diagnosis-${TIMESTAMP}.xml"` | — | +| xml-root-element | Root element name in XML | `xml-root-element="diagnoses"` | — | +| xml-element | Element name for each entry in XML | `xml-element="diagnosis"` | — | +| excel-sheet | Excel sheet name | `excel-sheet="diagnosis-${TIMESTAMP}.xlsx"` | — | +| opal-table | Opal table name | `opal-name="Diagnosis"` | — | +| opal-entity-type | Opal entity type | — | — | + +--- + +### 3. **Attribute** + +Represents a column in a container/table. + +| Tag | Description | +| ------------- | --------------------------- | +| `` | Defines an attribute/column | + +| Attribute | Description | Example | Default | +| ------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | ------- | +| id | Attribute ID | `id="Patient-ID"` | — | +| default-name | Default name of the attribute (used if no output-specific name provided) | — | — | +| link | Reference to an attribute of another container (format: `.`) | `link="Patient.Patient-ID"` | — | +| csv-column | Name of the CSV column | — | — | +| excel-column | Name of the Excel column | — | — | +| json-key | JSON key | — | — | +| xml-element | XML element name | — | — | +| opal-value-type | Opal-specific value type | — | — | +| opal-script | Script to be applied to the field in Opal | — | — | +| primary-key | Marks attribute as primary key | `primary-key="true"` | false | +| validation | Marks attribute as syntactic validation field (ends with `-Validation` in DKTK/BBMRI reporter) | `validation="true"` | false | +| val-fhir-path | FHIR path to extract value (if source is a FHIR server) | `val-fhir-path="Patient.gender.value"` | — | +| join-fhir-path | FHIR path for joining secondary resources to main resource | `join-fhir-path="/AdverseEvent.suspectEntity.instance.reference.where(value.startsWith('Procedure')).value"` | — | +| condition-value-fhir-path | Condition filtering for complex value extraction (FHIR path syntax) | `condition-value-fhir-path="Patient.birthDate <= today() - 18 'years'"` | — | +| anonym | Anonymization prefix; replaces real value with `anonym` + number | `anonym="Pat"` | — | +| mdr | Metadata repository ID in DKTK context | `mdr="dktk:dataelement:20:3"` | — | +| op | Operation applied on value (e.g., `EXTRACT_RELATIVE_ID`) | `op="EXTRACT_RELATIVE_ID"` | — | + +--- + +### Notes on **join-fhir-path** + +* Used to join resources in FHIR queries when container references multiple resources. +* Two join types: + + * **Direct:** main resource points to secondary resource. + * **Indirect:** secondary resource points back to main resource (path begins with `/`). +* Joins can chain multiple resources, e.g., `R1 -> R2 -> R3`, with commas separating joins. + +--- + +### 4. **CQL** + +Contains metadata and details important for handling CQL queries. + +| Tag | Description | +| ------- | ---------------------------------------------------------------- | +| `` | Container for CQL query metadata including tokens and parameters | + +--- + +### 5. **Token (CQL)** + +Replaces keys in CQL queries with specific values (commonly used for stratifiers). + +| Tag | Description | +| --------- | ------------------------------------- | +| `` | Contains `key` and `value` attributes | + +| Attribute | Description | Example | +| --------- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | +| key | Key to replace in CQL | `key="DKTK_STRAT_MEDICATION_STRATIFIER"` | +| value | CQL code snippet that replaces key | `value="define MedicationStatement: if InInitialPopulation then [MedicationStatement] else {} as List "` | + +--- + +### 6. **Measure Parameters (CQL)** + +Parameters for a CQL measure query, typically in JSON format. + +| Tag | Description | +| ---------------------- | ----------------------------------------------------------- | +| `` | Parameters such as `periodStart`, `periodEnd`, `reportType` | + +--- + +### 7. **Default FHIR Search Query (CQL)** + +FHIR search query applied after obtaining measure reports from CQL. + +| Tag | Description | Example | +| ----------------------------- | ----------------------------------------------------- | --------- | +| `` | Defines a FHIR resource type to query (e.g., Patient) | `Patient` | + +--- + +### 8. **FHIR Reverse Include** + +Defines which resources should be reverse-included when using FHIR search as input or CQL\_DATA. + +| Tag | Description | +| -------------------- | ------------------------------------------------------------ | +| `` | Specifies reverse include resources to simplify FHIR queries | + +--- +