Transparency EKG Requirements Specification, Architecture and Semantic Model

Last updated: 28-09-2022

Authors:
Vladimir Alexiev, Viktor Ribchev, Miroslav Chervenski, Nikola Tulechki, Mihail Radkov, Antoniy Kunchev, Radostin Nanov

Transparency Energy Knowledge Graph

Developed by:	Ontotext (Sirma AI)
Based on data from:	ENTSO-E Transparency Platform
Powered by:

This project has received funding from the European Union’s Horizon 2020 research and innovation programme
under grant agreement No 824330: INTERRFACE Open Call (cascade funding)

0.1 Document Revision History

Version	Date	Changes Made
M4	2022-06-10	Final Version
M4	2022-04-08	V1 of the TEKG Refinement of validation rules
M3.1	2022-03-23	Started tracking revision history. Review comment addressed in installedCapacity-Aggregated-vs-Per-Unit
M3	2022-03-08	M3 Deliverable corresponding to V1 of the TEKG

1 Intro

The ENTSO-E Transparency Platform provides information that is crucial for the efficient and fair operation of the EU energy market. It includes a large number of data items (time series) that are strictly defined in EUreg Transparency and further elaborated in MoP DDD (see Project Glossary on where to find these references).

Knowledge Graphs (KG) have numerous benefits for data integration across enterprises and disciplines. The Energy Identification Code (EIC) is a global identifier of energy resources (objects) and parties (domains/areas, market participants, exchanges, etc).

With this project we hope to make a step in the direction of Energy KGs by creating a Transparency Energy KG (TEKG) from ENTSOE Transparency data. We use GraphDB, the Ontotext Platform, and semantic data integration. We demonstrate the benefits of KG for:

Data quality, uncovering a number of Data Quality problems in ENTSOE data
Integration of external data
Data analytics, showing GraphDB-Elasticsearch-Kibana data flows

This living document specifies the TEKG:

M1 (2022-01-14) specifies the project Scope and some draft requirements
M2 (2022-02-02) specifies all Business Requirements including mockups
M3 (2022-03-02) specifies Semantic Models, Software Architecture (and incorporates a Test Plan)
M4 (2022-06-01) specifies the final version of the TEKG

The demonstrator is available at https://transparency.ontotext.com/

1.1 Project Glossary

We have created and will maintain a comprehensive project glossary. Every special term and abbreviation that we encounter is added to the glossary.

It also includes a list of Sources:

EC regulations
Manual of Procedures (MoP) and its parts, including DDD Detailed Data Descriptions
Other ENTSOE documents and pages, amongst them:
- doc Free Reuse: Data Available for Free Re-Use
- doc Functions: List of allowed functions for the EIC codes
Scientific Papers

1.2 Areas

The constituency of ENTSOE is broken up into a number of Domain/Area "meshes" according to different principles. See glossary#areas for a description of all kinds of Areas.

The following kinds of Areas are most important for Transparency because they are used in Data Items:

Bidding Zone, BZN: largest geographical area in which there is a uniform spot price, in which Market Participants can exchange energy without Capacity Allocation.
Control Area, CA=CTA: coherent part of the interconnected system, operated by a single system operator and shall include connected physical loads and/or generation units
Member State (Country), CTY: EU member state or a neighboring state
Market Balance Area, MBA: geographic area in which there is a uniform balancing energy price. Consists of one or more Metering Grid Areas with common market rules for which the settlement responsible party carries out a balance settlement and which has the same price for imbalance. May also be defined due to bottlenecks.
Scheduling Area, SCA: same as Bidding Zone, except if there is more than one Responsibility Area within this Bidding Zone. In the latter case, the Scheduling Area equals Responsibility Area or a group of Responsibility Areas.

Resources (Eg Production and Generation Units) of these Areas can be requested from the Transparency portal and are used as key request parameters in the REST API. For example:

Actual Generation per Production Type is applicable to the 3 kinds CTY, CTA, BZN
Whereas Balancing data items use MBA (eg for Cross-Border Balancing) or SCA (eg for Procured Capacity)

The following query finds 198 relevant Areas of the above kinds in the EIC file, and returns them with all functions:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
    values ?fun {"Member State" "Control Area" "Bidding Zone" "Market Balance Area" "Scheduling Area"}
    ?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
    optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)

We get from EIC the 3 critical kinds CTY, CTA, BZN that are of interest to us (111 such Areas):

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
    values ?fun {"Member State" "Control Area" "Bidding Zone"}
    ?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
    optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)

Unfortunately there are discrepancies, see data/areas.tsv that has the following columns (with count shown):

name: area name (121)
co: country code (49, 29 unique)
eic: EIC code (121)
funcs: which of the 3 functions BZN, CTA, CTY are listed for the area (121)
inEIC: whether it's present in the EIC file (111)
inDoc: whether it's present in the documentation REST API Guide#Areas (89)
inAPI: whether it's accepted by the REST API request master_data i.e. Installed Capacity Per Production Unit (87)
inVIES: whether VAT numbers of that country can be validated in VIES, see External VAT Validation

We have the following combinations:

inEIC	inDoc	inAPI	count
0	1	1	10
1	0	0	31
1	0	1	1
1	1	0	3
1	1	1	76
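
The counts quoted in this section can be cross-checked against the combination table. The following Python sketch is illustrative only: it hard-codes the five rows of the table instead of reading data/areas.tsv, and the names `combos` and `total` are ours, not part of the project code.

```python
# Rows of the combination table: (inEIC, inDoc, inAPI) -> number of areas.
combos = {
    (0, 1, 1): 10,
    (1, 0, 0): 31,
    (1, 0, 1): 1,
    (1, 1, 0): 3,
    (1, 1, 1): 76,
}


def total(pred):
    """Sum the area counts of all combinations for which pred(inEIC, inDoc, inAPI) holds."""
    return sum(n for flags, n in combos.items() if pred(*flags))


all_areas = total(lambda eic, doc, api: True)  # every row of areas.tsv
in_eic = total(lambda eic, doc, api: eic == 1)
in_doc = total(lambda eic, doc, api: doc == 1)
in_api = total(lambda eic, doc, api: api == 1)
eic_not_api = total(lambda eic, doc, api: eic == 1 and api == 0)
doc_not_api = total(lambda eic, doc, api: doc == 1 and api == 0)

print(all_areas, in_eic, in_doc, in_api, eic_not_api, doc_not_api)
# prints: 121 111 89 87 34 3
```

The totals agree with the column counts of areas.tsv (121 rows, 111 inEIC, 89 inDoc, 87 inAPI).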

34 areas are listed in the EIC file but rejected by the API
3 areas are documented but rejected by the API:

notation	eic	funcs	comment
DE	10Y1001A1001A83F	Member State	Instead, BZN (CZ-DE-SK, DE-AT-LU, DE-LU) or CTA (50hertz, Amprion, Tennet GER, TransnetBW) are used
DK	10Y1001A1001A65H	Member State	Instead, BZN (DK-1, DK-2) are used
UK	10Y1001A1001A92E	Member State	Instead, BZN (GB National Grid, IE(SEM)) or CTA (National Grid, NIE) are used

1 area is accepted by the API but not documented:

notation	eic	funcs	comment
GB-NI	10Y1001A1001A016	Control Area	NIE?

10 areas are missing from the EIC XML file but are documented and accepted by the API: we added their data to areas.tsv and to a manually crafted turtle/eic-extra.ttl

notation	co	eic	funcs
IT-BRINDISI	IT	10Y1001A1001A699	Bidding Zone
IT-FOGGIA	IT	10Y1001A1001A72K	Bidding Zone
IT-PRIOLO	IT	10Y1001A1001A76C	Bidding Zone
IT-ROSSANO	IT	10Y1001A1001A77A	Bidding Zone
BY	BY	10Y1001A1001A51S	Control Area, Bidding Zone, Market Balance Area
MD	MD	10Y1001A1001A990	Control Area, Bidding Zone, Market Balance Area
RU	RU	10Y1001A1001A49F	Control Area, Bidding Zone, Market Balance Area
KALININGRAD	RU	10Y1001A1001A50U	Control Area, Bidding Zone, Market Balance Area
PL-CZ		10YDOM-1001A082L	Control Area, Bidding Zone
CZ+DE+SK		10YDOM-CZ-DE-SKK	Bidding Zone

1.3 Countries

We find some interesting discrepancies of "Member State" areas:

Many are missing country code (tr:countryCode): BE, CZ, DE, ES, FR, ICELAND, IT, LU, NL, NO, SE, SK, UA, UK
All tr:notation values are country codes, except "ICELAND" which is a full name
LV lists 4 "Member States": "LV" but also "END_USERS_LV", "DISTRIBUTION_LV", "VTP_LV"

In order to join external power plant datasets, we need a list of ENTSOE countries with ISO2 and ISO3 codes.

The following query finds 36 countries that are members of ENTSOE. We use a Federated query to Wikidata:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?eic ?iso2 ?iso3 ?name ?wd_name where {
  ?x tr:function "Member State"; tr:eic ?eic; tr:notation ?n; tr:name ?name.
  bind(if(?n="ICELAND","IS",?n) as ?iso2)
  service <https://query.wikidata.org/sparql> {
    ?y wdt:P297 ?iso2; wdt:P298 ?iso3; rdfs:label ?wd_name
    filter(lang(?wd_name)="en")
  }
} order by ?iso2

We replace dynamically "ICELAND" with "IS" which is its proper iso2 code
The join to Wikidata by iso2 eliminates the 3 extraneous LV "Member States"
In the result data/countries.csv, we merge the two names of NL to one row: "Netherlands; Kingdom of the Netherlands"
We resolve a difference of United Kingdom vs Great Britain
Finally, we add the 3 countries missing from the EIC file (RU, BY, MD)
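
The effect of the "ICELAND" replacement and of the join by iso2 can be illustrated outside SPARQL. This is a minimal Python sketch under stated assumptions: the small `ISO2_TO_ISO3` dictionary stands in for the Wikidata lookup, and the list of notations is a hypothetical sample that only includes the problem cases described above.

```python
# Stand-in for the Wikidata lookup (P297 -> P298); only a few countries are listed.
ISO2_TO_ISO3 = {"BG": "BGR", "IS": "ISL", "LV": "LVA"}

# Hypothetical sample of "Member State" notations, including the problem cases.
notations = ["BG", "ICELAND", "LV", "END_USERS_LV", "DISTRIBUTION_LV", "VTP_LV"]


def to_iso2(notation):
    """Replace the full name "ICELAND" with its ISO2 code; keep other notations."""
    return "IS" if notation == "ICELAND" else notation


# Inner join on iso2: notations without a matching ISO2 code drop out.
countries = sorted(
    (to_iso2(n), ISO2_TO_ISO3[to_iso2(n)])
    for n in notations
    if to_iso2(n) in ISO2_TO_ISO3
)
print(countries)
# prints: [('BG', 'BGR'), ('IS', 'ISL'), ('LV', 'LVA')]
```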

2 Data Items

The ENTSOE Transparency portal includes about 80-135 data items (depending on how you count). The items cover 7 domains:

Load: power consumption forecasts and actuals
Generation: production installed capacities (configuration), forecasts and actuals
Transmission: power transfers over borders between areas
Balancing: regulation energy used to keep the electrical transmission grid in balance: bids (price & volume), capacity, imbalance prices and volume
Outages: planned maintenances and unplanned failures inside the electrical grid: transmission, generation, consumption, offshore grid. The most popular domain
Congestion Management: actions taken to relieve overloaded parts of the electrical transmission grid
System Operations: Operational Agreements (on Synchronous Areas, LFC Blocks), Measurements of frequency quality (PDFs)

Data items are described in various documents:

EUreg Transparency: Commission Regulation (EU) No 543/2013
data-items-sitemap.txt: page Sitemap: 7 domains, 84 items
data-items-kb.txt: page Knowledge Base: 7 domains, 84 items
- Includes ECreg Transparency item definitions and clause references, as well as more detailed item descriptions, sometimes with illustrations
data-items-sftp.txt: page SFTP: 6 domains (excludes System Operations), 100 items
- Includes ECreg Transparency clause references
- Includes column descriptions of the 156 fields that appear in these 100 tables. But fields are not always explained well, only examples are provided
- Has some important omissions, eg ActualGenerationOutputPerGenerationUnit shows PowerSystemResourceName, but the respective CSV file also has GenerationUnitEIC
doc Free Reuse: Data Available for Free Re-Use (2019-11).
- Describes 35 data items that are available for Free Reuse (and are therefore our first target).
- For data items that are not in the list, one needs to seek the consent of the primary data owner (see Primary Owner of Data for each row of MoP DDD, most often the TSO)

2.1 Data Item Description

We have reconciled the various descriptions of data items and integrated them in this Google Sheet.

From it we generate a semantic description in file data/turtle/small/kb.ttl using the query in etl_scripts/dataItems.ru which includes the following properties (examples given for item <data/load/ActualTotalLoad_6.1.A>):

tr:name: item name, eg "Actual Total Load"
tr:file: base file name of XML (REST API) or CSV (SFTP), eg "ACTUAL_TOTAL_LOAD" or "ActualTotalLoad"
tr:dataDomain: parent data domain, eg <data/load>
tr:linkDescription: link to detailed description (see "knowledge base" above), eg Total Load - Day Ahead - Actual
tr:linkPortal: link to ENTSOE portal where the item can be viewed/downloaded, eg totalLoadR2/show
tr:linkDownload: download link, applies only to "static" files:
- EIC: XML or CSV, see section EIC above
- Codelist: EDI/Library/CodelistV80.zip (09/12/2021)
tr:link: applies only to "external" sources
tr:regArticle: article of ECreg Transparency describing the item, eg:
- 6.1.A for Actual Total Load
- 12.3.A.d for Explicit Allocations - Auction Revenue (daily)
- 12.3.A.i for Explicit Allocations - Auction Revenue (intraday)
- 16.1.B and 16.1.C for Aggregated Generation per Type
tr:isFreeReuse: whether the item is available for free reuse
tr:ekgCheckDataQuality: whether TEKG will implement Data Validations over the item
tr:ekgImplementAnalytics: whether TEKG will implement Analytics over the item

2.2 Data Items to be Integrated

This is the full list of data items that will be integrated. It includes items to be validated (ekgCheckDataQuality) and items to implement analytics for (ekgImplementAnalytics):

(Basic) Energy Identification Code file (EIC)
(Basic) Codelists
(External) Open Street Map (OSM)
(External) Other external datasets of power plants and generators
(Load) Actual Total Load (Cancelled)
(Load) Day-ahead Total Load Forecast (Cancelled)
(Load) Month-ahead Total Load Forecast (Cancelled)
(Load) Week-ahead Total Load Forecast (Cancelled)
(Load) Year-ahead Total Load Forecast (Cancelled)
(Generation) Installed Capacity Per Production Unit
(Generation) Aggregated Generation per Type
(Generation) Current Generation Forecasts for Wind and Solar
(Outages) Planned Unavailability and Changes in Actual Availability of Generation Units
(Outages) Planned Unavailability and Changes in Actual Availability of Production Units
(Balancing) Accepted Aggregated Offers
(Balancing) Prices Of Activated Balancing Energy
(Balancing) Activated Balancing Energy

The following subsections provide detailed description and analysis of each item:

Research the data items
Research data availability. We take:
- The "Basic" items from REST API as XML
- The "transactional" items (Load, Generation, Outages) from SFTP as CSV
Analyze XML schemas XSDs, take and analyze XML and CSV examples
Add to Data Validation
Create semantic models showing which data fields are mapped to what RDF constructs
This will be the basis of semantic conversions

2.3 Historical Data Ingestion

Historical data for Balancing is ingested for 6 months in the past.
- e.g. for ActivatedBalancingEnergy on 2022-03-01, a total of 12 CSV files need to be processed, prefixed from 2022_03 to 2021_02
Historical data for Generation is ingested 1 full month in the past
- e.g. for AggregatedGenerationPerType on 2022-03-01, 2 CSV files need to be processed, prefixed 2022_02 and 2022_01
DayAheadGenerationForecastForWindAndSolar is also ingested 1 month in the past
All available future data is always ingested and processed.

Exception: DayAheadGenerationForecastForWindAndSolar for CTA 10YAL-KESH-----5 has over 1 year of null forecasts with 0.00 values. For this reason we will limit future data for this data item to 1 month.

2.4 Temporal Aggregation

Temporal aggregation is required for producing analytics where the diagrams require a coarser level of aggregation than the raw data. This section specifies the temporal aspects of the time-series data.

2.4.1 Generation of Aggregated Data

Temporal aggregation is provided by creating synthetic data items where the amounts are aggregated at the desired temporal resolution. Eg the Balancing Energy Timeline requires hourly or daily aggregates of the Prices Of Activated Balancing Energy and Activated Balancing Energy data items.

Depending on the source, these data items are reported at different temporal resolutions from 15 min to 1h (PT15M, PT30M and PT60M). These values are harmonised at:

PT1H (hourly) resolution stored as data item PricesOfActivatedBalancingEnergy_HOURLY
P1D (daily) resolution stored as data item PricesOfActivatedBalancingEnergy_DAILY

Similarly, Activated Balancing Energy is aggregated in ActivatedBalancingEnergy_HOURLY and ActivatedBalancingEnergy_DAILY

Hourly aggregations are defined by the timestamp at the whole hour preceding the measurement.
Daily aggregations are defined by the timestamp at midnight preceding the measurement.
These synthetic data items are produced using arithmetical operations in SPARQL update queries.
They run on the entire data and overwrite the aggregated values at each execution.

Note: A similar procedure is used for spatial aggregation of individual capacities in a given area, see InstalledGenerationCapacityComputed

2.4.2 Summary Operations

Summary operations differ according to the values being aggregated:

PricesOfActivatedBalancingEnergy: the amounts are averaged over the time period
ActivatedBalancingEnergy: the amounts are summed over the time period
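
The bucketing and summary rules of sections 2.4.1 and 2.4.2 can be sketched as follows. This is an illustrative Python sketch, not the SPARQL update used in the TEKG: the observations are hypothetical, time zones are ignored, and we assume that a measurement taken exactly on the hour (or at midnight) belongs to the bucket starting at that moment.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def bucket(ts, resolution):
    """Truncate a timestamp to the whole hour (PT1H) or to midnight (P1D)."""
    if resolution == "PT1H":
        return ts.replace(minute=0, second=0, microsecond=0)
    if resolution == "P1D":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported resolution: {resolution}")


def aggregate(observations, resolution, summarise):
    """Group (timestamp, amount) pairs per bucket and summarise each group."""
    groups = defaultdict(list)
    for ts, amount in observations:
        groups[bucket(ts, resolution)].append(amount)
    return {start: summarise(amounts) for start, amounts in sorted(groups.items())}


# Hypothetical PT15M observations (timestamp, amount).
obs = [
    (datetime(2022, 3, 1, 10, 0), 10.0),
    (datetime(2022, 3, 1, 10, 15), 20.0),
    (datetime(2022, 3, 1, 10, 30), 30.0),
    (datetime(2022, 3, 1, 10, 45), 40.0),
    (datetime(2022, 3, 1, 11, 0), 50.0),
]

hourly_energy = aggregate(obs, "PT1H", sum)   # ActivatedBalancingEnergy: summed
hourly_price = aggregate(obs, "PT1H", mean)   # PricesOfActivatedBalancingEnergy: averaged
daily_energy = aggregate(obs, "P1D", sum)
```

Passing the summary operation as a parameter keeps the bucketing identical for both data items; only `sum` versus `mean` differs.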

2.5 Semantic Model Diagrams

We visualize semantic models (RDF mappings) using the rdfpuml tool from https://github.com/VladimirAlexiev/rdf2rml. These are graph models that show:

Colored circles to indicate the data item (eg (C)=Codelist, (E)=EIC file, (P)=Production and Generation Units)
The XML path or CSV file used to source data for each node, as "..." right after the class name
- This may also use XPath conditions (as used in Codelist Mapping)
XML or CSV field names in brackets (round brackets in URLs and square brackets in literal values)
The datatype used for each literal

2.6 XML Items and XML Schemas

We obtained XML schemas from CIM_xsd_package.zip (and a few others) and saved them to folder xsd.

This zip includes multiple versions of the same schema. We examined the XML data items selected for integration and selected the actual schema versions used in them
We converted selected schemas to Relax NG (rng) and Relax NG Compact (rnc) because the latter format is much easier to understand than XSD.
We used tooling from https://github.com/VladimirAlexiev/rnc
We analyzed the following schemas of selected items, and cite the relevant parts in subsections:
- Codelists: data/code-lists/urn-entsoe-eu-wgedi-codelists.xsd, as the codelists are embedded in this XSD
- EIC: data/rnc/iec62325-451-n-eiccode_v1_0.rnc
- Production and Generation Units: data/rnc/iec62325-451-6-configuration.rnc

2.6.1 Codelists

The codelists describe the basic lookups used on the Transparency platform.

We obtained CodelistV80.zip and saved data/code-lists/urn-entsoe-eu-wgedi-codelists.xsd. The codelists are embedded in this XSD. We use only "Standard" TypeLists, eg:

  <xsd:simpleType name="StandardAssetTypeList">
    <xsd:annotation>
      <xsd:documentation>
        <Uid>ET0031</Uid>
        <Definition>The identification of the type of asset.</Definition>
      </xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:NMTOKEN">
      <xsd:enumeration value="A01">
        <xsd:annotation>
          <xsd:documentation>
            <CodeDescription>
              <Title>Tieline</Title>
              <Definition>A high voltage line used for cross border energy interconnections.</Definition>
            </CodeDescription>
          </xsd:documentation>
        </xsd:annotation>
      </xsd:enumeration>

2.6.1.1 Codelist Mapping

We convert XML codelists to this simple RDF representation (alternatively, we could use SKOS):

@base <https://transparency.ontotext.com/resource/> .

<type/Asset> a tr:CodeList;
  tr:name "Asset";
  tr:notation "ET0031";
  tr:description "The identification of the type of asset.".

<type/Asset/A01> a tr:CodeValue;
  tr:codeList <type/Asset>;
  tr:name "Tieline";
  tr:notation "A01";
  tr:description "A high voltage line used for cross border energy interconnections." .
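
The conversion from the XSD fragment to the Turtle above can be sketched in Python with the standard library. This is not the project's actual conversion, only an illustration under stated assumptions: the code list name is derived by stripping "Standard" and "TypeList" from the simpleType name, documentation children are matched by local name regardless of namespace, and quotes inside literals are not escaped. The function names are ours.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="StandardAssetTypeList">
    <xsd:annotation>
      <xsd:documentation>
        <Uid>ET0031</Uid>
        <Definition>The identification of the type of asset.</Definition>
      </xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:NMTOKEN">
      <xsd:enumeration value="A01">
        <xsd:annotation>
          <xsd:documentation>
            <CodeDescription>
              <Title>Tieline</Title>
              <Definition>A high voltage line used for cross border energy interconnections.</Definition>
            </CodeDescription>
          </xsd:documentation>
        </xsd:annotation>
      </xsd:enumeration>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""


def find_text(parent, name):
    """Text of the first descendant whose local name is `name` (namespace ignored)."""
    for el in parent.iter():
        if el.tag.rsplit("}", 1)[-1] == name:
            return (el.text or "").strip()
    return ""


def codelist_to_turtle(xsd_text):
    """Emit one tr:CodeList per "Standard" TypeList and one tr:CodeValue per enumeration."""
    lines = []
    for st in ET.fromstring(xsd_text).iter(XSD + "simpleType"):
        type_name = st.get("name", "")
        if not (type_name.startswith("Standard") and type_name.endswith("TypeList")):
            continue  # only "Standard" TypeLists are used
        name = type_name[len("Standard"):-len("TypeList")]
        ann = st.find(XSD + "annotation")
        lines += [
            f"<type/{name}> a tr:CodeList;",
            f'  tr:name "{name}";',
            f'  tr:notation "{find_text(ann, "Uid")}";',
            f'  tr:description "{find_text(ann, "Definition")}".',
            "",
        ]
        for enum in st.iter(XSD + "enumeration"):
            code = enum.get("value")
            desc = enum.find(XSD + "annotation")
            lines += [
                f"<type/{name}/{code}> a tr:CodeValue;",
                f"  tr:codeList <type/{name}>;",
                f'  tr:name "{find_text(desc, "Title")}";',
                f'  tr:notation "{code}";',
                f'  tr:description "{find_text(desc, "Definition")}".',
                "",
            ]
    return "\n".join(lines)


turtle = codelist_to_turtle(SAMPLE)
print(turtle)
```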

A general model looks like data/model/codelist.ttl:

In order to match string values in CSV files to the codelists, we add nameAlt to some code values. For example, the code value for "FCR" (a type of balancing reserve) looks like data/turtle/small/codelists-extra.ttl:

To facilitate faceted search/display, we have added a hierarchy to <type/Asset> using the tr:fuelTypeClassification predicate. The different varieties of Hydro-powered assets are materialized under a generic Hydro asset type in data/model/codelist-eg.ttl. We also add some matching info in order to match fuel type from other databases to the ENTSOE codelist.

2.6.2 EIC File

The EIC file provides basic information about Energy Resources.

EIC was devised by ENTSOE but is also used by ENTSOG.
While ENTSOE allocates some EIC codes (in its role as CIO), most are issued by national authorities (LIO) in a distributed way. The important EIC codes are sent back to ENTSOE
The third char of EIC determines the kind of resource according to the table shown in (*) below. We populate a field eicType, see Add eicType

The ENTSOE EIC file is available from several sources:

XML allocated-eic-codes.xml (namespace urn:iec62325.351:tc57wg16:451-n:eicdocument:1:0), 2021-12-31, has grown by 3.3% in 7 months
CSV: page eic-approved-codes that offers browsing and several CSV downloads:

curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/A_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/T_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/V_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/W_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/X_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Y_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Z_eiccodes.csv

Counting the number of records:

XML total and breakdown per type:

grep -c "<EICCode_MarketDocument>" allocated-eic-codes.xml
perl -lne 'print $1 if m{<mRID>..(.).............</mRID>}' allocated-eic-codes.xml|sort|uniq -c

After transforming XML to RDF and loading to GraphDB we Add eicType
CSV total and breakdown per type (need to subtract 1 from each result to account for the header line)

wc -l *.csv

(*) Counts for XML and CSV:

char	type	XML	CSV
"A"	"Substation"	2447	2457
"T"	"Tieline/Transformer"	9985	10104
"V"	"Location"	516	522
"W"	"Resource Object"	20116	20195
"X"	"Party"	10115	10138
"Y"	"Area or Domain"	1140	1143
"Z"	"Measurement point"	1841	1842
	TOTAL	46160	46401

So the CSV has 241 records more than the XML.
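
The derivation of eicType from the third character, and the totals in the table above, can be sketched in Python. The mapping and the counts are copied from the table; the names `EIC_TYPES`, `COUNTS` and `eic_type` are illustrative, and the sketch is not the SPARQL update that populates tr:eicType in the TEKG.

```python
# Third character of the EIC -> kind of resource (from the table above).
EIC_TYPES = {
    "A": "Substation",
    "T": "Tieline/Transformer",
    "V": "Location",
    "W": "Resource Object",
    "X": "Party",
    "Y": "Area or Domain",
    "Z": "Measurement point",
}

# Record counts per type: (XML, CSV), from the table above.
COUNTS = {
    "A": (2447, 2457),
    "T": (9985, 10104),
    "V": (516, 522),
    "W": (20116, 20195),
    "X": (10115, 10138),
    "Y": (1140, 1143),
    "Z": (1841, 1842),
}


def eic_type(eic):
    """Return the kind of resource, determined by the third character of the EIC."""
    return EIC_TYPES[eic[2]]


xml_total = sum(xml for xml, _ in COUNTS.values())
csv_total = sum(csv for _, csv in COUNTS.values())
print(eic_type("10Y1001A1001A83F"), xml_total, csv_total, csv_total - xml_total)
# prints: Area or Domain 46160 46401 241
```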

The CSV has field EicStatus and we guessed that maybe the extra resources have status Passive. While trying to get statistics for this field, we found that the CSV is malformed: it is semicolon-separated but includes fields with embedded semicolon and no quoting. For example:

X_eiccodes.csv: GASINDUR; S.L.
Y_eiccodes.csv: Enson tutkimustehdas; Imatra

csvtk summary -d ';' -f EicCode:count -g EicStatus X_eiccodes.csv
[ERRO] record on line 2731: wrong number of fields

head -2731 X_eiccodes.csv |tail -1
18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;Trade Responsible Party;X

head -1051 Y_eiccodes.csv |tail -1
44Y-00000000246A;FI_EGTU00;Enson tutkimustehdas; Imatra;;44X-00000000100F;Active;;FI;;Metering Grid Area;Y
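
Such rows could be repaired heuristically before further processing. The following Python sketch is only one possible approach and rests on two assumptions that we did not verify across all files: that a well-formed row has 11 fields, and that any surplus semicolon belongs to the long name (third field), as in the two examples above.

```python
EXPECTED_FIELDS = 11  # assumption: number of columns in a well-formed row
LONG_NAME_INDEX = 2   # assumption: the long name is the third column


def repair_row(line, expected=EXPECTED_FIELDS, name_index=LONG_NAME_INDEX):
    """Split a semicolon-separated row and glue surplus fields back into the long name."""
    fields = line.rstrip("\r\n").split(";")
    extra = len(fields) - expected
    if extra < 0:
        raise ValueError(f"too few fields: {len(fields)}")
    if extra > 0:
        merged = ";".join(fields[name_index:name_index + extra + 1])
        fields[name_index:name_index + extra + 1] = [merged]
    return fields


x_row = repair_row(
    "18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;"
    "Trade Responsible Party;X"
)
y_row = repair_row(
    "44Y-00000000246A;FI_EGTU00;Enson tutkimustehdas; Imatra;;44X-00000000100F;"
    "Active;;FI;;Metering Grid Area;Y"
)
print(x_row[2], "|", y_row[2])
# prints: GASINDUR; S.L. | Enson tutkimustehdas; Imatra
```

After the repair, the status is found at the same position (sixth field) in both rows, so statistics per status become possible.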

We guessed the opposite status is Passive but found no resources with this word:

grep -c Passive *.csv

Judging from the count, the CSV is a superset of the XML. We double-checked the particular EIC ids for the critical type "Area or Domain", and indeed the CSV has 3 extra records (namely Cut Areas/Corridors):

cut -f 1 -d \; Y_eiccodes.csv | tail -n +2 | sort > eic-areas-csv.txt
perl -lne 'print $1 if m{<mRID>(..Y.............)</mRID>}' allocated-eic-codes.xml|sort>eic-areas-xml.txt
comm -3 eic-areas-csv.txt eic-areas-xml.txt

46Y000000000007M
46Y000000000008K
46Y000000000009I

grep -E "46Y000000000007M|46Y000000000008K|46Y000000000009I" Y_eiccodes.csv
46Y000000000007M;CUT_AREA_SE3A;Cut area SE3A;;;Active;;;;Bidding Zone;Y
46Y000000000009I;CUT_COR_SE3A-SE3;Cut corridor SE3A-SE3;;;Active;;;;Bidding Zone;Y
46Y000000000008K;CUT_AREA_SE3;Cut area SE3;;;Active;;;;Bidding Zone;Y

2.6.2.1 EIC Fields

EIC XML has the following structure shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity:

EIC_MarketDocument =
 element mRID {ID_String},
 element revisionNumber {ESMPVersion_String},
 element type {MessageKind_String},
 element sender_MarketParticipant.mRID {PartyID_String}?,
 element sender_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
 element receiver_MarketParticipant.mRID {PartyID_String}?,
 element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
 element createdDateTime {ESMP_DateTime},
 element EICCode_MarketDocument {EICCode_MarketDocument}*

EICCode_MarketDocument =
 element mRID {EICCode_String}?,
 element status {Action_Status}?,
 element docStatus {Action_Status}?,
 element attributeInstanceComponent.attribute {xsd:string}?,
 element long_Names.name {Characters70_String},
 element display_Names.name {Characters16_String},
 element lastRequest_DateAndOrTime.date {xsd:date},
 element deactivationRequested_DateAndOrTime.date {xsd:date}?,
 element eICContact_MarketParticipant.name {Characters70_String}?,
 element eICContact_MarketParticipant.phone1 {TelephoneNumber}?,
 element eICContact_MarketParticipant.electronicAddress {ElectronicAddress}?,
 element eICCode_MarketParticipant.streetAddress {StreetAddress}?,
 element eICCode_MarketParticipant.aCERCode_Names.name {ACERCode_String}?,
 element eICCode_MarketParticipant.vATCode_Names.name {VATCode_String}?,
 element eICParent_MarketDocument.mRID {EICCode_String}?,
 element eICResponsible_MarketParticipant.mRID {EICCode_String}?,
 element description {Characters700_String}?,
 element Function_Names {Function_Name}*

StreetAddress =
 element streetDetail {StreetDetail}?,
 element postalCode {Characters10_String}?,
 element townDetail {TownDetail}?

StreetDetail =
 element addressGeneral {Characters70_String}?,
 element addressGeneral2 {Characters70_String}?,
 element addressGeneral3 {Characters70_String}?

TownDetail =
 element name {Characters35_String}?,
 element country {Characters2_String}?

We examined actual XML instances and show below the fields that are filled and useful (not constant).

A field comparison between CSV, XML and the resulting RDF properties (which we hope are shorter and easier to understand):

CSV	XML	RDF	Note
EicCode	mRID	tr:eic	Also used in URL
EicDisplayName	display_Names.name	tr:notation
EicLongName	long_Names.name	tr:name
	description	tr:description	Often repeats the Functions
EicParent	eICParent_MarketDocument.mRID	tr:parentResource	As EIC URL
EicResponsibleParty	eICResponsible_MarketParticipant.mRID	tr:responsibleParticipant	As EIC URL
EicStatus	docStatus/value		Always A05, so omitted
MarketParticipantPostalCode			Not in XML
MarketParticipantIsoCountryCode	eICCode_MarketParticipant.streetAddress/townDetail/country	tr:countryCode
MarketParticipantVatCode	vATCode_Names.name	tr:vatNumber
	aCERCode_Names.name	tr:acerCode
EicTypeFunctionList	Function_Names/name	tr:function
type		tr:eicType	Generated from EIC 3rd char
	lastRequest_DateAndOrTime.date	tr:dateUpdated

So each file (XML vs CSV) has some extra fields compared to the other:

XML has dateUpdated, which can be quite important in data update scenarios
XML has acerCode, which can be important for external data integration with ACER
XML has description, which most often repeats the Functions, with some informative exceptions, eg
- "Entry/Exit Point From A Storage Between Storengy And Grtgaz"
- "Implementation of common platform for aFRR, as mandated by EB GL."
- "Domestic exit point"
- "Albanain LIO office is applying for EIC codes- identifying Kosovo Production and Generation Unit since they do not have LIO office, yet."
- "Connection With The Distribution System"
CSV has PostalCode, but we suspect that many are nonsensical data, eg
- Azerbaijan: postalCode=1002, countryCode=BE

2.6.2.2 EIC Mapping

For now, we use EIC XML, but later we might decide to replace or complement with EIC CSV. Unfortunately, both of these files are missing some Areas that are returned by the REST API.

The EIC file is mapped to RDF as follows (XML field names are shown in brackets).

All fields are extracted from XML, except eicType (see Add eicType)

2.6.3 Production and Generation Units

We use the Production and Generation Units REST API that returns XML data items having the following structure (shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity). It consists of:

one Configuration_MarketDocument header
- multiple TimeSeries describing Production Units
  - one MktPSRType describing characteristics of the Production Unit
  - multiple nested MktGeneratingUnit describing Generation Units

Configuration_MarketDocument =
 element mRID {ID_String},
 element type {MessageKind_String},
 element process.processType {ProcessKind_String},
 element sender_MarketParticipant.mRID {PartyID_String},
 element sender_MarketParticipant.marketRole.type {MarketRoleKind_String},
 element receiver_MarketParticipant.mRID {PartyID_String},
 element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String},
 element createdDateTime {ESMP_DateTime},
 element TimeSeries {TimeSeries}*

TimeSeries =
 element mRID {ID_String},
 element businessType {BusinessKind_String},
 element implementation_DateAndOrTime.date {xsd:date},
 element biddingZone_Domain.mRID {AreaID_String}?,
 element registeredResource.mRID {ResourceID_String},
 element registeredResource.name {xsd:string},
 element registeredResource.location.name {xsd:string},
 element ControlArea_Domain {ControlArea_Domain}+,
 element Provider_MarketParticipant {Provider_MarketParticipant}+,
 element MktPSRType {MktPSRType}

MktPSRType =
 element psrType {PsrType_String},
 element production_PowerSystemResources.highVoltageLimit {ESMP_Voltage}?,
 element nominalIP_PowerSystemResources.nominalP {ESMP_ActivePower}?,
 element GeneratingUnit_PowerSystemResources {MktGeneratingUnit}*

MktGeneratingUnit =
 element mRID {ResourceID_String},
 element name {xsd:string},
 element nominalP {ESMP_ActivePower},
 element generatingUnit_PSRType.psrType {PsrType_String},
 element generatingUnit_Location.name {xsd:string}

ESMP_ActivePower-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_ActivePower = ESMP_ActivePower-base, attribute unit {UnitSymbol}

ESMP_Voltage-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_Voltage = ESMP_Voltage-base, attribute unit {UnitSymbol}

2.6.3.1 Production and Generation Unit Mapping

We map the Production and Generation Unit data item to RDF as follows:

Notes:

We omit header data since the item is nearly-static ("configuration" or "master") data, and we retain only the latest version
We assign RDF types tr:ProductionUnit and tr:GenerationUnit to the higher and lower level resources, since we need them for Data Corrections later
We omit units of measure for simplicity, since all resources use the same units: MAW for output (nominalP=installedOutput, actualOutput, availableOutput) and KVA for highVoltageLimit

2.6.4 Combined Mapping

The following diagram shows how the semantic data from the previous 3 sections comes together (EIC file, Codelist, Production and Generation Units).

It uses the example of Bulgaria's NPP Kozloduy power plant and related entities (two generators; Bulgaria, the BG TSO "ESO", the "NPP Kozloduy" responsibleParticipant, etc). We use color coding to show which part of the data comes from which data item.

The diagram is adapted from our proposal. In particular, we added eicType (see Add eicType).

2.7 CSV Files

There's no schema for the CSV files, but field names are pretty clear, and we can match them to MADES UML models.

We also do some field value investigations using the csvtk tool (see csvtk#177 for proposed enhancements); equivalent results can be obtained easily with Python Pandas. For example:

# distribution of ResolutionCode
csvtk -t freq -f ResolutionCode -k 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
ResolutionCode  frequency
PT15M   15144
PT30M   9456
PT60M   87606

# analyze correlation of ActualGenerationOutput and ActualConsumption
cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn
# see below
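
The two investigations above can also be reproduced without csvtk and perl. The following is a minimal sketch using only the Python standard library; the embedded sample is hypothetical and reduced to four columns, whereas the real file has more columns (the perl command above picks columns 10 and 11).

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, reduced tab-separated sample; values are made up for illustration.
SAMPLE = (
    "DateTime\tResolutionCode\tActualGenerationOutput\tActualConsumption\n"
    "2022-01-01 00:00:00.000\tPT60M\t12.50\t\n"
    "2022-01-01 01:00:00.000\tPT60M\t0.00\t3.20\n"
    "2022-01-01 01:15:00.000\tPT15M\t7.00\t\n"
    "2022-01-01 02:00:00.000\tPT60M\t\t0.00\n"
)


def shape(value):
    """Mimic the perl substitutions: 0.00 becomes 'zero', any other number 'NUM'."""
    value = re.sub(r"\b0\.00", "zero", value)
    return re.sub(r"[\d.]+", "NUM", value)


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

# Distribution of ResolutionCode (like csvtk freq).
resolutions = Counter(row["ResolutionCode"] for row in rows)

# Which combinations of empty / zero / non-zero occur in the two amount columns.
shapes = Counter(
    (shape(row["ActualGenerationOutput"]), shape(row["ActualConsumption"]))
    for row in rows
)

print(resolutions.most_common())
print(shapes.most_common())
```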

Investigations are based on 2022_01 files, some obtained on 2022-01-05 and others on 2022-01-19 (therefore the month's data is incomplete).

WARNINGS:

Although file names are .csv, the files are tab-separated (TSV)
The files are UTF8 encoded with BOM (Byte Order Mark), which may cause problems in some tools.
- See in particular issue tarql#94
- The following Octal Dump (od) command shows that the first 3 bytes of a CSV file are the BOM, followed by the first column name and a tab.
```
od -c -N 100 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
0000000 357 273 277   D   a   t   e   T   i   m   e  \t
```
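
A minimal Python sketch of the same check, and of reading the header safely, is given here. The function names `has_utf8_bom` and `read_header` are ours (not part of the project scripts); the sketch assumes only the two properties stated above: tab-separated content and a UTF-8 BOM.

```python
import codecs
import csv


def has_utf8_bom(path):
    """Return True if the file starts with the 3-byte UTF-8 BOM (octal 357 273 277)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def read_header(path):
    """Return the column names of a tab-separated file, without the BOM."""
    # The "utf-8-sig" codec drops a leading BOM, so the first column
    # is read as "DateTime" rather than "\ufeffDateTime".
    with open(path, encoding="utf-8-sig", newline="") as f:
        return next(csv.reader(f, delimiter="\t"))
```

Opening the file with plain `utf-8` instead would leave the BOM glued to the first column name, which is the kind of problem reported in tarql#94.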

2.7.1 InstalledGenerationCapacityAggregated_14.1.A

857 samples.

Field	Example	RDF	Comment
		tr:dataItem	`<data/generation/InstalledGenerationCapacityAggregated>`
DateTime	2022-01-01 00:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format (" " -> "T")
ResolutionCode	P1Y	tr:duration	always "P1Y"^^xsd:duration
AreaCode	10YIE-1001A00010	tr:biddingZone,tr:controlArea,tr:country	depending on AreaTypeCode (BZN, CTA, CTY)
AreaTypeCode	CTA		Values BZN, CTA, CTY used to map corresponding relations
AreaName	IE
MapCode	CTA IE
ProductionType	Geothermal	tr:assetType	match to `tr:name` and `tr:nameAlt` of `<type/Asset>` code list
AggregatedInstalledCapacity	17.00	tr:installedOutput
DeletedFlag	0		checked csv for 2021 - always 0
UpdateTime	2021-07-27 20:56:08

Example of values of ProductionType with no match in the code lists:

Hydro Pumped Storage
Hydro Run-of-river and poundage
Hydro Water Reservoir

We have created tr:altNames in the corresponding code lists, see codeliests-extra.ttl

2.7.1.1 InstalledGenerationCapacityAggregated_14.1.A Model

See InstalledGenerationCapacityAggregated.ttl

RDF URL and fixed data (where the space in (DateTime) is replaced with T):

<dataObs/generation/InstalledGenerationCapacityAggregated/(AreaTypeCode)/(AreaCode)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
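
The URL template above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and we assume that the fractional seconds of DateTime are kept unchanged in the URL (the specification only says that the space is replaced with T).

```python
from datetime import datetime


def to_xsd_datetime(value):
    """Turn '2022-01-01 00:00:00.000' into '2022-01-01T00:00:00.000'."""
    # Parse first, so that a malformed value raises ValueError
    # instead of silently ending up in a URL or literal.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
    datetime.strptime(value, fmt)
    return value.replace(" ", "T")


def observation_url(area_type_code, area_code, date_time):
    """Build the relative URL of one InstalledGenerationCapacityAggregated observation."""
    return (
        "dataObs/generation/InstalledGenerationCapacityAggregated/"
        f"{area_type_code}/{area_code}/{to_xsd_datetime(date_time)}"
    )
```

For the sample row of the table, `observation_url("CTA", "10YIE-1001A00010", "2022-01-01 00:00:00.000")` yields a URL ending in `/CTA/10YIE-1001A00010/2022-01-01T00:00:00.000`.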

2.7.2 InstalledGenerationCapacityComputed

This is a "synthetic" data item that holds computed totals.

We compute aggregate tr:ProductionUnit capacities (tr:installedOutput) from generation/ProductionAndGenerationUnits in order for rule installedCapacity-Aggregated-vs-Per-Unit to compare it to generation/InstalledGenerationCapacityAggregated (which reports aggregated volumes per area and asset type).

Totaled over all areas in which the Production Unit is reported (controlArea, biddingZone).
The latest reported capacities are totaled
Marked with time "now" and duration of validity "1 hour"

<dataObs/generation/InstalledGenerationCapacityAggregated/(AreaTypeCode)/(AreaCode)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;

Model: see InstalledGenerationCapacityComputed.ttl

Example	RDF	Comment
	tr:dataItem	`<data/generation/InstalledGenerationCapacityComputed>`
2022-01-01T00:00:00	tr:date	`now()` as datatype `xsd:dateTime`
PT1H	tr:duration	Validity duration as datatype `xsd:duration`
<eic/10YIE-1001A00010>	tr:biddingZone,tr:controlArea	From the individual units
	tr:assetType	tr:assetType of the individual units
100.0	tr:installedOutput	Computed as a sum from the individual units
130.00	tr:installedOutputHigh	+30% of the value in tr:installedOutput

The computation is done by InstalledGenerationCapacityAggregated.ru
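
The actual computation is a SPARQL Update. Purely to illustrate the arithmetic, here is a Python sketch under stated assumptions: the input structure (dicts with keys `areas`, `assetType`, `installedOutput`) is hypothetical, and the input is assumed to already hold only the latest reported capacity of each Production Unit.

```python
from collections import defaultdict


def computed_capacity(units, margin=0.30):
    """Total installedOutput per (area property, area EIC, assetType).

    `units` is an iterable of dicts with hypothetical keys:
      'areas'           list of (property, EIC) pairs, eg ("controlArea", "10YIE-1001A00010")
      'assetType'       asset type of the unit
      'installedOutput' latest reported capacity as a float
    """
    totals = defaultdict(float)
    for unit in units:
        # A unit contributes to every area in which it is reported
        for prop, area in unit["areas"]:
            totals[(prop, area, unit["assetType"])] += unit["installedOutput"]
    return {
        key: {
            "installedOutput": total,
            # Upper bound: +30% of the computed total
            "installedOutputHigh": round(total * (1 + margin), 2),
        }
        for key, total in totals.items()
    }
```

With two units of 60 and 40 in the same area and of the same asset type, this gives installedOutput 100.0 and installedOutputHigh 130.0, as in the example row above.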

2.7.3 ActualGenerationOutputPerGenerationUnit_16.1.A

112207 samples.

Field	Example	RDF	Comment
DateTime	2022-01-01 11:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT60M	tr:duration	Convert to datatype `xsd:duration`. Values `PT15M PT30M PT60M`
AreaCode	10YGR-HTSO-----Y	tr:controlArea	Must match the `controlArea` of the Generation Unit: ActualGenerationOutputPerGenerationUnit-controlArea-conform
AreaTypeCode	CTA		Always "CTA" (control area)
AreaName	GR CTA		Matches `notation` of AreaCode, plus AreaTypeCode
MapCode	GR		Matches `notation` of AreaCode, checked 4. Some variations: this file vs EIC, eg: "DE(TransnetBW)" vs "DE-TRANSNETBW", "DE(TenneT DE)" vs "DE-TENNET_DE"
GenerationUnitEIC	29WGU-YISPAOOU-5	tr:generationUnit
PowerSystemResourceName	P_AOOU		Matches `notationAlt` of GenerationUnitEIC, checked 3.
ProductionType	Hydro Water Reservoir		Matches `assetType` of GenerationUnitEIC, checked 4.
ActualGenerationOutput	0.00	tr:actualOutput	Convert to datatype `xsd:float`. 51% `0.00`, 4.4% missing (*). Must be <= `installedOutput`: ActualGenerationOutputPerGenerationUnit-actualOutput-LTE-installedOutput
ActualConsumption		tr:actualConsumption	Convert to datatype `xsd:float`. 14.8% `0.00`, 80% missing (that's the normal case) (*)
		tr:netOutput	Compute as difference ActualGeneration-ActualConsumption, treat missing as zero, convert to `xsd:float` (*)
InstalledGenCapacity	210.00	tr:installedOutput	Convert to datatype `xsd:float`. Must match the declared `installedOutput` of the Generation Unit: ActualGenerationOutputPerGenerationUnit-installedOutput-conform
UpdateTime	2022-01-02 10:30:54	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

RDF URL and fixed data (where the space in (DateTime) is replaced with T):

<dataObs/generation/ActualGenerationOutputPerGenerationUnit/(GenerationUnitEIC)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;

(*) ActualConsumption is energy consumed by the generator for technological purposes. We analyze the correlation of ActualGenerationOutput and ActualConsumption:

cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn

cnt	ActualGenerationOutput	ActualConsumption
46183	zero
44082	NUM
10138	zero	zero
5296	NUM	zero
3974		NUM
1267	zero	NUM
1227		zero
39	NUM	NUM
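
The same tally can be sketched in Python. The function names are ours, and the classification is slightly stricter than the Perl substitutions (it matches whole cells rather than substrings). Column positions 9 and 10 are the zero-based equivalents of `cut -f10,11`.

```python
import re
from collections import Counter


def value_pattern(value):
    """Classify a cell as '' (missing), 'zero' or 'NUM'."""
    value = value.strip()
    if value == "":
        return ""
    if value == "0.00":
        return "zero"
    if re.fullmatch(r"[\d.]+", value):
        return "NUM"
    return value  # anything else (eg a header name) is kept as-is


def pattern_counts(rows, output_col=9, consumption_col=10):
    """Count (ActualGenerationOutput, ActualConsumption) patterns over parsed rows."""
    return Counter(
        (value_pattern(row[output_col]), value_pattern(row[consumption_col]))
        for row in rows
    )
```

Here `rows` is any iterable of lists of cell strings, for example the output of `csv.reader(f, delimiter="\t")`.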

There is a difference between missing and zero:

Missing value means "no data or inapplicable" whereas zero means "the generator did not produce output" respectively "did not consume anything"
- Missing actualConsumption is legitimate since there are generators that don't consume anything
- We choose not to validate that actualOutput is provided in each row
For the purpose of computing netOutput as the difference, we treat "missing" the same as "zero"
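
A minimal sketch of this rule in Python (the function names are hypothetical and the inputs are the raw CSV cell strings):

```python
def parse_mw(value):
    """Parse a CSV cell into a float, or None when the cell is empty."""
    value = value.strip()
    return float(value) if value else None


def net_output(actual_generation, actual_consumption):
    """netOutput = generation - consumption, counting a missing cell as zero."""
    generation = parse_mw(actual_generation) or 0.0
    consumption = parse_mw(actual_consumption) or 0.0
    return generation - consumption
```

A row with missing ActualGenerationOutput and ActualConsumption of 209.10 thus gets a netOutput of -209.10.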

It is possible to have ActualConsumption without ActualGeneration (thus negative netOutput), eg:

The 18WMUE4B-12345-D "MUELA 4B" IBERDROLA GENERACION S.A.U. plant (Hydro Pumped Storage) was consuming 209.10 MW on 2022-01-01 at 03:00 while pumping water upward into its reservoir
The 62W373474960449Q "SEVTECCHPP-V" Severodonetsk Combined Heat and Power Plant (Fossil Gas) was consuming 2.54 MW on 2022-01-03 at 17:00 while outputting no electricity

2.7.3.1 ActualGenerationOutputPerGenerationUnit_16.1.A Model

The semantic mapping of this CSV is shown below.

Note: the ActualGenerationOutputPerGenerationUnit conversion should produce only the large node. The figure shows RDF type & EIC code in other nodes just to see the colored circles, but these should not be generated by this conversion.

2.7.4 AggregatedGenerationPerType_16.1.B_C

Field	Sample	RDF	Comment
DateTime	2022-01-01 09:15:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT15M	tr:duration	Convert to datatype `xsd:duration`. Values `PT15M PT30M PT60M`
AreaCode	10YNL----------L	tr:biddingZone tr:controlArea tr:country
AreaTypeCode	CTA		Use this field to determine property for AreaCode
AreaName	NL CTA
MapCode	NL
ProductionType	Solar	tr:assetType	match to `tr:name` and `tr:nameAlt` of `<type/Asset>` code list
ActualGenerationOutput	10.94	tr:actualOutput	Convert to datatype `xsd:float`.
ActualConsumption	0.00	tr:actualConsumption	Convert to datatype `xsd:float`.
UpdateTime	2022-01-29 11:18:30
Net Output		tr:netOutput	Difference between output and consumption. Performed at conversion

2.7.4.1 AggregatedGenerationPerType Model

<dataObs/generation/AggregatedGenerationPerType/(AreaTypeCode)/(AreaCode)/(ProductionType)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/AggregatedGenerationPerType>;

2.7.5 CurrentGenerationForecastForWindAndSolar_14.1.D

Month 2022_02, 150949 records

Field	Example	RDF	Comment
DateTime	2022-02-05 06:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT60M	tr:duration
AreaCode	10YLT-1001A0008Q	tr:biddingZone tr:controlArea tr:country
AreaTypeCode	BZN		Use this field to determine property for AreaCode
AreaName	LT BZN
MapCode	LT
ProductionType	Wind Onshore	tr:assetType	`<type/Asset/>` Match label
AggregatedGenerationForecast	351.99	tr:forecastedOutput
UpdateTime	2022-02-05 09:20:49	tr:dateUpdated

ProductionType	Frequency
Wind Offshore	10752
Wind Onshore	72018
Solar	68178

AreaTypeCode	Frequency
CTY	35468
BZN	53464
CTA	62016

2.7.5.1 CurrentGenerationForecastForWindAndSolar_14.1.D Model

<dataObs/generation/CurrentGenerationForecastForWindAndSolar/(AreaTypeCode)/(AreaCode)/match(ProductionType)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/CurrentGenerationForecastForWindAndSolar>;

2.7.6 AcceptedAggregatedOffers_17.1.D

Month 2022_01, 109263 records.

Field	Example	RDF	Comment
DateTime	2022-01-02 23:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT15M	tr:duration	Convert to datatype `xsd:duration`. Values `PT15M PT30M PT60M`
AreaCode	10YCH-SWISSGRIDZ	tr:marketBalanceArea	In namespace `<eic/>`
AreaTypeCode	MBA		Always "MBA"
AreaName	CH MBA
MapCode	CH
ReserveType	Frequency Containment Reserve (FCR)	tr:reserveType	`<type/Business/>`: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
DeletedFlag	0		Always 0
UpdateTime	2022-01-02 09:45:51	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

This and the other Balancing items (next 3 items) include a number of related (denormalized) Volume/Price fields that we normalize using the following extra fields (dimensions) and their respective code values (in parentheses is the word as it appears in the field name).

tr:direction: <type/Direction/>: A01 "UP", A02 "DOWN", A03 "UP and DOWN" (Symmetric)
tr:volumeCategory: <type/Business/>: A31 "Offered Capacity" (Offered), B95 "Procured capacity" (Accepted), A45 "Schedule activated reserves" (Activated)
tr:assetType: <type/Asset/>: A04 "Generation", A05 "Load", B20 "Other unspecified" (NotSpecified)

Each of the numeric fields is emitted as tr:volume with datatype xsd:float and the following dimension values:

Field	tr:direction	tr:volumeCategory	tr:assetType
LoadUpAcceptedVolume	A01 "UP"	B95 "Accepted"	A05 "Load"
LoadDownAcceptedVolume	A02 "DOWN"	B95 "Accepted"	A05 "Load"
LoadUpOfferedVolume	A01 "UP"	A31 "Offered"	A05 "Load"
LoadDownOfferedVolume	A02 "DOWN"	A31 "Offered"	A05 "Load"
LoadAcceptedVolumeSymmetric	A03 "UP and DOWN"	B95 "Accepted"	A05 "Load"
LoadOfferedVolumeSymmetric	A03 "UP and DOWN"	A31 "Offered"	A05 "Load"
GenerationUpAcceptedVolume	A01 "UP"	B95 "Accepted"	A04 "Generation"
GenerationDownAcceptedVolume	A02 "DOWN"	B95 "Accepted"	A04 "Generation"
GenerationUpOfferedVolume	A01 "UP"	A31 "Offered"	A04 "Generation"
GenerationDownOfferedVolume	A02 "DOWN"	A31 "Offered"	A04 "Generation"
GenerationAcceptedVolumeSymmetric	A03 "UP and DOWN"	B95 "Accepted"	A04 "Generation"
GenerationOfferedVolumeSymmetric	A03 "UP and DOWN"	A31 "Offered"	A04 "Generation"
NotSpecifiedUpAcceptedVolume	A01 "UP"	B95 "Accepted"	B20 "Other unspecified"
NotSpecifiedDownAcceptedVolume	A02 "DOWN"	B95 "Accepted"	B20 "Other unspecified"
NotSpecifiedUpOfferedVolume	A01 "UP"	A31 "Offered"	B20 "Other unspecified"
NotSpecifiedDownOfferedVolume	A02 "DOWN"	A31 "Offered"	B20 "Other unspecified"
NotSpecifiedAcceptedVolumeSymmetric	A03 "UP and DOWN"	B95 "Accepted"	B20 "Other unspecified"
NotSpecifiedOfferedVolumeSymmetric	A03 "UP and DOWN"	A31 "Offered"	B20 "Other unspecified"
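
Since the field names in the table above follow a regular pattern, the dimension values can be derived from the name instead of being listed one by one. The following Python sketch shows one way to do it; the regular expression and the function name are our own illustration, and it covers only the Offered/Accepted fields listed in the table.

```python
import re

# Code values as listed above
DIRECTION = {"Up": "A01", "Down": "A02", "Symmetric": "A03"}
VOLUME_CATEGORY = {"Offered": "A31", "Accepted": "B95"}
ASSET_TYPE = {"Generation": "A04", "Load": "A05", "NotSpecified": "B20"}

FIELD_RE = re.compile(
    r"^(?P<asset>Load|Generation|NotSpecified)"
    r"(?P<direction>Up|Down)?"
    r"(?P<category>Offered|Accepted)"
    r"Volume(?P<symmetric>Symmetric)?$"
)


def volume_dimensions(field_name):
    """Return the dimension codes of a Volume field, or None for other fields."""
    m = FIELD_RE.match(field_name)
    if m is None:
        return None
    # Exactly one of Up/Down (in the middle) or Symmetric (at the end) must be present
    if bool(m.group("direction")) == bool(m.group("symmetric")):
        return None
    direction = m.group("direction") or m.group("symmetric")
    return {
        "direction": DIRECTION[direction],
        "volumeCategory": VOLUME_CATEGORY[m.group("category")],
        "assetType": ASSET_TYPE[m.group("asset")],
    }
```

For example, `volume_dimensions("LoadUpAcceptedVolume")` gives A01, B95, A05, while a common field such as `UpdateTime` gives `None`.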

2.7.6.1 AcceptedAggregatedOffers_17.1.D Model

The semantic mapping of this CSV is shown below.

Since this and the next two items talk about the same thing (balancing Volumes), add a new "synthetic" Data Item <data/balancing/AggregatedVolumes>.
- Thus we unify the data for all 3 items in a unified namespace.
Add tr:unit "MW" to this data item

RDF URL and fixed data:

<dataObs/balancing/AggregatedVolumes/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(volumeCategory)/(assetType)>
  a tr:DataObservation;
  tr:dataItem <data/balancing/AggregatedVolumes>;
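
The URL pattern above, together with the notes that follow it (skip empty volumes, use ANY for a missing dimension, replace the space in DateTime), can be sketched in Python as shown here. The function names and the `field_dimensions` structure are hypothetical; the actual conversion is done with tarql.

```python
def observation_url(area_type_code, area_code, date_time, reserve_type,
                    direction=None, volume_category=None, asset_type=None):
    """Build the AggregatedVolumes URL; a missing dimension becomes 'ANY'."""
    parts = [
        area_type_code, area_code, date_time.replace(" ", "T"), reserve_type,
        direction or "ANY", volume_category or "ANY", asset_type or "ANY",
    ]
    return "dataObs/balancing/AggregatedVolumes/" + "/".join(parts)


def volume_observations(row, reserve_type_code, field_dimensions):
    """Yield (url, volume) for each non-empty Volume field of a CSV row.

    `field_dimensions` maps a field name to a tuple of codes
    (direction, volumeCategory, assetType).
    """
    for field, (direction, category, asset) in field_dimensions.items():
        raw = (row.get(field) or "").strip()
        if not raw:
            continue  # empty volume: form no URL and emit no triples
        url = observation_url(
            row["AreaTypeCode"], row["AreaCode"], row["DateTime"],
            reserve_type_code, direction, category, asset,
        )
        yield url, float(raw)
```

Here `reserve_type_code` is the already matched code list value (eg A95 for FCR), not the label found in the CSV.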

IMPORTANT: If a volume field is empty, emit no triples about it (no URL should be formed for its DataObservation)
We use the dimension values in the URL (ANY for the missing/sum/total)
The space in (DateTime) is replaced with T
"match()" indicates that the field value (string) should be matched to the respective code list value. See etl_scripts/tarql/match.h.rq for such matching implemented with a VALUES clause.

See data/model/AcceptedAggregatedOffers.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.7 ActivatedBalancingEnergy_17.1.E

Month 2022_01, 106828 samples. This table has the same common fields, which are mapped in exactly the same way as the previous section (AcceptedAggregatedOffers_17.1.D):

Field	Example	RDF	Comment
DateTime	2022-01-01 00:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT60M	tr:duration	Convert to datatype `xsd:duration`. Values `PT15M PT30M PT60M`
AreaCode	10YCS-CG-TSO---S	tr:marketBalanceArea	In namespace `<eic/>`
AreaTypeCode	MBA		Always "MBA"
AreaName	ME MBA
MapCode	ME
ReserveType	Automatic Frequency Restoration Reserve (aFRR)	tr:reserveType	`<type/Business/>`: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
UpdateTime	2021-12-30 14:31:00	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

Instead of Offered/Accepted, it has Activated amounts. They are mapped in exactly the same way:

2.7.8 YearAheadTotalLoadForecast_6.1.E

The RDF mapping is exactly the same as in the previous section. We use the same kind of URLs, and the same data item.

2.7.8.1 ActivatedBalancingEnergy_17.1.E Model

See data/model/ActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.9 AggregatedBalancingEnergyBids_12.3.E

Month 2022_01, 294943 samples.

This is very similar to the previous two sections, except:

It is for area type SCA rather than MBA
Has an extra volumeCategory "Unavailable"
Direction is a separate field, rather than being encoded in the Volume field names
There's marketProduct but no assetType

Field	Example	RDF	Comment
DateTime	2022-01-02 12:45:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT15M	tr:duration	Convert to datatype `xsd:duration`. Values `PT15M PT30M PT60M`
AreaCode	10Y1001A1001A71M	tr:schedulingArea	In namespace `<eic/>`
AreaTypeCode	SCA		always "SCA"
AreaName	IT-Centre-South SCA
MapCode	IT-CSOUTH
ReserveType	Replacement reserve (RR)	tr:reserveType	`<type/Business/>`: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR (*)
TypeOfProduct	Standard	tr:marketProduct	`<type/MarketProduct/>`: match A01 Standard, A02 Specific, A04 Local
Direction	Up	tr:direction	`<type/Direction/>`: A01 "UP", A02 "DOWN"
UpdateTime	2022-01-02 12:31:10	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

(*) WARNING: the values in this data item are spelled in Lowercase (all other tables are in Capital Case):

csvtk -t freq -f ReserveType 2022_01_AggregatedBalancingEnergyBids_12.3.E.csv
Replacement reserve (RR)                       126822
Manual frequency restoration reserve (mFRR)     75114
Automatic frequency restoration reserve (aFRR)  93006

So we use the macro match_reserveType_lcase() for this item, and match_reserveType() for all others.
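
For comparison, outside of tarql the two spellings could be handled by a single case-insensitive lookup. The Python sketch below is an alternative technique, not the project's macros; the function name is hypothetical and the labels are the ones that occur in the CSV samples of this section.

```python
# Keys are case-folded labels; values are codes in <type/Business/>
RESERVE_TYPE = {
    "frequency containment reserve (fcr)": "A95",
    "automatic frequency restoration reserve (afrr)": "A96",
    "manual frequency restoration reserve (mfrr)": "A97",
    "replacement reserve (rr)": "A98",
}


def match_reserve_type(label):
    """Return the reserve type code for a CSV label, ignoring case; None if unknown."""
    return RESERVE_TYPE.get(label.strip().casefold())
```

Both "Replacement reserve (RR)" and "Replacement Reserve (RR)" then map to A98.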

Map the following fields to tr:volume with datatype xsd:float, and the following dimension values:

Field	tr:volumeCategory
OfferedBidVolume	A31 (Offered)
ActivatedBidVolume	A45 (Activated)
UnavailableBidVolume	Z99 (Unavailable)

2.7.9.1 AggregatedBalancingEnergyBids_12.3.E Model

We use the same RDF model as before. Again, we use the same URLs and data item.

See data/model/AggregatedBalancingEnergyBids.ttl.

The diagram is not very elucidating since all these records are correlated by their values, not by links.
IMPORTANT: if a Volume field is missing, do not emit any triples about it

2.7.10 PricesOfActivatedBalancingEnergy_17.1.F

Month 2022_01, 158455 samples.

Field	Example	RDF	Comment
DateTime	2022-01-14 02:00:00.000	tr:date	Convert to datatype `xsd:dateTime` and valid format
ResolutionCode	PT30M	tr:duration	Convert to datatype `xsd:duration`
AreaCode	10YFR-RTE------C	tr:schedulingArea or tr:marketBalanceArea	Depending on AreaTypeCode
AreaTypeCode	SCA		SCA or MBA. Use to select the specific relation
AreaName	FR SCA
MapCode	FR
RegisterItemTypeName	Automatic Frequency Restoration Reserve (aFRR)	tr:reserveType	`<type/Business/>`: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
TypeOfProduct	A01	tr:marketProduct	`<type/MarketProduct/>`: straight A01 Standard, A02 Specific, A04 Local
PriceType	AVERAGE	tr:priceCategory	`<type/PriceCategory/>`: match A06 "Average bid price" (AVERAGE), A07 "Single marginal bid price" (MARGINAL)
Currency	EUR	tr:currency	Values: EUR (10x more popular than all the rest), BAM, CZK, HUF, PLN, RON, UAH
UpdateTime	2022-01-14 03:46:00	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

Emit all these fields as tr:price with datatype xsd:float and the following dimension values:

Field	tr:direction	tr:assetType
LoadUpPrice	A01 "UP"	A05 "Load"
LoadDownPrice	A02 "DOWN"	A05 "Load"
GenerationUpPrice	A01 "UP"	A04 "Generation"
GenerationDownPrice	A02 "DOWN"	A04 "Generation"
NotSpecifiedUpPrice	A01 "UP"	B20 "Other unspecified"
NotSpecifiedDownPrice	A02 "DOWN"	B20 "Other unspecified"

We determine the minimal set of independent fields with experiments like this:

# UNIQUE:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d

# Remove AreaTypeCode: DUPS:
csvtk cut -t -f DateTime,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-22 18:30:00.000.*10YFR-RTE------C.*Replacement Reserve (RR)" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-22 18:30:00.000 PT15M  10YFR-RTE------C  SCA  FR    SCA  FR     Replacement Reserve (RR)                        247.00  247.00  A01  AVERAGE  EUR  2022-01-22 18:31:13
2022-01-22 18:30:00.000 PT30M  10YFR-RTE------C  MBA  FR    MBA  FR     Replacement Reserve (RR)                        245.27  245.27       AVERAGE  EUR  2022-01-22 20:31:11
# The same area "FR" is reported as SCA and as MBA

# Remove RegisterItemTypeName: DUPS:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-05 00:15:00.000.*10Y1001A1001A82H.*MBA" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-05 00:15:00.000 PT15M  10Y1001A1001A82H  MBA  DE-LU MBA  DE_LU  Manual Frequency Restoration Reserve (mFRR)     0.00    0.00         AVERAGE  EUR  2022-01-04 00:30:55
2022-01-05 00:15:00.000 PT15M  10Y1001A1001A82H  MBA  DE-LU MBA  DE_LU  Automatic Frequency Restoration Reserve (aFRR)  224.91  47.71        AVERAGE  EUR  2022-01-05 02:00:56

The minimal set is AreaTypeCode,AreaCode,DateTime,RegisterItemTypeName to which we must add the dimensions direction,assetType
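
The `sort|uniq -d` experiments above amount to checking whether a candidate set of fields is a unique key. A short Python sketch of the same check (the function name is ours; `rows` are dicts as produced by `csv.DictReader`):

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

A candidate key is sufficient when the returned list is empty, and minimal when removing any one of its fields makes the list non-empty.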

We add a computed field tr:priceInEUR, based on the current conversion rate of Currency to EUR

2.7.10.1 PricesOfActivatedBalancingEnergy_17.1.F Model

RDF URL and fixed data:

<dataObs/balancing/PricesOfActivatedBalancingEnergy/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(assetType)>
  a tr:DataObservation;
  tr:dataItem <data/balancing/PricesOfActivatedBalancingEnergy>;

See data/model/PricesOfActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.11 UnavailabilityOfGenerationUnits_15.1.A_B

4366 samples.

Each unavailability is identified by MRID.
There can be multiple versions (tr:version) of each unavailability. We've shown several examples to illustrate these versions.
- In this particular case, only the Status is changed
- We retain only the latest version

Field	Example1	Example2	RDF	Comment
StartTS	2022-01-28 19:00:00.000	2022-01-28 19:00:00.000		Ignored (*)
EndTS	2022-01-31 07:00:00.000	2022-01-31 07:00:00.000		Ignored (*)
TimeZone	WET	WET	tr:timeZone	String: "WET, CET, EET"
MRID	zzGVOR7oEd5SOJnhsAiapw	zzGVOR7oEd5SOJnhsAiapw	tr:ident	Also use in URL. Separate field to allow matching subsidiary table
Type	Planned	Planned	tr:typeText	String: "Planned, Forced"
Status	Active	Cancelled	tr:statusText	String: "Active, Withdrawn, Canceled"
AreaCode	10YGB----------A	10YGB----------A	tr:controlArea or tr:biddingZone	Depending on AreaTypeCode. Must match declared zone/area of the energy resource: Outage-GenerationUnit-area-conform
AreaTypeCode	CTA	CTA		"CTA, BZN" (**). Reflected in the selection of the previous link
AreaName	UK(National Grid) CTA	UK(National Grid) CTA		Matches `name` of AreaCode
MapCode	GB	GB		Matches `notation` of AreaCode
PowerResourceEIC	48W000000DIDCB5C	48W000000DIDCB5C	tr:energyResource	Must exist in Production and Generation Units: Outage-ProductionUnit-exists
UnitName	DIDCB5	DIDCB5		Matches `notation` of PowerResourceEIC
ProductionType	Fossil Gas	Fossil Gas		Matches `assetType` of PowerResourceEIC
InstalledCapacity	780.00	780.00	tr:installedOutput	Convert to datatype `xsd:float`. Must match the declared `installedCapacity` of the resource: Outage-GenerationUnit-installedCapacity-conform
AvailableCapacity	370.00	370.00	tr:availableOutput	Convert to datatype `xsd:float`. Must be less than `installedCapacity`: Outage-GenerationUnit-LT-installedCapacity
Version	1	2	tr:version	Retain only the latest version. See next section
Reason	Foreseen Maintenance	Foreseen Maintenance		Ignored (*)
UpdateTime	2018-10-02 14:29:59	2018-10-02 17:26:11

(*) Use value from the Reasons subsidiary table, see UnavailabilityOfGenerationUnitsReasons_15.1.A_B
(**) See Unavailability Redundant Records

2.7.11.1 UnavailabilityOfGenerationUnitsReasons_15.1.A_B

4996 samples.

Field	Example1	Example2	RDF	Comment
StartTS	2022-01-28 19:00:00.000	2022-01-28 19:00:00.000	tr:dateStart	Convert to datatype `xsd:dateTime` and valid format
EndTS	2022-01-31 07:00:00.000	2022-01-31 07:00:00.000	tr:dateEnd	Convert to datatype `xsd:dateTime` and valid format
MRID	zzGVOR7oEd5SOJnhsAiapw	zzGVOR7oEd5SOJnhsAiapw	tr:mrid	Use in URL.
version	2	2	tr:version	Convert to datatype `xsd:integer`. Separate field to allow picking latest version: retain only the latest version
ReasonCode	A95	B19	tr:reason	URL in `<type/ReasonCode/>`
Reason	Complementary Information	Foreseen Maintenance	tr:reasonText	Matches the `name` of codelist value "ReasonCode". Skip "Complementary Information"
ReasonText	Outage		tr:reasonText	Could include long, even bilingual text, not very well formatted
UpdateTime	2018-10-02 17:26:11	2018-10-02 17:26:11	tr:dateUpdated	Convert to datatype `xsd:dateTime` and valid format

2.7.11.2 Unavailability Model

We use the same "synthetic" data item UnavailabilityOfProductionOrGenerationUnits for both this, and UnavailabilityOfProductionUnits (see next).

This is possible since the link tr:energyResource is the same in both cases, and that resource should know whether it's a Production or Generation Unit (which is a non-trivial question, given the confusion between the two)
It is useful since the app only needs to consult one item when displaying Outages on a map of Production and Generation Units
It also obviates the need to duplicate Validation Rules for the two data items

RDF URL and fixed data:

<outage/UnavailabilityOfProductionOrGenerationUnits/(MRID)/(Version)>
  a tr:Outage;
  tr:dataItem <data/outages/UnavailabilityOfProductionOrGenerationUnits>

The RDF model is shown below, but please read subsequent sections regarding intricacies of the conversion process.

data/model/Unavailability.ttl:

2.7.11.3 Unavailability Redundant Records

Each unavailability is reported twice: for the controlArea ("CTA") and the biddingZone ("BZN") of the generator. An example with a generator in Bulgaria's Maritsa Iztok 2 TPP:

Field	Example1	Example2
StartTS	2022-01-03 16:57:00.000	2022-01-03 16:57:00.000
EndTS	2022-01-03 18:30:00.000	2022-01-03 18:30:00.000
TimeZone	CET	CET
MRID	7jf8VaSweKQI27w73v8p8w	dcadb3Ls6XlBSYhhQxvItQ
Status	Active	Active
Type	Forced	Forced
AreaCode	10YCA-BULGARIA-R	10YCA-BULGARIA-R
AreaTypeCode	CTA	BZN
AreaName	BG CTA	BG BZN
MapCode	BG	BG
PowerResourceEIC	32W001100100045G	32W001100100045G
UnitName	TPP_MI2_G5	TPP_MI2_G5
ProductionType	Fossil Brown coal/Lignite	Fossil Brown coal/Lignite
InstalledCapacity	230.00	230.00
AvailableCapacity	0.00	0.00
Version	1	1
Reason	Failure	Failure
UpdateTime	2022-01-04 09:15:58	2022-01-04 09:15:58

As you can see, the two unavailabilities are precisely the same except for MRID, AreaCode, AreaTypeCode (and MapCode, AreaName derived from them). So each unavailability is reported twice:

With different MRID but same Version, UpdateTime
Even when the two areas are co-extensive, eg the above is reported against two different roles of 10YCA-BULGARIA-R Bulgaria: as "CTA" and as "BZN"

Optionally, merge the records (so we'll have one record with two outgoing links: both controlArea and biddingZone):

Identify the two records by equality of the data fields: StartTS, EndTS, TimeZone, Status, Type, PowerResourceEIC, InstalledCapacity, AvailableCapacity, Version, Reason, UpdateTime
Discard one of the MRIDs (eg the one with AreaTypeCode="CTA") and all its data except AreaCode, AreaTypeCode
Record its controlArea or biddingZone link (computed from AreaCode, AreaTypeCode) against the URL of the other record

This is non-trivial but will help with displaying Outage data.
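
The merge steps can be sketched in Python as follows. This is an illustration under assumptions: records are dicts keyed by CSV field name, the function name is hypothetical, and we follow the example above in discarding the MRID of the "CTA" record.

```python
# Fields whose equality identifies two records as the same unavailability
DATA_FIELDS = (
    "StartTS", "EndTS", "TimeZone", "Status", "Type", "PowerResourceEIC",
    "InstalledCapacity", "AvailableCapacity", "Version", "Reason", "UpdateTime",
)
AREA_PROPERTY = {"CTA": "controlArea", "BZN": "biddingZone"}


def merge_redundant(records):
    """Merge CTA/BZN duplicates into one record with both area links."""
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in DATA_FIELDS)
        entry = merged.setdefault(key, {f: rec[f] for f in DATA_FIELDS})
        # Record the area link computed from AreaCode and AreaTypeCode
        entry[AREA_PROPERTY[rec["AreaTypeCode"]]] = rec["AreaCode"]
        # Keep the MRID of the BZN record when both are present
        if rec["AreaTypeCode"] == "BZN" or "MRID" not in entry:
            entry["MRID"] = rec["MRID"]
    return list(merged.values())
```

Applied to the two example records above, this returns a single record with MRID dcadb3Ls6XlBSYhhQxvItQ and both controlArea and biddingZone set to 10YCA-BULGARIA-R.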

2.7.11.4 UnavailabilityReasons Subsidiary Table

This table should be "joined" to the main table by "MRID" (which can be accomplished by using consistent URLs when RDFizing). Examining data for 2022_01 (taken on 2022-01-05):

There are "MRID" in the subsidiary table that don't match "MRID" in the main table.
- Such subsidiary records are useless since they don't mention the "PowerResourceEIC"
- It is possible the problem is due to data being split across months, so we should get outage data for several years to increase the time window
- Examples:
  - Outage 0pXGWG97HoHWd2NzlbSmmw (2 versions) is missing in the main table
  - Outage 5TmlidNqpxU_LYlWfJ5bMg (9 versions) is missing in the main table
Some outages have more versions in the subsidiary table than the main table:
- The main table never has more versions than the subsidiary table
- We checked a few records, and the times reported in the initial matching versions, also match.
- Example: outage 1F67oMiU54aDdqPoUMdJGg has only 1 version in the main table, but 4 in the subsidiary table.
- As you can see, subsidiary records iteratively refine the "StartTS", "EndTS" fields.
- In this case the "Reason" fields remain the same, but in other cases they could change

Field	main	subsidiary1	subsidiary2	subsidiary3	subsidiary4
StartTS	2022-01-05 00:00:00.000	2022-01-05 00:00:00.000	2022-01-05 07:00:00.000	2022-01-05 07:00:00.000	2022-01-05 06:00:00.000
EndTS	2022-01-06 00:00:00.000	2022-01-06 00:00:00.000	2022-01-05 09:00:00.000	2022-01-05 09:00:00.000	2022-01-05 07:00:00.000
TimeZone	CET
MRID	1F67oMiU54aDdqPoUMdJGg	1F67oMiU54aDdqPoUMdJGg	1F67oMiU54aDdqPoUMdJGg	1F67oMiU54aDdqPoUMdJGg	1F67oMiU54aDdqPoUMdJGg
Type	Forced
Status	Active
AreaCode	10YCZ-CEPS-----N
AreaTypeCode	CTA
AreaName	CZ CTA
MapCode	CZ
PowerResourceEIC	27W-GU-EPVR-B1-L
UnitName	EPVR.B1
ProductionType	Fossil Gas
InstalledCapacity	200.00
AvailableCapacity	0.00
Version	1	1	2	3	4
ReasonCode		B18	B18	B18	B18
Reason	Failure	Failure	Failure	Failure	Failure
ReasonText
UpdateTime	2022-01-05 07:00:48	2022-01-05 07:00:48	2022-01-05 08:00:57	2022-01-05 08:00:59	2022-01-05 08:00:59
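
Both kinds of discrepancy described above (subsidiary MRIDs missing from the main table, and versions present only in the subsidiary table) can be detected with a small comparison. The Python sketch below uses hypothetical function names and takes rows as dicts keyed by CSV field name; note the different spelling of the version field in the two tables.

```python
from collections import defaultdict


def versions_by_mrid(rows, version_field):
    """Map each MRID to the set of version numbers found for it."""
    versions = defaultdict(set)
    for row in rows:
        versions[row["MRID"]].add(int(row[version_field]))
    return versions


def compare_tables(main_rows, reason_rows):
    """Return (orphan MRIDs, extra versions) of the subsidiary table."""
    main = versions_by_mrid(main_rows, "Version")    # capitalized in main
    sub = versions_by_mrid(reason_rows, "version")   # lowercase in subsidiary
    orphans = sorted(set(sub) - set(main))
    extra = {
        mrid: sorted(sub[mrid] - main[mrid])
        for mrid in main
        if mrid in sub and sub[mrid] - main[mrid]
    }
    return orphans, extra
```

For the example above, outage 1F67oMiU54aDdqPoUMdJGg would be reported with extra versions 2, 3 and 4.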

The subsidiary table carries more detailed "Reason" info than the main table:
- May carry several reasons, as shown in UnavailabilityOfGenerationUnitsReasons_15.1.A_B
- Has ReasonCode that's a value within codelist <type/ReasonCode/>
- Has ReasonText that can be a long free text

2.7.11.5 Retaining the Latest Unavailability Version

For each MRID of the main (UnavailabilityOfGenerationUnits_15.1.A_B) and subsidiary (UnavailabilityOfGenerationUnitsReasons_15.1.A_B) tables, we want to retain only the latest Version.

Both MRID and Version are used in the URL and also represented as separate fields.
Version (and UpdateTime) is correlated between the tables

That's non-trivial since:

Normal RDFization would accumulate all attributes against that URL but we need to remove values from the older version
Data fields are split between the two tables
The records in both tables need to be sorted (sorting by UpdateTime or by Version produces the same result)
The same field is spelled differently between the two tables: Version in main, version in subsidiary
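
One way to approach this before RDFization is to filter each table down to its latest version per MRID. The Python sketch below is an assumption-laden illustration, not the project's implementation: the function name is hypothetical, rows are dicts keyed by CSV field name, and all rows of the latest version are kept because the subsidiary table may carry several reasons per version. It compares version numbers directly, so no prior sorting is needed.

```python
def latest_versions(rows, version_field="Version"):
    """Map each MRID to the list of its rows that have the highest version."""
    latest = {}
    for row in rows:
        version = int(row[version_field])
        best = latest.get(row["MRID"])
        if best is None or version > best[0]:
            # A newer version replaces everything collected so far
            latest[row["MRID"]] = (version, [row])
        elif version == best[0]:
            best[1].append(row)
    return {mrid: kept for mrid, (_, kept) in latest.items()}
```

Call it with `version_field="Version"` for the main table and `version_field="version"` for the subsidiary table.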

2.7.12 UnavailabilityOfProductionUnits_15.1.C_D

This data item is mapped in exactly the same way as UnavailabilityOfGenerationUnits_15.1.A_B, and using the same synthetic data item URLs. The same special processing applies.

Field	Example	RDF	Comment
StartTS	2022-01-01 00:00:00.000		Use value from the Reasons subsidiary table
EndTS	2023-01-01 00:00:00.000		Use value from the Reasons subsidiary table
TimeZone	CET	tr:timeZone
MRID	ROgezRGFNz5CJzUSUkx2-Q	tr:ident	Also use in URL
Status	Active	tr:statusText
Type	Planned	tr:typeText
AreaCode	10YHU-MAVIR----U	tr:controlArea or tr:biddingZone	Depending on AreaTypeCode
AreaTypeCode	BZN		Reflected in the previous link
AreaName	HU BZN		Matches `name` of AreaCode
MapCode	HU		Matches `notation` of AreaCode
PowerResourceEIC	15WVERTES----PPX	tr:energyResource
UnitName	Oroszlányi Eromu		Matches `notation` or `notationAlt` of PowerResourceEIC
ProductionType	Fossil Brown coal/Lignite		Matches `assetType` of PowerResourceEIC
Version	1	tr:version	Retain only the latest version
VoltageConnectionLevel	120.00		Matches `highVoltageLimit` of PowerResourceEIC
InstalledCapacity	220.00	tr:installedOutput
AvailableCapacity	0.00	tr:availableOutput
Reason	Shutdown		Use value from the Reasons subsidiary table
UpdateTime	2021-12-14 10:01:32		Use value from the Reasons subsidiary table

UnavailabilityOfProductionUnitsReasons_15.1.C_D fields:

Field	Example	RDF	Comment
StartTS	2022-01-01 00:00:00.000	tr:dateStart	Convert to `xsd:dateTime` and correct format
EndTS	2023-01-01 00:00:00.000	tr:dateEnd	Convert to `xsd:dateTime` and correct format
MRID	BGaTG2bh6VYl7K4w2RyHmw	tr:mrid	Use in URL
version	1	tr:version
ReasonCode	B20	tr:reason	URL in `<type/ReasonCode/>`
Reason	Shutdown	tr:reasonText	Skip "Complementary Information"
ReasonText		tr:reasonText
UpdateTime	2021-12-14 10:01:36	tr:dateUpdated

3 Data Validation

Validating Transparency data is the most important objective of the project. We'll elaborate up to 40 data validation and quality criteria over various data items.

Based on them we will provide:

A DQA Dashboard (Data Quality Assessment) to display the count of data issues per rule and area, drill down to individual issues, (optionally) show trends over time
Data quality recommendations that may be used to recommend regulatory changes.

Improving data quality will have positive long-term effects on the energy market. Furthermore, by having more accurate master data, it will provide a foundation for a better Energy KG in the future.

3.1 Describing Validation Rules

We describe validation rules in a strict way, so that they can be extracted from this document and serve as the basis for implementation. Rules are expressed in a semantic way using the SHACL ontology (W3C standard), which allows us to use a number of existing validators. Each rule is represented as sh:NodeShape and has the following fields:

Rule URL: from the heading name in this document
Named graph: each rule is emitted in its own graph to be passed to the SHACL engine one by one (using sh:shapesGraph)
Name (sh:name): derived from the rule URL by discarding dashes (eg "parentResource semiInverse generationUnit")
Order (sh:order): order of rule execution
- IMPORTANT: in the UI, sort groups and rules alphabetically not by sh:order
Applies to (tr:appliesTo): kind of area the rule applies to, used for grouping (see next section)
Rule Group (sh:group): used as second level of grouping (for categorization and better UI)
Description (sh:description): detailed description in the form of a "should" statement
Message (sh:message): a template with SPARQL variables, in case additional details should be provided in the validation results
Data Items (tr:dataItem): data item(s) being validated (converted to several URLs as per kb.ttl)
Fields (tr:fields): CSV or XML field(s) being validated (using XPath notation for XML) (a single string)
Severity (sh:severity):
- Violation: hard constraints, eg PowerUnit and its GenerationUnits should be in the same country
- Warning: soft constraints, eg actuals should not deviate from forecasts more than 15%
Implementation as SHACL triples, possibly including "owned" sh:PropertyShape nodes and blank nodes
Correction (tr:sparqlUpdate): the next subsubsection after the rule: which Data Correction to apply

Taking the rule parentResource-semiInverse-generationUnit as an example, here is an RDF model for representing rules. It also shows the implementation (sh:property triples and blank nodes).

See data/model/ValidationRule.ttl, though this is emitted as .trig in graph <graph/shape/parentResource-semiInverse-generationUnit>

3.2 Rule Applicability

ENTSOE data is "indexed" by Area and/or Country Code (see sections Areas and Countries for details about these entities).

We'd like each validation result to point to the Area or Country related to it, in order to have a better summary of errors per Area/Country. Examples:

Production Units are reported in duplicate, against both controlArea and biddingZone
Each unavailability (outage) is reported in duplicate, against both controlArea and biddingZone
The EIC File specifies only countryCode for each resource. In particular, when validating Trader VAT numbers, we can only link to country code.

In order to deal with the variety of areas/countries and with missing values, validation results will have a field tr:displayArea (always populated).

Each rule specifies tr:appliesTo, which is one or more of tr:biddingZone, tr:controlArea, tr:country, tr:countryCode.

If a resource is reported in two areas in duplicate, we use only one of them to avoid reporting the same error twice.
The exception is installedCapacity-Aggregated-vs-Per-Unit, which is checked in both tr:biddingZone and tr:controlArea.

3.2.1 AppliesTo CountryCode

Counts:

67 country codes are used in EIC data. This applies mostly to VAT checking rules (see below).
The file countries.csv has 42 countries with power resources in ENTSOE (plus "SEM" Ireland and Northern Ireland, which is not really a country)
We consider the 35 country codes without power resources to be a "long tail".

There are many Trader countries outside of the ENTSOE jurisdiction.

We populate tr:displayArea of validation results as follows, dealing with both missing country codes and the "long tail":

Get Node.countryCode where Node is sh:focusNode (the node that caused the error)
If countryCode is missing: "none"
If countryCode is not found in countries.csv: "other"
Otherwise: countryCode
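
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the `KNOWN_COUNTRIES` set are hypothetical, and the set is a tiny stand-in for the contents of countries.csv.

```python
from typing import Optional, Set

# Hypothetical excerpt; the real list would be loaded from countries.csv
KNOWN_COUNTRIES: Set[str] = {"BG", "DE", "RS"}

def display_area_for_country(country_code: Optional[str],
                             known: Set[str] = KNOWN_COUNTRIES) -> str:
    """Return the tr:displayArea value for a countryCode-based rule."""
    if not country_code:
        return "none"      # the focus node has no country code
    if country_code not in known:
        return "other"     # "long tail" country, not listed in countries.csv
    return country_code
```

With this sketch, a Trader from a country outside countries.csv (eg "AE") is counted under "other", and a node without any country code is counted under "none".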

3.2.2 AppliesTo Area

The Areas that data is related to are controlArea, biddingZone, country (others listed below are not yet being validated):

Production Units are reported in duplicate: one tr:controlArea and one tr:biddingZone (though the data model permits multiple bidding zones). We have checked this with the following query:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?CTA ?BZN (count(*) as ?c) {
  {select (count(?cta) as ?CTA) (count(?bzn) as ?BZN) {
    ?x a tr:ProductionUnit
    optional {?x tr:controlArea ?cta}
    optional {?x tr:biddingZone ?bzn}
  } group by ?x}
} group by ?CTA ?BZN

Generation Units don't have direct links to areas, but the areas can be reached through their parent (^tr:generationUnit)
Unavailabilities (Outages) are reported in duplicate, against both tr:controlArea and tr:biddingZone
AcceptedAggregatedOffers, ActivatedBalancingEnergy are reported in tr:marketBalanceArea
AggregatedBalancingEnergyBids are reported in tr:schedulingArea
ActualGenerationOutputPerGenerationUnit are reported in tr:controlArea
CurrentGenerationForecastForWindAndSolar are reported in tr:controlArea, tr:biddingZone and tr:country
InstalledGenerationCapacityAggregated is reported in tr:biddingZone, tr:controlArea, tr:country.
- Its respective InstalledGenerationCapacityComputed in tr:biddingZone, tr:controlArea.
- Although the base Production Units capacity numbers are in duplicate, summing them across tr:biddingZone, tr:controlArea does not produce duplicate numbers
- Therefore the rule installedCapacity-Aggregated-vs-Per-Unit is checked in both tr:biddingZone, tr:controlArea

Counts:

Total of 55 "Control Area", of which 39 are used in Production and Generation Unit data
Total of 101 "Bidding Zone", of which 50 are used in Production and Generation Unit data

Populating tr:displayArea of tr:ValidationResult:

Get sh:sourceShape/tr:appliesTo as ?areaProp. There can be multiple values
Get sh:focusNode as ?node (the node that caused the error)
If ?node is tr:GenerationUnit, get its parent Production Unit (^tr:generationUnit) because Generation Units are not directly attached to areas
Use ?areaProp/tr:notation of ?node
Otherwise, use "none"
Save ?areaProp as ValidationCount.appliesTo
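
A minimal Python sketch of this procedure is shown below. It assumes a simplified in-memory representation that is not part of the TEKG implementation: nodes are plain dicts, `parent_of` is a hypothetical lookup from a Generation Unit id to its parent Production Unit, and each area is a dict with a "notation" key.

```python
from typing import Dict, List

def display_areas(node: dict, applies_to: List[str],
                  parent_of: Dict[str, dict]) -> Dict[str, str]:
    """Map each appliesTo property to the displayArea of a validation result."""
    # Generation Units are not attached to areas directly:
    # switch to the parent Production Unit (^tr:generationUnit)
    if node.get("type") == "GenerationUnit":
        node = parent_of.get(node["id"], node)
    areas = {}
    for area_prop in applies_to:           # there can be multiple values
        area = node.get(area_prop)         # e.g. {"notation": "BG"}
        notation = area.get("notation") if area else None
        areas[area_prop] = notation or "none"
    return areas
```

The returned keys play the role of ValidationCount.appliesTo, and the values are the tr:displayArea strings.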

3.3 Summary Validation Results

The Summary Results are counts of validation results that enable:

Grouping per applicability, group and rule
Breakdown and Totals per rule and area
Indication of severity (Violation vs Warning)
(CANCELED): Calculating prevalence (percentage of errors compared to total records)
Drill-down to individual validation results

Summaries are represented as tr:ValidationCount and have the following fields (another option would be to use the Data Quality Vocabulary (DQV)):

Rule (sh:sourceShape): validation rule (resource, from which the full rule description can be obtained, including Definition and Severity)
Area (tr:displayArea): country/zone/area (string). See section Rule Applicability
Count (tr:count): count of errors/warnings (integer)
Date (tr:date): when the counting was done (full xsd:dateTime). Please note that we retain only one set of validation results

An RDF model of summary results is in data/model/ValidationCount.ttl and the following diagram:

3.3.1 Summary Validation Results Mockup

Rules per Country Code	BG	DE	..	RS	other	none	Total
EIC
.. function not null (i)	5	3				2	10
.. function spelling (i)	3	3		1			7
.. function specific
.. function compatible with EIC hard
.. function compatible with EIC soft
VAT
.. VAT country prefix					5	4	9
.. VAT per country syntax					8		8
.. VAT country exists					10		10
.. VAT country conform
.. VAT per country exists
TOTAL	8	6	..	1	23	6	44

Rules per Control Area	BG	CA-DENMARK	DE-50HERTZ	DE-AMPRION-SCHED	..	UA-IPS	none	Total
ProdUnits
.. ProductionUnit cannot be GenerationUnit		4						4
.. parentResource semiInverse generatingUnit		2						2
.. ProductionUnits and GenerationUnits in EIC			5				100	5
.. EIC ProductionUnits GenerationUnits single
.. EIC ProductionUnits GenerationUnits assetType	5		3					8
.. EIC ProductionUnits nominalP highVoltageLimit
.. EIC GenerationUnits nominalP
.. ProductionUnit highVoltageLimit not zero	3							3
.. ProductionUnit nominalP not zero
.. only ProductionUnit or GenerationUnit				12				12
.. no GenerationUnit at top level
.. ProductionUnit and GenerationUnit same responsibleParticipant
.. ProductionUnit and GenerationUnit same country
.. ProductionUnit Zone or Area same country
.. generatingUnit function ProductionUnit
.. generatingUnit function GenerationUnit
.. location informative						23		23
.. ProductionUnit GenerationUnit capacity
Transactions
.. installedCapacity Aggregated vs Per Unit
.. actualOutput vs nominalP	10
TOTAL	18	6	8	121	..	23	100	157

Notes:

Rules are shown grouped by Applies To, then by Group
- Applies To and Groups are sorted alphabetically
Rules are sorted alphabetically by name
(i) indicates an icon: red for Violation, orange for Warning
- On hover over the name or icon, show the rule Description
- On click over the name or icon, jump to the respective section in this document (open in another window)
Table columns are tr:displayArea sorted alphabetically, but "other" and "none" come last
Cells show the count with a hyperlink
Clicking on a count displays the individual validation results for that rule and country/area (see next)
Totals are computed for each row and column
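
The column ordering and the totals described in these notes could be computed along the following lines. This is a sketch only: the function names are hypothetical, and counts are assumed to be available as a dict keyed by (rule name, displayArea).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def order_columns(areas: Iterable[str]) -> List[str]:
    """Sort displayArea values alphabetically, with "other" and "none" last."""
    present = set(areas)
    special = ["other", "none"]
    regular = sorted(a for a in present if a not in special)
    return regular + [s for s in special if s in present]

def totals(counts: Dict[Tuple[str, str], int]):
    """Return per-rule (row) totals, per-area (column) totals and the grand total."""
    row, col = defaultdict(int), defaultdict(int)
    for (rule, area), n in counts.items():
        row[rule] += n
        col[area] += n
    return dict(row), dict(col), sum(counts.values())
```

Applied to the visible cells of the "Rules per Country Code" mockup, this reproduces its TOTAL row (8, 6, 1, 23, 6 and a grand total of 44).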

3.4 Individual Validation Results

Individual results (exceptions) are represented as sh:ValidationResult and include the following fields.

We'll use this example: consider the rightmost parentResource relation in this diagram, which is wrong (should be inverse of generationUnit):

Rule (sh:sourceShape): rule that was violated
Node (sh:focusNode): node that caused the violation (eg EIC of "NPP_KOZLODUY_G10")
Value (sh:value): erroneous value (eg EIC of "TPP_MI_2", the object of parentResource)
(CANCELED) Expected: expected value, if any (eg EIC of "NPP_KOZLODUY", the subject of generationUnit)
Display Area (tr:displayArea): country/zone/area where the violation occurred (eg "BG"). Computed according to section Rule Applicability, can be "none" or "other"
Country (tr:countryCode): only for rules that apply to Country, provides extra detail if displayArea is "other"
Severity (sh:resultSeverity): severity level of the violation: Violation or Warning (copied from the respective rule)
Message (sh:resultMessage): additional details, use only if the source shape has sh:message because the standard messages generated by the SHACL engine are most often not useful.
- Use this check: if <result>/sh:sourceShape/sh:sparql?/sh:message then use <result>/sh:resultMessage

An RDF model of individual results is in data/model/ValidationResult.ttl and the following diagram:

For Node, Value (and CANCELED: Expected) we print:

If resource:
- If tr:EnergyResource: eic, and also notation, name to ease comprehension
- Otherwise: the last 2 components of the URL, eg 32W001100100017L/2022-01-01T11:00:00.000
  - Also fetch notation, name of the linked tr:EnergyResource
- Include a link to GDB Workbench so the user can examine the data: https://transparency.ontotext.com/graphdb/resource?uri=<node>
If literal (string or number): just the literal

3.4.1 Individual Validation Results Mockup for EIC VAT

Rule: EIC-VAT: VAT country conform [back]

Description: The first two chars of VAT must equal the country code (except "GR" which is spelled "EL" in VAT codes)
Data Items: EIC file (allocated-eic-codes XML)
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
Area: other
Count: 2 Violations (as of 2022-02-27T10:23:34)

Resource	Notation	Name	Value	Area
59XREALPETROL11F	REALPETROL	REAL PETROL HOLDING KFT	HU24189514	IT
22X20110811----W	BE_INEOS_CV_LVM	INEOS CHLORVINYLS LIMITED	GB768506886	BE

<< < 1 of 5 > >>

Notes: given a tr:ValidationCount, shows all individual results with that Rule (sh:sourceShape) and tr:displayArea. Header:

Show rule Group
Show rule Name in bold, and as a hyperlink to the respective section in this document (open in another window)
Show link [back] to return to the summary results
Show Description
Show each Data Item, with hyperlinks (see above, and the next subsection)
Show the string Fields
Show displayArea
Show count
Show severity in bold and colored icon: red (Violations), orange (Warnings)
Show date

Table:

First column: "Resource" (not "EIC" as was shown before)
- Show the "suffix" of the URL according to the following logic: examine the word after https://transparency.ontotext.com/resource/ :
- if eic: skip no more "words" (eg https://transparency.ontotext.com/resource/eic/22W20200608A---8 -> 22W20200608A---8)
- if outage: skip 1 more "word" (eg https://transparency.ontotext.com/resource/outage/UnavailabilityOfProductionOrGenerationUnits/KJUiHodFyfNlQTV9Ut5DJQ/57 -> KJUiHodFyfNlQTV9Ut5DJQ/57)
- if dataObs: skip 2 more "words" (eg https://transparency.ontotext.com/resource/dataObs/generation/ActualGenerationOutputPerGenerationUnit/36W-TE-TUZLA4--0/2022-01-07T19:00:00.000 -> 36W-TE-TUZLA4--0/2022-01-07T19:00:00.000)
- Make that a hyperlink showing the RDF triples of the resource. (These links are live in the mockup above)
Column Notation: sh:focusNode/tr:notation (if any)
Column Name: sh:focusNode/tr:name (if any)
Column Value: sh:value. If it's a URL, display only the "suffix" and make it a hyperlink, same as the first column
Column Area: tr:countryCode (NOT tr:displayArea, which is displayed before the table)
- Note: there's a minor discrepancy in the mockup: "IT", "BE" don't fall under "Area: other" because they are not "long tail" countries. For "Area: other", you'd see here countries like "PR", "AE"
Display 50 rows at a time, with a pagination control at the bottom
OPTIONAL: Should be able to sort by each field: do it server side, and reset to page 1 on re-sort
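
The "suffix" logic for the Resource column can be sketched as below. The function name and the `SKIP` table are hypothetical; returning unknown URLs unchanged is an assumption, since the notes above only cover eic, outage and dataObs.

```python
BASE = "https://transparency.ontotext.com/resource/"

# Number of extra path "words" to skip after the first word
SKIP = {"eic": 0, "outage": 1, "dataObs": 2}

def url_suffix(url: str) -> str:
    """Return the display suffix of a resource URL."""
    if not url.startswith(BASE):
        return url                      # assumption: show unknown URLs as-is
    words = url[len(BASE):].split("/")
    skip = SKIP.get(words[0])
    if skip is None:
        return url                      # assumption: unknown kind of resource
    return "/".join(words[1 + skip:])
```

The same function can be reused for the Value column when sh:value is a URL.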

The columns depend on the kind of item being validated (EIC, ProductionAndGenerationUnits, Data Observations, Outages).

The next subsection shows another example.
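
For illustration, the check behind the rule shown in this mockup (the first two chars of VAT must equal the country code, with "GR" spelled "EL") can be sketched in Python. The function name is hypothetical and the sketch is not the SHACL implementation of the rule.

```python
def vat_conforms(vat: str, country_code: str) -> bool:
    """True if the VAT prefix matches the country code ("GR" is written "EL" in VAT codes)."""
    expected_prefix = "EL" if country_code == "GR" else country_code
    return vat[:2] == expected_prefix
```

Both rows of the mockup table fail this check: "HU24189514" is reported for country "IT", and "GB768506886" for country "BE".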

3.4.2 Individual Validation Results Mockup for Actual Generation Output Per Generation Unit

Rule: Arithmetics: ActualGenerationOutputPerGenerationUnit actualOutput LTE installedOutput [back]

Description: ActualGenerationOutput of each Generation Unit should not be greater than InstalledGenCapacity for each date
Data Items: Actual Generation Output per Generation Unit (ActualGenerationOutputPerGenerationUnit_16.1.A CSV), portal, description
Fields: ActualGenerationOutput, InstalledGenCapacity
Area: BG
Count: 2 Violations (as of 2022-02-27T10:23:34)

Resource	Value	Area	Message
32W001100100017L/2022-01-01T11:00:00.000	1001		Should be less than 1000
32W001100100048A/2022-01-01T09:00:00.000	231		Should be less than 230

<< < 1 of 5 > >>

The same notes apply as in the previous section, except data columns:

The node to blame (sh:focusNode) corresponds to a dataObs and the hyperlink shows only the "suffix" (last 2 URL components): EIC and dateTime
Notation and Name come from the energy resource linked to the node (sh:focusNode/tr:energyResource)
Value is the wrong value (sh:value): to illustrate, we've shown Actual Output that exceeds Installed Capacity by 1 MW
Area is the displayArea.
- Since this rule appliesTo controlArea, it comes from the controlArea linked to the node (sh:focusNode/tr:controlArea/tr:notation)
Message: sh:resultMessage, but used only if the shape has sh:message

3.5 Data Correction

We use some inference (SPARQL updates) to:

Improve the structure of data by making implicit info explicit (eg eicCode)
Make validation easier by making explicit fields, and deriving extra fields
Correct some key data so that:
- Subsequent validation rules don't report "false positives", i.e. errors that have the same root cause as already reported errors
- All subsequent rules are triggered, so we don't miss exceptions ("false negatives")

Further subsections define and implement data corrections as SPARQL Updates, and the sequence and interleaving of validation rules and corrections.

Each correction is described in a subsection after the respective rule, and attached as tr:sparqlUpdate to it. We do not use SHACL Rules (part of SHACL Advanced Features) because these are limited to only invalid nodes.
All validations and corrections are run after initial data loading, and after each data update
ValidationResults capture the original wrong value in sh:value, and corrections don't overwrite this captured value, so it can be reported in the DQA Dashboard.

3.6 Validation Rules

This section describes precisely all validation rules implemented by TEKG. The semantic definition and SHACL implementation of each rule is extracted from this section.

3.6.1 function-not-null

Rule Group: EIC-function
Description: Each EIC resource should have a non-null "function" ("Valid EIC Function needed" is effectively null)
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
Severity: Violation
Applies to: countryCode

sh:targetClass tr:EnergyResource;
sh:property [
  sh:path tr:function;
  sh:minCount 1;
  sh:not [sh:hasValue "Valid EIC Function needed"]].

SPARQL check:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * where {
    ?s a tr:EnergyResource .
    {
        FILTER NOT EXISTS {
            ?s tr:function []
        }
    } UNION {
        ?s tr:function "Valid EIC Function needed"
    }
}

3.6.2 function-spelling

Rule Group: EIC-function
Description: Functions of EIC resources should be spelled consistently
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
Severity: Violation
Applies to: countryCode

According to the following correction table (data/turtle/small/function-valid.ttl):

functionInvalid	functionValid
balance group	Balance Group
It-System	IT-system
LNG terminal	LNG Terminal
Generation	Generation Unit
Production Plant	Production Unit

Notes about the first 3 lines (case normalization):

There is no lower/upper case sensitivity required for LIOs to upload EIC data
The Transparency Portal UI always capitalizes each word of the function
We think that this case normalization should be done by the data storage layer, not at the UI

Notes about the last 2 lines:

doc Functions p4 lists both variants
However, ENTSOE communicated to us: "these functions are under review, and it is foreseen to have only Generation Unit and Production Unit at the end"

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select $this ?s2 {
        $this a tr:EnergyResource; tr:function ?invalid.
        ?s2 a tr:FunctionValid; tr:functionInvalid ?invalid}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "Will be corrected to {?valid}";
  sh:select """
    select $this (tr:function as ?path) (?invalid as ?value) ?valid {
      $this tr:function ?invalid.
      [] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid}"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?this {
      ?this a tr:EnergyResource; tr:function ?invalid.
      [] a tr:FunctionValid; tr:functionInvalid ?invalid}

3.6.2.1 correct-function-spelling

Misspellings of functions (eg "Production Plant", "Generation") are corrected to enable further checks. We use an RDF mapping table that pairs each misspelled function with its correct spelling, with rows like this:

[] a tr:FunctionValid; tr:functionInvalid "Production Plant"; tr:functionValid "Production Unit".
[] a tr:FunctionValid; tr:functionInvalid "Generation";       tr:functionValid "Generation Unit".

The spelling correction is done by this SPARQL update:

base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

delete {graph <graph/allocated-eic-codes> {?x tr:function ?invalid}}
insert {graph <graph/allocated-eic-codes> {?x tr:function ?valid}}
where {
  ?x a tr:EnergyResource; tr:function ?invalid.
  [] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid
}
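
The effect of this update can be illustrated with a small Python sketch. The mapping below copies the correction table from section function-spelling; the function name is hypothetical, and the sketch works on plain lists rather than on the graph.

```python
from typing import List

# Copied from the correction table (functionInvalid -> functionValid)
FUNCTION_VALID = {
    "balance group": "Balance Group",
    "It-System": "IT-system",
    "LNG terminal": "LNG Terminal",
    "Generation": "Generation Unit",
    "Production Plant": "Production Unit",
}

def correct_functions(functions: List[str]) -> List[str]:
    """Replace each misspelled function with its valid spelling; keep the others as they are."""
    return [FUNCTION_VALID.get(f, f) for f in functions]
```

As in the SPARQL update, functions that do not appear in the mapping table are left untouched.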

3.6.3 function-specific

Rule Group: EIC-function
Description: An EIC resource with a specific function doesn't also need "Resource Object" because that's unspecific, so it should be elided
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
Severity: Warning
Applies to: countryCode

This query finds 11 "Resource Objects" that have a more specific function:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x tr:function "Resource Object", ?fun
  filter(?fun != "Resource Object")
}

Examples:

30W-CEE-COGEA--T: "Generation Unit", "Resource Capacity Market Unit": elide "Resource Object"
45W000000000141O: "Production Unit", "Load": elide "Resource Object"

sh:targetClass tr:EnergyResource;
sh:or (
  [sh:path tr:function; sh:maxCount 1]
  [sh:path tr:function; sh:not [sh:hasValue "Resource Object"]]).

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
    {
        SELECT ?this (COUNT(?fun) as ?cnt) {
            ?this a tr:EnergyResource;
                  tr:function ?fun .
        } GROUP BY ?this
    }
    ?this tr:function ?fun .
    FILTER (?cnt > 1 && ?fun = "Resource Object")
}
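
The same condition can be expressed as a short Python predicate, shown here only as an illustration (the function name is hypothetical):

```python
from typing import Iterable

def has_redundant_resource_object(functions: Iterable[str]) -> bool:
    """True if "Resource Object" appears together with at least one other function."""
    distinct = set(functions)
    return "Resource Object" in distinct and len(distinct) > 1
```

A resource whose only function is "Resource Object" passes, while one that also carries "Generation Unit" is reported with a Warning.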

3.6.4 Top-Level-must-have-only-function-ProductionUnit

Rule Group: ProductionUnit-Structure
Description: The top level resources in Production and Generation Units must have function "Production Unit", and not any other function
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
Severity: Violation
Applies to: biddingZone

Production and Generation Units data is supposed to have Production Units at the top level, and Generation Units at the bottom level. In practice, there are many "Production Units" mislabeled with function "Generation Unit" and vice versa.

This query counts all invalid situations:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select
  (count(?prodNotProd) as ?prodNotProd1)
  (count(?prodIsGen)   as ?prodIsGen1)
  (count(?genNotGen)   as ?genNotGen1)
  (count(?genIsProd)   as ?genIsProd1)
{
  {?prodNotProd a tr:ProductionUnit filter not exists{?prodNotProd tr:function "Production Unit"}} union
  {?prodIsGen   a tr:ProductionUnit filter     exists{?prodIsGen   tr:function "Generation Unit"}} union
  {?genNotGen   a tr:GenerationUnit filter not exists{?genNotGen   tr:function "Generation Unit"}} union
  {?genIsProd   a tr:GenerationUnit filter     exists{?genIsProd   tr:function "Production Unit"}}
}

prodNotProd1	prodIsGen1	genNotGen1	genIsProd1
0	2499	0	3140

sh:targetClass tr:ProductionUnit;
sh:property [
  sh:path tr:function;
  sh:maxCount 1;
  sh:hasValue "Production Unit"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct ?x ?cc {
    {
        # ?cc must be projected from the subquery, otherwise it is unbound in the outer query
        SELECT ?x ?cc (COUNT(?function) as ?count) {
            ?x a tr:ProductionUnit ;
               tr:biddingZone/tr:notation ?cc .
            OPTIONAL {
                ?x tr:function ?function .
            }
        } GROUP BY ?x ?cc
    }
    filter (not exists {
            ?x tr:function "Production Unit"
        } || ?count > 1)
} limit 1000

3.6.5 Bottom-Level-must-have-only-function-GenerationUnit

Rule Group: ProductionUnit-Structure
Description: The bottom level resources in Production and Generation Units must have function "Generation Unit", and not any other function
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
Severity: Violation
Applies to: biddingZone

sh:targetClass tr:GenerationUnit;
sh:property [
  sh:path tr:function;
  sh:maxCount 1;
  sh:hasValue "Generation Unit"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit
  optional {?x tr:function ?fun filter (?fun != "Generation Unit")}
  filter (not exists {?x tr:function "Generation Unit"}
        || bound(?fun))
} limit 100

3.6.6 parentResource-semiInverse-generationUnit

Rule Group: ProductionUnit-Structure
Description: The relation parentResource (in EIC) should be "semi-inverse" of generationUnit (in Production and Generation Units), i.e. GeneratingUnit.parentResource should be inverse of generationUnit
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICParent_MarketDocument.mRID, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
Severity: Violation
Applies to: biddingZone

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select $this ?s2 {
    $this a tr:GenerationUnit ;
       ^tr:generationUnit ?s2 ;
        tr:parentResource ?parent2 .
      FILTER (?s2 != ?parent2)
      ?s2 a tr:ProductionUnit .
  }
  """];
  sh:sparql [a sh:SPARQLConstraint;
    sh:prefixes tr: ;
    sh:select """
      select $this ?value {
        $this a tr:GenerationUnit ;
           ^tr:generationUnit ?value ;
            tr:parentResource ?parent2 .
          FILTER (?value != ?parent2)
          ?value a tr:ProductionUnit .
      }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit ;
     ^tr:generationUnit ?parent ;
      tr:parentResource ?parent2 .
    FILTER (?parent != ?parent2)
    ?parent a tr:ProductionUnit .
}

3.6.7 ProductionUnits-and-GenerationUnits-in-EIC

Rule Group: ProductionUnit-Structure
Description: All Production Units and Generation Units must be described in the master EIC file, thus have EIC code, name, notation, function.
Data Items: generation/ProductionAndGenerationUnits, basic/allocated-eic-codes
Fields: Configuration_MarketDocument/TimeSeries/mRID, EIC_MarketDocument/EICCode_MarketDocument/mRID
Severity: Violation
Applies to: biddingZone

There are 938 power units (Production or Generation Units) that are missing from the EIC file:

base  <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  values ?type {tr:ProductionUnit tr:GenerationUnit}
  ?x a ?type
  filter not exists {
    {graph <graph/allocated-eic-codes> {?x tr:eic []}} 
  }
}

Eg 47W000000000318I has assetType, biddingZone, controlArea, providerParticipant, generatorUnit, highVoltageLimit, installedOutput, location, notationAlt but not EIC data.

sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property [
  sh:path tr:eic;
  sh:minCount 1].

Notes:

Multiple SHACL targets are allowed: "union of terms produced by the individual targets that are declared by the shape"
Note: the discussion below is no longer relevant, because tr:ProductionUnit and tr:GenerationUnit are disjoint
- Discussion: Is union of targets DISTINCT?
- TQ API, PySHACL: yes
- rdf4j ShaclSail: no, because checking is done in parallel
- Erratum: data-shapes#143

SPARQL check:

base  <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  values ?type {tr:ProductionUnit tr:GenerationUnit}
  ?x a ?type
  filter not exists {
    {graph <graph/allocated-eic-codes> {?x tr:eic []}} # exists in <graph/correction/prodUnit-add-basic-data-to-EIC>
  }
}

3.6.7.1 prodUnit-add-basic-data-to-EIC

For Power Units missing from the EIC file, we add the following basic EIC fields:

rdf:type tr:EnergyResource
function from the subclass ProductionUnit or GenerationUnit. Note: The Production and Generation Units conversion emits one of these subclasses of tr:EnergyResource:
- top level: tr:ProductionUnit
- bottom level: tr:GenerationUnit
eic from the URL (and next section calculates eicType)
notation from notationAlt
(CANCEL: no such field: countryCode in biddingZone or controlArea)

base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

clear silent graph <graph/correction/prodUnit-add-basic-data-to-EIC>;
insert {graph <graph/correction/prodUnit-add-basic-data-to-EIC> {
  ?x a tr:EnergyResource;
    tr:function ?func;
    tr:eic ?eic;
    tr:notation ?notation;
}} where {
  values (?type ?func) {
    (tr:ProductionUnit "Production Unit")
    (tr:GenerationUnit "Generation Unit")
  }
  ?x a ?type
  filter not exists {?x tr:eic []}
  bind((replace(str(?x),".*/","")) as ?eic)
  optional {?x tr:notationAlt ?notation}
}
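
To make the derivation of these fields concrete, here is a hedged Python sketch of the same logic for a single unit. The function name and the dict-based output are hypothetical. The URL pattern is taken from the examples in section Individual Validation Results, and it is an assumption that Production and Generation Units use it too.

```python
import re
from typing import Dict, Optional

FUNCTION_BY_TYPE = {
    "ProductionUnit": "Production Unit",   # top level
    "GenerationUnit": "Generation Unit",   # bottom level
}

def basic_eic_fields(url: str, unit_type: str,
                     notation_alt: Optional[str] = None) -> Dict[str, str]:
    """Derive the basic EIC fields for a unit that is missing from the EIC file."""
    fields = {
        "type": "EnergyResource",
        "function": FUNCTION_BY_TYPE[unit_type],
        "eic": re.sub(r".*/", "", url),    # last URL component, as in the SPARQL replace()
    }
    if notation_alt is not None:           # notation is added only if notationAlt exists
        fields["notation"] = notation_alt
    return fields
```

For the unit 47W000000000318I mentioned above, this yields function "Production Unit" or "Generation Unit" depending on its class, and eic "47W000000000318I".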

3.6.8 ProductionUnits-GenerationUnits-described-once

Rule Group: ProductionUnit-Structure
Description: Production and Generation Units should be described only once across their applicable areas, or if multiple times then all fields should be reported consistently
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP
Severity: Violation
Applies to: biddingZone

For example, the following Units are reported with different fields in Bidding Zone vs Control Area:

18WEGREEN-1234-3: different installedOutput
47W000000000355C: different installedOutput
47W000000000356A: different installedOutput
18WEGREEN-1234-3: different dateImplemented
11W0-0000-0026-Y: different location
49W0000000000342: different notationAlt

Note: highVoltageLimit, assetType, providerParticipant are always consistent. We checked with a query like this:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x tr:highVoltageLimit ?y1,?y2
  filter(str(?y1)<str(?y2))
}

sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property <shape/property/100>, <shape/property/101>, <shape/property/102>, <shape/property/103>.

<shape/property/100> a sh:PropertyShape; sh:path tr:installedOutput; sh:maxCount 1.
<shape/property/101> a sh:PropertyShape; sh:path tr:dateImplemented; sh:maxCount 1.
<shape/property/102> a sh:PropertyShape; sh:path tr:location;        sh:maxCount 1.
<shape/property/103> a sh:PropertyShape; sh:path tr:notationAlt;     sh:maxCount 1.

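SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT * WHERE {
    {
        # count DISTINCT values, so that several values of one field don't inflate the counts of the others
        select ?x (COUNT(DISTINCT ?installed) as ?installedCount) (COUNT(DISTINCT ?date) as ?dateCount) (COUNT(DISTINCT ?loc) as ?locationCount) (COUNT(DISTINCT ?not) as ?notationCount) {
            # the two classes are disjoint, so match either of them
            values ?type {tr:ProductionUnit tr:GenerationUnit}
            ?x a ?type .
            # each field is optional: a unit may lack some of them
            optional {?x tr:installedOutput ?installed}
            optional {?x tr:dateImplemented ?date}
            optional {?x tr:location ?loc}
            optional {?x tr:notationAlt ?not}
        } GROUP BY ?x
    }
    FILTER(?installedCount > 1 || ?dateCount > 1 || ?locationCount > 1 || ?notationCount > 1)
}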

3.6.9 EIC-in-ProductionUnit-data

Rule Group: ProductionUnit-Structure
Description: EIC resources with functions "Production Unit" and "Generation Unit" should be described in the Production and Generation Units data item
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, Configuration_MarketDocument/TimeSeries/MktPSRType/psrType, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/generatingUnit_PSRType.psrType
Severity: Violation
Applies to: countryCode

sh:targetClass tr:EnergyResource;
sh:or (
  [sh:not [sh:path tr:function; dash:hasValueIn ("Production Unit" "Generation Unit")]]
  [        sh:path rdf:type;    dash:hasValueIn (tr:ProductionUnit tr:GenerationUnit)]).

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dash: <http://datashapes.org/dash#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
  ?x a tr:EnergyResource; tr:function ?fun.
  filter(?fun in ("Production Unit", "Generation Unit"))
  filter not exists {
    ?x a ?type
    filter(?type in (tr:ProductionUnit, tr:GenerationUnit))
  }
}

3.6.9.1 add-eicType

This correction adds field eicType based on the third char of eic.

It connects each EnergyResource to codelist <type/Eic/> (where notation is the char, name is the type).
For example, <type/Eic/W> is "Resource Object"
This can be seen in the models EIC Mapping and Combined Mapping.

base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

clear silent graph <graph/correction/eicType>;
insert {graph <graph/correction/eicType> {
  ?x tr:eicType ?type
}} where {
  ?x tr:eic ?eic
  bind(substr(?eic,3,1) as ?notation)
  ?type tr:codeList <type/Eic>; tr:notation ?notation
}

(There is no particular reason to run this right after the previous validation rule.)
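
The derivation of eicType can be sketched in Python as follows. The function name is hypothetical, and `EIC_TYPE_NAMES` lists only the three types named in this document ("W", "X", "Y"); the full codelist <type/Eic/> may contain more.

```python
from typing import Optional

# Only the types mentioned in this document; the real codelist may be longer
EIC_TYPE_NAMES = {"W": "Resource Object", "X": "Party", "Y": "Area or Domain"}

def eic_type_notation(eic: str) -> Optional[str]:
    """Return the third char of the EIC (the type notation), or None if the EIC is too short."""
    # Python indexes from 0, while SPARQL substr(?eic,3,1) counts from 1
    return eic[2] if len(eic) >= 3 else None
```

For example, the EIC 22W20200608A---8 has type notation "W", which the codelist names "Resource Object".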

3.6.10 EIC-compatible-with-function

Rule Group: EIC-function
Description: Each EIC type (third char of EIC) should be compatible with the function(s) of the resource as per the table below. To fix a violation, either the function or the EIC of the resource would need to be changed. But changing the EIC is not a good idea (nor is it a good idea to embed information in identifiers)
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, EIC_MarketDocument/EICCode_MarketDocument/mRID
Severity: Violation
Applies to: countryCode

According to the following table (data/turtle/small/eicType-valid.ttl):

function	eicTypeInvalid	eicTypeValid
System Operator	`W` Resource Object	`X` Party
Control Block	`X` Party	`Y` Area or Domain
Market Area	`X` Party	`Y` Area or Domain

For the implementation we use SPARQL-based Constraints.

We first find all offending nodes using one query (sh:SPARQLTarget)
Then a second query (a sh:SPARQLConstraint) is run for each offending node. This "double-query" approach reduces execution time because the offenders are a small subset of all tr:EnergyResource
Note: the inline-bind (tr:function as ?path) doesn't work in GDB (GDB-6713)

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select distinct $this ?s2 {
        $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
        ?s2 a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.} """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:function as ?path) (sample(?func) as ?value) {
      $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
      [] a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.
    } group by $this ?path"""].

SPARQL check:

select distinct $this ?s2 {
        $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
        ?s2 a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.}
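
The table lookup behind this rule can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: the set literal copies the three rows of the table above, and the function name and the sample EIC are hypothetical.

```python
# (function, invalid EIC type notation) pairs, copied from the table above.
INVALID_PAIRS = {
    ("System Operator", "W"),
    ("Control Block", "X"),
    ("Market Area", "X"),
}


def incompatible_functions(eic, functions):
    """Return the functions that conflict with the EIC type (third char)."""
    eic_type = eic[2]
    return [f for f in functions if (f, eic_type) in INVALID_PAIRS]


# Hypothetical EIC of type W that claims two functions.
print(incompatible_functions("10WEXAMPLE00000A", ["System Operator", "Market Area"]))
```

Only "System Operator" is reported here, because the table lists "Market Area" as invalid only for type X.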

3.6.11 function-compatible-with-EIC

Rule Group: EIC-function
Description: Each function of a resource should be compatible with its EIC type (third char of EIC) as per "List of allowed functions for the EIC codes". Misspellings are not listed here. This is a soft constraint.
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, EIC_MarketDocument/EICCode_MarketDocument/mRID
Severity: Warning
Applies to: countryCode

According to turtle/small/eicType-function.ttl, which is RDFized from docs/eicType-function-allowed.tsv, which is extracted from "List of allowed functions for the EIC codes".

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
      $this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
      filter not exists {?s2 tr:functionValid ?func}} """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:function as ?path) (sample(?func) as ?value) {
      $this tr:eicType ?type; tr:function ?func
      filter not exists {?type tr:functionValid ?func}
    } group by $this ?path"""].

SPARQL check:

      select distinct $this ?s2 {
        $this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
        filter not exists {?s2 tr:functionValid ?func}}

3.6.12 ProductionUnits-installedOutput-highVoltageLimit

Rule Group: ProductionUnit-Structure
Description: Production Units should have installedOutput and highVoltageLimit. This is a soft constraint
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/production_PowerSystemResources.highVoltageLimit
Severity: Warning
Applies to: biddingZone

sh:targetClass tr:ProductionUnit;
sh:property <shape/property/104>, <shape/property/105>.

<shape/property/104> a sh:PropertyShape; sh:path tr:installedOutput;  sh:minCount 1.
<shape/property/105> a sh:PropertyShape; sh:path tr:highVoltageLimit; sh:minCount 1.

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:ProductionUnit
  filter (not exists {?x tr:installedOutput []}
       || not exists {?x tr:highVoltageLimit []})
}

3.6.13 GenerationUnits-installedOutput

Rule Group: ProductionUnit-Structure
Description: Generation Units should have installedOutput. This is a soft constraint
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
Severity: Warning
Applies to: biddingZone

sh:targetClass tr:GenerationUnit;
sh:property [sh:path tr:installedOutput;  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit
  filter not exists {?x tr:installedOutput []}
}

3.6.14 ProductionUnit-highVoltageLimit-not-zero

Rule Group: ProductionUnit-Data
Description: highVoltageLimit should not be zero: such data should be omitted
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/production_PowerSystemResources.highVoltageLimit
Severity: Violation
Applies to: biddingZone

sh:targetClass tr:ProductionUnit;
sh:not [sh:path tr:highVoltageLimit; sh:hasValue "0"^^xsd:float].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:highVoltageLimit ?hvl .
    FILTER (?hvl = "0"^^xsd:float)
}

3.6.15 ProductionUnit-installedOutput-not-zero

Rule Group: ProductionUnit-Data
Description: installedOutput should not be zero, except in Production Units that are offline (perhaps kept as "cold reserve")
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
Severity: Warning
Applies to: biddingZone

sh:targetSubjectsOf tr:installedOutput;
sh:not [sh:path tr:installedOutput; sh:hasValue "0"^^xsd:float].

Some examples:

eic:46WGU0000000017Y "Karlshamn G1" fossil oil plant: ENTSOE reports it to have capacity 0, which seems to be confirmed by other sources:
- Operator's page reservkraft/karlshamnsverket: "Karlshamnsverket is powered by oil and has a total power of 662 MW and has had the role of reserve power plant since the early 1980s. The power reserve is a resource that Svenska kraftnät has on standby in the event of a power shortage in Sweden."
- Wikipedia page Karlshamn Power Station
- NordPool REMIT UMM page (significant market events) reports this resource as temporarily unavailable but otherwise available (those reports date back to 2014 and 2015)
eic:26WIMPI-S09RDCNB "IM-S09RDCN" Nuova Radicondoli geothermal plant: ENTSOE reports it to have capacity 0, which seems to be disconfirmed by other sources:
- Toscana article Centrali geotermiche in Toscana lists NUOVA RADICONDOLI 1 and 2 with starting year 2002, 2010 respectively
- InToscana article Geotermia: a Radicondoli (SI) nuova centrale di Enel Green Power (Dec 2013): mentions capacities of 20MW, 40MW
- The book Geothermal Power Generation: Developments and Innovation (2016) lists NUOVA RADICONDOLI 1 and 2 with capacity of 40 kW, 20 kW respectively (too small, seems to be a mistake)
- COSVIG article Radicondoli, torna ”Centrale Aperta” presso l’impianto geotermico Enel (Jul 2015): "the Nuova Radicondoli plant consists of two groups with an installed capacity of 40 MW and produces electricity equal to the consumption of approximately 100,000 Tuscan families."
eic:34WETG-ZRENJ---4 "TETO Zrenjanin" cogeneration plant (TE-TO): ENTSOE reports it to have capacity 0, which seems to be disconfirmed by other sources:
- eps.rs (Elektroprivreda Srbije) lists it for 110-120 MW
- elektroenergetika.info lists it for 110 MW (Start of operation: 1989)
- b52.net comment thread (Mar 2019) "TETO Zrenjanin is not profitable because ... The combined cycle is profitable when the price is approximately above 55 €/MWh"

SPARQL check:

BASE         <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    GRAPH <graph/ProductionAndGenerationUnits> {
        $this tr:installedOutput ?io .
        FILTER (?io = "0"^^xsd:float)
    }
}

3.6.16 ProductionUnit-and-GenerationUnit-same-responsibleParticipant

Rule Group: ProductionUnit-Structure
Description: A Production Unit and all its Generation Units should have the same Responsible Participant. This is a soft constraint since exceptions are possible
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICResponsible_MarketParticipant.mRID, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
Severity: Warning
Applies to: biddingZone

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
      $this a tr:ProductionUnit ;
            tr:generationUnit/tr:responsibleParticipant ?genRP ;
                             tr:responsibleParticipant ?RP .
      FILTER (?genRP != ?RP)
      $this tr:generationUnit ?s2
  }
  """];
  sh:sparql [a sh:SPARQLConstraint;
    sh:prefixes tr: ;
    sh:select """
select distinct $this ?value {
    $this a tr:ProductionUnit ;
          tr:generationUnit/tr:responsibleParticipant ?value ;
                           tr:responsibleParticipant ?RP .
    FILTER (?value != ?RP)
}
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:generationUnit/tr:responsibleParticipant ?genRP ;
                           tr:responsibleParticipant ?RP .
    FILTER (?genRP != ?RP)
}

3.6.17 ProductionUnit-and-GenerationUnit-same-country

Rule Group: ProductionUnit-Structure
Description: A Production Unit and all its Generation Units should have the same country code
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
Severity: Violation
Applies to: countryCode

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
        $this a tr:ProductionUnit ;
              tr:generationUnit ?s2 ;
              tr:countryCode ?RP .
        ?s2 tr:countryCode ?genRP .
        FILTER (?genRP != ?RP)
    }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?value {
        $this a tr:ProductionUnit ;
              tr:generationUnit ?s2 ;
              tr:countryCode ?RP .
        ?s2 tr:countryCode ?value .
        FILTER (?value != ?RP)
    }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:generationUnit/tr:countryCode ?genRP ;
                           tr:countryCode ?RP .
    FILTER (?genRP != ?RP)
}
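
A minimal Python sketch of the same comparison, for illustration only. The function name, the dictionary layout, and the sample EICs are hypothetical and not taken from the TEKG data.

```python
def units_in_other_country(production_country, generation_unit_countries):
    """Return the EICs of Generation Units whose country code differs
    from the country code of their Production Unit.

    generation_unit_countries maps Generation Unit EIC -> country code.
    """
    return sorted(
        eic
        for eic, country in generation_unit_countries.items()
        if country != production_country
    )


# Hypothetical Production Unit in "SE" with two Generation Units.
sample = {"00W-GEN-UNIT-A-0": "SE", "00W-GEN-UNIT-B-0": "DK"}
print(units_in_other_country("SE", sample))
```

As in the SPARQL check, a Generation Unit without a country code would simply not appear in the mapping and so would not be reported.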

3.6.18 ProductionUnit-Zone-or-Area-same-country

Rule Group: ProductionUnit-Structure
Description: The country code of a Production Unit and all its Bidding Zones and Control Areas should be the same (when present). This is a soft constraint since areas and zones are not co-extensive with countries
Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country, Configuration_MarketDocument/TimeSeries/biddingZone_Domain.mRID, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
Severity: Warning
Applies to: countryCode

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
        $this a tr:ProductionUnit ;
              (tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
                                            tr:countryCode ?RP .
        FILTER (?genRP != ?RP)
        $this (tr:biddingZone | tr:controlArea) ?s2 .
        ?s2 tr:countryCode ?genRP ;
    }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?value {
        $this a tr:ProductionUnit ;
              (tr:biddingZone|tr:controlArea)/tr:countryCode ?value ;
                                            tr:countryCode ?RP .
        FILTER (?value != ?RP)
    }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          (tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
                           tr:countryCode ?RP .
    FILTER (?genRP != ?RP)
}

3.6.19 location-informative

Rule Group: ProductionUnit-Data
Description: Locations should carry informative place names (e.g. city, region, country name/code), and should not be digits only, an EIC, "intra_zonal", "name", or "locName" (but "internal" is ok)
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/registeredResource.location.name
Severity: Warning
Applies to: countryCode

Discovered:

cd data/turtle/prodUnit
perl -lne 'm{:location +"(.*)"} and do {$_=$1; s{^\d{2}[A-Z][A-Z0-9-]{13}$}{EIC}; s{^\d+$}{digits}; print}' *|sort|uniq -c|sort -rn|less

sh:targetSubjectsOf tr:location ;
sh:property [
  sh:path tr:location;
  sh:not [sh:pattern "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"]].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this tr:location ?loc .
    FILTER (REGEX(?loc, "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"))
}
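
The pattern can be tried out with a short Python sketch. The regular expression is the one used in the shape above; the function name is an assumption, and Python's re module is used here only as an approximation of the SPARQL/SHACL regex dialect.

```python
import re

# Same pattern as in the SHACL shape and the SPARQL check.
UNINFORMATIVE = re.compile(
    r"^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"
)


def is_uninformative_location(location: str) -> bool:
    """True if the location is digits only, looks like an EIC,
    or is one of the placeholder values."""
    return UNINFORMATIVE.fullmatch(location) is not None
```

With this sketch, "12345", "46WGU0000000017Y" and "intra_zonal" are flagged, whereas "internal" and "France" are accepted.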

3.6.20 ProductionUnit-capacity-GTE-GenerationUnit-capacity

Rule Group: ProductionUnit-Data
Description: The capacity (Nominal Power) of a Production Unit should equal the sum of the capacities of its Generation Units, or should be greater (in case some Generation Units are not described)
Data Items: generation/ProductionAndGenerationUnits
Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
Severity: Warning
Applies to: biddingZone

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  $this a tr:ProductionUnit; tr:installedOutput ?value
  {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
  filter(?value<?value2)
}

Implementation:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select $this ?s2 {
        $this a tr:ProductionUnit; tr:installedOutput ?value ; tr:generationUnit ?s2 .
        {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
        filter(?value<?value2)}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:message "Should be greater than or equal to {?value2}";
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:installedOutput as ?path) ?value ?value2 {
      $this a tr:ProductionUnit; tr:installedOutput ?value
      {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
      filter(?value<?value2)}"""].
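
The arithmetic of this rule can be sketched in Python. The function name and the sample numbers are hypothetical; the sketch only mirrors the comparison made by the queries above.

```python
def capacity_violation(production_output, generation_outputs):
    """True if the Production Unit capacity is smaller than the sum
    of the capacities of its described Generation Units."""
    total = sum(generation_outputs)
    return production_output < total


# Hypothetical values in MW.
print(capacity_violation(100, [40, 40]))  # unit total 80 <= 100: no violation
print(capacity_violation(100, [60, 60]))  # unit total 120 > 100: violation
```

A Production Unit whose Generation Units are not described yields an empty sum and is not reported, which matches the "or should be greater" clause of the rule.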

3.6.21 VAT-country-prefix

Rule Group: EIC-VAT
Description: VAT numbers of market participants should start with a country code, not digits. This is a soft constraint since the country code (if present) can be prepended to the VAT
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
Severity: Warning
Applies to: countryCode

Example: 6326035O is invalid (IE6326035O would be valid)

sh:targetSubjectsOf tr:vatNumber;
sh:property [
  sh:path tr:vatNumber;
  sh:pattern "^[A-Z][A-Z]"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this ?vat {
    $this tr:vatNumber ?vat .
    FILTER (!REGEX(?vat, "^[A-Z][A-Z]"))
}

3.6.21.1 VAT-add-country-prefix

This correction normalizes VAT codes: those starting with a digit are prefixed with the country code, which enables the VAT-per-country-syntax check and the VAT-exists-in-VIES check.

base        <https://transparency.ontotext.com/resource/>
prefix tr:  <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

delete {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?old}}
insert {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?new}}
where {
  values (?co ?co1 ?regex) {
    ("AL" "AL"  "^[JKLM][0-9]"      )
    ("AR" "AR"  "^[0-9]"            )
    ("AT" "AT"  "^U[0-9]"           )
    ("BA" "BA"  "^[0-9]"            )
    ("BE" "BE"  "^[0-9]"            )
    ("BG" "BG"  "^[0-9]"            )
    ("CH" "CHE" "^(CH)?[0-9]"       )
    ("CY" "CY"  "^[0-9]"            )
    ("CZ" "CZ"  "^[0-9]"            )
    ("DE" "DE"  "^[0-9]"            )
    ("DK" "DK"  "^[0-9]"            )
    ("EE" "EE"  "^[0-9]"            )
    ("ES" "ES"  "^[A-Z][0-9]"       )
    ("FI" "FI"  "^[0-9]"            )
    ("FR" "FR"  "^[0-9]"            )
    ("GB" "GB"  "^[0-9]"            )
    ("GE" "GE"  "^[0-9]"            )
    ("GR" "EL"  "^(GR|GREL)?[0-9]"  )
    ("HR" "HR"  "^[0-9]"            )
    ("HU" "HU"  "^[0-9]"            )
    ("IE" "IE"  "^[0-9]"            )
    ("IT" "IT"  "^[0-9]"            )
    ("IS" "IS"  "^[0-9]"            )
    ("KY" "KY"  "^[0-9]"            )
    ("LI" "LI"  "^[0-9]"            )
    ("LT" "LT"  "^[0-9]"            )
    ("LU" "LU"  "^[0-9]"            )
    ("LV" "LV"  "^[0-9]"            )
    ("MD" "MD"  "^[0-9]"            )
    ("ME" "ME"  "^[0-9]"            )
    ("MK" "MK"  "^[0-9]"            )
    ("MT" "MT"  "^[0-9]"            )
    ("NL" "NL"  "^[0-9]"            )
    ("NO" "NO"  "^[0-9]"            )
    ("PL" "PL"  "^[0-9]"            )
    ("PT" "PT"  "^[0-9]"            )
    ("RO" "RO"  "^[0-9]"            )
    ("RS" "RS"  "^[0-9]"            )
    ("RU" "RU"  "^[0-9]"            )
    ("SE" "SE"  "^[0-9]"            )
    ("SG" "SG"  "^[0-9]"            )
    ("SI" "SI"  "^[0-9]"            )
    ("SK" "SK"  "^[0-9]"            )
    ("TR" "TR"  "^[0-9]"            )
    ("UA" "UA"  "^[0-9]"            )
    ("US" "US"  "^[0-9]"            )
    ("XK" "XK"  "^[0-9]"            )
  }
  ?x tr:countryCode ?co; tr:vatNumber ?old.
  filter(regex(?old,?regex))
  bind(replace(?old,"^(CH|GREL|GR)","") as ?vat1)
  bind(concat(?co1,?vat1) as ?new)
}
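
The normalization can be sketched in Python on a small subset of the table. This is an illustration under assumptions: only five of the country rows are copied, the function name is hypothetical, and Python's re module stands in for the SPARQL regex functions. In the stripping step the longer alternative GREL is tried before GR, so that a value starting with "GREL" does not end up with a doubled "EL".

```python
import re

# country code -> (VAT prefix, pattern of values that need the prefix).
# A small subset of the table in the update above.
RULES = {
    "AT": ("AT", r"^U[0-9]"),
    "CH": ("CHE", r"^(CH)?[0-9]"),
    "ES": ("ES", r"^[A-Z][0-9]"),
    "GR": ("EL", r"^(GR|GREL)?[0-9]"),
    "IE": ("IE", r"^[0-9]"),
}


def add_country_prefix(country: str, vat: str) -> str:
    """Return the VAT with its country prefix, or the VAT unchanged
    if no rule applies to it."""
    rule = RULES.get(country)
    if rule is None:
        return vat
    prefix, pattern = rule
    if not re.search(pattern, vat):
        return vat
    # Drop a non-standard leading prefix before adding the standard one.
    stripped = re.sub(r"^(CH|GREL|GR)", "", vat)
    return prefix + stripped
```

For example, add_country_prefix("IE", "6326035O") gives "IE6326035O", and a value that already carries the standard prefix is returned unchanged.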

3.6.22 VAT-per-country-syntax

Rule Group: EIC-VAT
Description: VAT numbers of market participants should be syntactically valid, according to specific rules per country-code prefix. Prefixes GBP, UK, LEI, NONE are invalid.
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
Severity: Violation
Applies to: countryCode

Examples:

IE8F52100V is valid syntax
ES20470001 is invalid syntax (ESA20470001 is valid)

sh:targetSubjectsOf tr:vatNumber;
sh:path tr:vatNumber ;
sh:or (
  [sh:pattern "^ADU\\d{6}[A-Z]$"                 ]
  [sh:pattern "^AL[JKLM]\\d{8}[A-Z]$"            ] 
  [sh:pattern "^AR\\d{14}$"                      ]
  [sh:pattern "^ATU\\d{8}$"                      ]
  [sh:pattern "^AU\\d{11}$"                      ]
  [sh:pattern "^BA\\d{12,13}$"                   ]
  [sh:pattern "^BE\\d{10}$"                      ]
  [sh:pattern "^BG\\d{9,10}$"                    ]
  [sh:pattern "^CHE\\d{9}$"                      ]
  [sh:pattern "^CY\\d{8}[A-Z]$"                  ]
  [sh:pattern "^CZ\\d{8,10}$"                    ]
  [sh:pattern "^DE\\d{9}$"                       ]
  [sh:pattern "^DK\\d{8}$"                       ]
  [sh:pattern "^EE\\d{9}$"                       ]
  [sh:pattern "^EL\\d{9}$"                       ]
  [sh:pattern "^ES[A-Z]\\d{7}[\\dA-Z]$"          ]
  [sh:pattern "^FI\\d{8}$"                       ]
  [sh:pattern "^FL\\d{11}$"                      ]
  [sh:pattern "^FR\\d{11}$"                      ]
  [sh:pattern "^GB\\d{9}$"                       ]
  [sh:pattern "^HR\\d{11}$"                      ]
  [sh:pattern "^HU\\d{8}$"                       ]
  [sh:pattern "^IE\\d[\\dA-Z]\\d{5}[A-Z]{1,2}$"  ]
  [sh:pattern "^IS\\d{5}$"                       ]
  [sh:pattern "^IT\\d{10,11}$"                   ]
  [sh:pattern "^JE\\d{10}$"                      ]
  [sh:pattern "^KY\\d{6}$"                       ]
  [sh:pattern "^LI\\d{5}$"                       ]
  [sh:pattern "^LT(\\d{9}|\\d{12})$"             ]
  [sh:pattern "^LU\\d{8}$"                       ]
  [sh:pattern "^LV\\d{11}$"                      ]
  [sh:pattern "^MA\\d{7}$"                       ]
  [sh:pattern "^MD\\d{7}$"                       ]
  [sh:pattern "^ME(\\d{8}|\\d{12})$"             ]
  [sh:pattern "^MK\\d{13}$"                      ]
  [sh:pattern "^MR\\d{8}$"                       ]
  [sh:pattern "^MT\\d{8}$"                       ]
  [sh:pattern "^NL\\d{9}B\\d{1,2}$"              ]
  [sh:pattern "^NO\\d{9}(M|MVA)?$"               ]
  [sh:pattern "^PL\\d{10}$"                      ]
  [sh:pattern "^PT\\d{9}$"                       ]
  [sh:pattern "^RO\\d{7,8}$"                     ]
  [sh:pattern "^RS\\d{9}$"                       ]
  [sh:pattern "^RU\\d{10}$"                      ]
  [sh:pattern "^SE\\d{12}$"                      ]
  [sh:pattern "^SG[A-Z]?\\d{9}[A-Z]$"            ]
  [sh:pattern "^SI\\d{8}$"                       ]
  [sh:pattern "^SK\\d{10}$"                      ]
  [sh:pattern "^SM\\d{5}$"                       ]
  [sh:pattern "^TR\\d{10}$"                      ]
  [sh:pattern "^UA\\d{8,12}$"                    ]
  [sh:pattern "^US\\d{9}([A-Z]{2}\\d)?$"         ]
  [sh:pattern "^XK\\d{9}$"                       ]
).
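
The per-country patterns can be tried out with a short Python sketch. Only four of the patterns above are copied, the function name is an assumption, and Python's re module is used as an approximation of the SHACL regex dialect.

```python
import re

# A small subset of the patterns in the shape above.
PATTERNS = [
    r"^DE\d{9}$",
    r"^EL\d{9}$",
    r"^ES[A-Z]\d{7}[\dA-Z]$",
    r"^IE\d[\dA-Z]\d{5}[A-Z]{1,2}$",
]
# re.ASCII restricts \d to the digits 0-9.
COMPILED = [re.compile(p, re.ASCII) for p in PATTERNS]


def vat_syntax_valid(vat: str) -> bool:
    """True if the VAT matches at least one per-country pattern (sh:or)."""
    return any(p.search(vat) for p in COMPILED)
```

With this subset, "IE8F52100V" and "ESA20470001" are accepted, while "ES20470001" and "NONE" are rejected, in line with the examples above.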

3.6.23 VAT-country-exists

Rule Group: EIC-VAT
Description: If a VAT is present then country code should also be present (so the VAT can be checked against that country)
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
Severity: Violation
Applies to: countryCode

sh:targetSubjectsOf tr:vatNumber;
sh:property [
  sh:path tr:countryCode;
  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?this tr:vatNumber [] .
  FILTER NOT EXISTS {?this tr:countryCode ?cc}
} limit 10

3.6.24 VAT-country-conforms

Rule Group: EIC-VAT
Description: The first two chars of VAT must equal the country code (except "GR" which is spelled "EL" in VAT codes, and "CH" which is spelled "CHE")
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
Severity: Violation
Applies to: countryCode

Examples:

59XREALPETROL11F "REAL PETROL HOLDING KFT" with VAT "HU24189514": country "IT" is wrong
22X20110811----W "INEOS CHLORVINYLS LIMITED" with VAT "GB768506886": country "BE" is wrong

More examples for traders in AE (United Arab Emirates), in particular the Dubai DMCC:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?eic ?co ?vat ?name ?notation ?function ?descr {
  ?x tr:countryCode "AE"
  optional {?x tr:eic ?eic}
  optional {?x tr:countryCode ?co}
  optional {?x tr:name ?name}
  optional {?x tr:notation ?notation}
  optional {?x tr:function ?function}
  optional {?x tr:vatNumber ?vat}
  optional {?x tr:description ?descr}
}

eic	co	vat	name	notation	function	descr
48X000000000255O	AE		LUZIRA DMCC	BUGOLOBI	Interconnection Trade Responsible	A VAT number is not available for this company, so we are providing the Legal Entity Identifier (LEI) company registration number which is 984500O3EFBA8613AA78.
48X0000000000432	AE	GB383911772	COBBLESTONE ENERGY DMCC	COBBLESTONEDMCC	Balance Responsible Party	UK VAT Code not available. Value in above field is the registered company number.
11X0-0000-0554-Q	AE	NONE	ENERGETECH TRADING DMCC	ENERGETECH	Balance Responsible Party
53XPL000000ININY	AE		Infusion International INC	INFUSION_INTL	Network User	The company registered in UAE. According to local (UAE) regulations they are treated as offshore company and they function in so called free zone. No possibility for them to get the VAT code.
59XVORTICES--017	AE		Vortices Energy Ltd.	VORTICESENERGY	Balance Responsible Party	UAE Company; EU Value not inserted because non-european company.

This indicates some trouble regarding the filling of VAT information for non-European parties. Going row by row:

"LEI": it's better to extend the EIC File and data collection systems to be able to carry company identifiers other than vatNumber, including LEI
"UK VAT not available": it's unclear why the field vatNumber has the prefix "GB" given that it's an AE company. What forced the data entry user to enter this misleading value?
"NONE": it's better to leave a field null rather than enter such vacuous value. What forced the data entry user to enter this vacuous value?
"free zone, No possibility for them to get VAT code": surely such companies have a registered company number, and it should be possible to specify it
"EU Value not inserted because non-european company": surely all countries have registered company numbers, and it should be possible to specify them

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this tr:countryCode ?co; tr:vatNumber ?vat
      bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
      filter(!strstarts(?vat,?co1))}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "Country code is {?co}";
  sh:select """
    select $this (tr:vatNumber as ?path) (?vat as ?value) ?co {
      $this tr:countryCode ?co; tr:vatNumber ?vat}"""].

SPARQL check:

select $this {
        $this tr:countryCode ?co; tr:vatNumber ?vat
        bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
        filter(!strstarts(?vat,?co1))}
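
The prefix comparison can be sketched in Python. The function names are hypothetical; the two special cases (CH and GR) are the ones named in the rule description.

```python
def expected_vat_prefix(country: str) -> str:
    """VAT prefix expected for a country code: CH is spelled CHE
    and GR is spelled EL; all other codes are used as-is."""
    return {"CH": "CHE", "GR": "EL"}.get(country, country)


def vat_conforms(country: str, vat: str) -> bool:
    """True if the VAT starts with the prefix expected for the country."""
    return vat.startswith(expected_vat_prefix(country))
```

Applied to the examples above, vat_conforms("IT", "HU24189514") and vat_conforms("BE", "GB768506886") are both false, as are the AE rows with VAT "GB383911772" and "NONE".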

3.6.25 VAT-exists-in-VIES

Rule Group: EIC-VAT
Description: VAT numbers should exist when checked in external sources (EU VIES), or the market participant should have Deactivation Requested Date, or Status "Passive"
Data Items: basic/allocated-eic-codes
Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
Severity: Violation
Applies to: countryCode

A Python script queries VIES in bulk, then the step "RDFize VIES Checks" records the results as RDF.

Currently we check in EU VIES only for EU and IE
A future enhancement could use NO and UK services (or their open trade register data) to check those important countries
We don't check for Northern Ireland since VIES uses the country code XI but most such companies in EIC data are recorded with code GB (except 2)
VIES reports many ES VAT numbers as non-existent, perhaps the respective companies are not registered for VAT. An example is 18XFERL-12345--K Ferloga, SL (VAT ESB24049272)
- Can be found in OpenCorporates as es/24049272
- Can be found in Kompass as ES CIF B24049272, VAT ESB24049272, Kompass ES1074724
- Can be found in Registradores de Espana business registry (enter company name "Ferloga" and Business Registry Office "Ourense") as NIF B24049272
- But cannot be found in VIES using either of B24049272, 24049272, A24049272
Some countries may not have open data or checking service available, e.g. RS

We use SHACL-SPARQL in order to put the wrong VAT number in ?value:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this tr:vatInVies false}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:vatNumber as ?path) ?value {
      $this tr:vatInVies false; tr:vatNumber ?value}"""].

SPARQL check:

select $this {
      $this tr:vatInVies false}

3.6.26 installedCapacity-Aggregated-vs-Per-Unit

Rule Group: Arithmetics
Description: Aggregated capacity per area and asset (production) type should be at least the sum of the installed capacities of the individual Production Units in that area, and at most 30% higher
Data Items: generation/AggregatedGenerationPerType, generation/InstalledGenerationCapacityComputed, generation/InstalledGenerationCapacityAggregated
Fields: AggregatedInstalledCapacity, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
Severity: Warning
Applies to: biddingZone, controlArea

Notes:

The two values are not expected to be equal due to differences in capture requirements:
- ProductionAndGenerationUnits data (16.1.A) is expected to report installed capacity only for units greater than 100 MW
- AggregatedGenerationPerType data (14.1.A) is expected to report aggregate capacity of all units greater than 1 MW
- ProductionAndGenerationUnits data represents current capacity, whereas AggregatedGenerationPerType represents the capacity on Jan 1 of the respective year
So this rule is intended to capture only substantial deviations that are due to data errors (see below).
The rule will check whether AggregatedGenerationPerType is within 100...130% of the sum of generation capacity per unit.

Example of a data mistake: Installed Capacity per Production Type for France on 21-Jan-2022 showed this:

Production Type	2021 MW	2022 MW
Other	1120	7900729

This means that 7.9 TW (7.9 million MW!) of "Other" capacity was newly installed in France. Have the French tamed some Dark Energy source that would solve all our energy problems?

Checking Installed Capacity Per Production Unit shows only 1 "Other" asset:

Production Type	Code	Name	Installed Capacity at the beginning of the year	Current Installed Capacity	Location	Voltage Connection Level	Commissioning Date
Other	17W100P100P0352E	CYCOFOS TV2	62	62	France	225	01.09.2009

It was installed in 2009 and there's no change in capacity (62 MW) in the last two years. So unfortunately the 7.9 TW is not a miracle but a data error.

Implementation:

InstalledGenerationCapacityComputed calculates the totals of installed capacity of production units in each zone/area as installedOutput, and 130% of that as installedOutputHigh
To avoid double counting, we sum up only the top level (ProductionUnit) because the bottom level (GenerationUnit) capacities are already included in the top level (see rule ProductionUnit-capacity-GTE-GenerationUnit-capacity)

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select ?aggr ?comp ?aggrOutput ?compOutput ?compOutputHigh {
  ?aggr a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
    tr:controlArea|tr:biddingZone ?area;
    tr:assetType ?assetType;
    tr:installedOutput ?aggrOutput.
  ?comp a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;
    tr:controlArea|tr:biddingZone ?area;
    tr:assetType ?assetType;
    tr:installedOutput ?compOutput;
    tr:installedOutputHigh ?compOutputHigh.
  filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))
} limit 1000
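
The bound check itself can be sketched in Python. The function name and the tolerance parameter are hypothetical; the 30% default mirrors the installedOutputHigh value described above.

```python
def within_bounds(aggregated, computed, tolerance=0.30):
    """True if the aggregated capacity lies between the computed
    per-unit total and that total plus the tolerance (130% by default)."""
    computed_high = computed * (1 + tolerance)
    return computed <= aggregated <= computed_high


# France, "Other": aggregated 2022 value vs the single 62 MW unit.
print(within_bounds(7900729, 62))  # False
```

The France example above fails the check by a wide margin, which is the kind of substantial deviation this rule is meant to catch.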

Implementation with SHACL-SPARQL. We return extra info using sh:message

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select (?aggr as $this) ?s2 {
      ?aggr a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?aggrOutput.
      ?s2 a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?compOutput;
        tr:installedOutputHigh ?compOutputHigh.
      filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:message "Must be between {?compOutput} and {?compOutputHigh}";
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this (?aggrOutput as ?value) ?compOutput ?compOutputHigh {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?aggrOutput.
      ?comp a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?compOutput;
        tr:installedOutputHigh ?compOutputHigh}"""].

3.6.27 ActualGenerationOutputPerGenerationUnit-controlArea-conform

Rule Group: Observations-Structure
Description: The Control Area of the observation must match the Control Area of the Generation Unit. This finds too many violations, so we return only the first 1000.
Data Items: generation/ActualGenerationOutputPerGenerationUnit, generation/ProductionAndGenerationUnits
Fields: AreaCode, GenerationUnitEIC, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
Severity: Violation
Applies to: controlArea

Out of 4.5M observations over 3 months, there are 3.3M violations:

11.5k where the Generation Unit has a matching controlArea, but that's because it was submitted at the top level of Production and Generation Units, i.e. that is a discrepancy
3.3M involving a Generation Unit that has no controlArea, neither itself nor through its Production Unit (parentResource)

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this ?s2 ?s3 {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:controlArea ?area; tr:generationUnit ?s2 .
      optional {?s2 tr:parentResource? ?s3}
      filter not exists {$this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
    } limit 1000"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:controlArea as ?path) (?area as ?value) {
      $this tr:controlArea ?area}"""].

SPARQL check:

base <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>; tr:controlArea ?area.
  filter not exists {?this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
  optional {
    ?this tr:generationUnit ?gen
    optional {?gen tr:controlArea ?genArea}}
  optional {
    ?this tr:generationUnit/tr:parentResource ?prod
    optional {?prod tr:controlArea ?prodArea}}
} limit 1000

3.6.28 ActualGenerationOutputPerGenerationUnit-installedOutput-conform

Rule Group: Observations-Structure
Description: The InstalledGenCapacity of the observation must match the declared nominalP of the Generation Unit
Data Items: generation/ActualGenerationOutputPerGenerationUnit, generation/ProductionAndGenerationUnits
Fields: InstalledGenCapacity, GenerationUnitEIC, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP
Severity: Violation
Applies to: controlArea

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select $this (?output1 as ?value) ?genUnitOutput {
  $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
    tr:installedOutput ?output1.
  optional {$this tr:generationUnit/tr:installedOutput ?output2}
  filter (!bound(?output2) || !(?output1 = ?output2))
  bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)
} limit 200

SPARQL count:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select (count(*) as ?c) (count(?output2) as ?c2) {
    $this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:installedOutput ?output
    filter not exists {$this tr:generationUnit/tr:installedOutput ?output2 filter (?output2=?output)}
    optional{$this tr:generationUnit/tr:installedOutput ?output2}
}

Violations:

generationUnit has different installedOutput 22.3k of 4.5M observations over 3 months; 25k over 4 months
generationUnit doesn't have any installedOutput 34.7k of 4.5M observations over 3 months; 63k over 4 months
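
The two violation classes can be told apart with logic like the Python sketch below. It mirrors the bind(if(bound(...))) expression of the SPARQL check. The function name, the use of None for a missing value, and the assumption of a single installedOutput per Generation Unit are made for illustration only.

```python
def gen_unit_output_message(observation_output, unit_output):
    """Return None if the values conform, else the text reported as ?genUnitOutput."""
    if unit_output is None:
        # The Generation Unit declares no installedOutput at all.
        return "does not exist"
    if observation_output != unit_output:
        # Declared, but different from the InstalledGenCapacity of the observation.
        return "is " + str(unit_output)
    return None

print(gen_unit_output_message(62, None))
print(gen_unit_output_message(62, 60))
```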

Implementation:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this ?s2 {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:installedOutput ?output1; tr:generationUnit ?s2.
      filter not exists {?s2 tr:installedOutput ?output2
        filter(?output1 = ?output2)}}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The GenerationUnit installed capacity (nominalP) {?genUnitOutput}";
  sh:select """
    select $this (tr:installedOutput as ?path) (?output1 as ?value) ?genUnitOutput {
      $this tr:installedOutput ?output1.
      optional {$this tr:generationUnit/tr:installedOutput ?output2}
      bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)}"""].

3.6.29 ActualGenerationOutputPerGenerationUnit-LTE-installedOutput

Rule Group: Arithmetics
Description: ActualGenerationOutput should be less than or equal to the InstalledGenCapacity for each Generation Unit and date
Data Items: generation/ActualGenerationOutputPerGenerationUnit
Fields: ActualGenerationOutput, InstalledGenCapacity
Severity: Violation
Applies to: controlArea

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:actualOutput ?actual; tr:installedOutput ?installed
      filter(!(?actual <= ?installed))}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The actual generation output, `{?value}` of this observation is greater than the installed output, `{?installed}` for its Generation Unit." ;
  sh:select """
    select distinct $this ?installed ?value {
      $this a tr:DataObservation;
        tr:actualOutput ?value; tr:installedOutput ?installed.
      filter(!(?value <= ?installed))}"""].

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select $this {
  $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
    tr:actualOutput ?actual; tr:installedOutput ?installed
  filter(!(?actual <= ?installed))}

3.6.30 Outage-controlArea-conform

Rule Group: Outage
Description: The area of an Outage must match the declared area of the Production Unit
Data Items: outages/UnavailabilityOfProductionUnits, generation/ProductionAndGenerationUnits
Fields: AreaCode, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
Severity: Violation
Applies to: controlArea

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
    $this a tr:Outage ;
          tr:controlArea ?ca ;
          tr:energyResource/tr:controlArea ?eca .
    FILTER (?ca != ?eca)
    $this tr:energyResource ?s2 .
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has the control area {?ca}, but its energy resource has the control area {?value}";
  sh:select """
    select distinct $this ?ca ?value {
      $this a tr:Outage ;
            tr:controlArea ?ca ;
            tr:energyResource/tr:controlArea ?value .
      FILTER (?ca != ?value)
    }      
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:controlArea ?ca ;
          tr:energyResource/tr:controlArea ?eca .
    FILTER (?ca != ?eca)
}

3.6.31 Outage-biddingZone-conform

Rule Group: Outage
Description: The zone of an Outage must match the declared zone of the Production Unit
Data Items: outages/UnavailabilityOfProductionUnits, generation/ProductionAndGenerationUnits
Fields: AreaCode, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/biddingZone_Domain.mRID
Severity: Violation
Applies to: biddingZone

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
      $this a tr:Outage ;
            tr:biddingZone ?ca ;
            tr:energyResource/tr:biddingZone ?eca .
      FILTER (?ca != ?eca)
      $this tr:energyResource ?s2
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has the bidding zone {?ca}, but its energy resource has the bidding zone {?value}";
  sh:select """
  select distinct $this ?value ?ca {
      $this a tr:Outage ;
            tr:biddingZone ?ca ;
            tr:energyResource/tr:biddingZone ?value .
      FILTER (?ca != ?value)
      $this tr:energyResource ?s2
  }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:biddingZone ?ca ;
          tr:energyResource/tr:biddingZone ?eca .
    FILTER (?ca != ?eca)
}

3.6.32 Outage-Unit-exists

Rule Group: Outage
Description: The Production/Generation Unit reported in an Outage must be described in Production And Generation Units
Data Items: outages/UnavailabilityOfProductionUnits, outages/UnavailabilityOfGenerationUnits, generation/ProductionAndGenerationUnits
Fields: PowerResourceEIC, Configuration_MarketDocument/TimeSeries/registeredResource.mRID
Severity: Violation
Applies to: controlArea

sh:targetClass tr:Outage;
sh:property [
  sh:path (tr:energyResource tr:eic);
  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage .
    FILTER NOT EXISTS {
        $this tr:energyResource/tr:eic ?eic
    }
}

3.6.33 Outage-installedCapacity-conform

Rule Group: Outage
Description: Installed Capacity reported in an Outage must match the Installed Capacity as described in Production And Generation Units
Data Items: outages/UnavailabilityOfProductionUnits, outages/UnavailabilityOfGenerationUnits, generation/ProductionAndGenerationUnits
Fields: InstalledCapacity, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
Severity: Violation
Applies to: controlArea

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
    $this a tr:Outage ;
          tr:installedOutput ?ca ;
          tr:energyResource/tr:installedOutput ?eca .
    FILTER (?ca != ?eca)
    $this tr:energyResource ?s2
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has an installed capacity {?ca}, but its energy resource has the installed capacity {?value}";
  sh:select """
  select distinct $this ?ca ?value {
    $this a tr:Outage ;
      tr:installedOutput ?ca ;
      tr:energyResource/tr:installedOutput ?value .
    FILTER (?ca != ?value)
  }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:installedOutput ?ca ;
          tr:energyResource/tr:installedOutput ?eca .
    FILTER (?ca != ?eca)
}

3.6.34 Outage-availableCapacity-LT-installedCapacity

Rule Group: Outage
Description: Available Capacity reported in an Outage must be less than the Installed Capacity
Data Items: outages/UnavailabilityOfGenerationUnits
Fields: AvailableCapacity, InstalledCapacity
Severity: Violation
Applies to: controlArea

sh:targetClass tr:Outage;
sh:property [
  sh:path tr:availableOutput;
  sh:lessThan tr:installedOutput].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:availableOutput ?ao ;
          tr:installedOutput ?io .
    FILTER (?io <= ?ao)
}

3.7 More Validation Rules

Here are ideas for more validation rules that are not yet defined. As we define them, we move them to the section above:

The forecasts and actuals of Generation in an area should be less than the max capacity of Production Units in that region
The actuals of Generation in an area should not deviate from forecasts more than a certain threshold (15%)
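
The second idea could be sketched in Python as below. This is a hypothetical illustration: the rule is not yet defined, so the function name, the choice to measure the deviation relative to the forecast, and the handling of a zero forecast are all assumptions.

```python
def deviates_too_much(actual_mw, forecast_mw, threshold=0.15):
    """True if the actual differs from the forecast by more than `threshold`.

    The deviation is taken relative to the forecast (an assumption; the rule
    text does not fix the reference value).
    """
    if forecast_mw == 0:
        # Relative deviation is undefined; flag any non-zero actual.
        return actual_mw != 0
    return abs(actual_mw - forecast_mw) / abs(forecast_mw) > threshold

print(deviates_too_much(120, 100))
print(deviates_too_much(110, 100))
```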

The following rules will not be implemented:

Locations should be meaningful, eg a City/Town name. We've implemented a limited variant, see location-informative. Could be implemented through integration with OSM.

3.8 Already Checked Rules

The following rules were checked quickly and no errors were found, so we found no need to implement them:

Each resource should be described in EIC once (or if multiple times then with consistent data)
- grep "<mRID>" allocated-eic-codes.xml|sort|uniq -d
If Production and Generation Units are described multiple times, the following fields are always consistent:
- highVoltageLimit, assetType, controlArea, biddingZone
- However, installedOutput is not consistent and we have a validation rule for that
Each EIC resource should have name and notation (short name)
- ?x tr:eic [] filter (!exists {?x tr:notation []} || !exists {?x tr:name []})
All quantities should use the same unit (installedOutput, actualOutput, availableOutput: MAW, highVoltageLimit: KVT)
- select ?unit (count(*) as ?c) {?x tr:unit ?unit} group by ?unit
- Therefore we can simplify the representation by omitting the unit
The nominalP unit of a "Production Unit" and all its "Generation Units" is always specified (and by the above-checked rule, is the same). Note: we've now eliminated tr:unit so the query below will not work

select ?powUnit ?powUnitN ?powUnitUOM ?genUnit ?genUnitN ?genUnitUOM {
    ?powUnit tr:generationUnit ?genUnit.
    optional {?powUnit tr:installedOutput/tr:unit ?powUnitUOM}
    optional {?genUnit tr:installedOutput/tr:unit ?genUnitUOM}
    filter (!bound(?powUnitUOM) || !bound(?genUnitUOM) || ?powUnitUOM != ?genUnitUOM)
}

3.9 Validation Service

Validation service options are currently under investigation. There are two validators under consideration: TopQuadrant SHACL API and GraphDB's ShaclSail.

The chief questions to be investigated are:

How expressive is the validator?
What is the validator's performance?
Can we use a hybrid approach integrating several validators?
How often to run validation? How to update validation results?
Do we need a TEKG API to initiate validation?
Will we check incrementally (only changed data points) or totally?

3.9.1 TQ SHACL API

The TQ SHACL API is an open-source API developed by TopQuadrant. It is based on Apache Jena.

It's a very flexible validator, with full support of SHACL-SPARQL and partial support of SHACL Advanced.
Slow performance, especially with SPARQL constraints. The validator uses internal data structures for selecting validation targets. SPARQL constraints are executed for each focus node. This can lead to substantial slowdowns.
SHACL definitions and report formats are the same as ShaclSail, but the Apache Jena models are different from RDF4J models. We would need an integration layer with GraphDB.
The validator works in bulk mode only: it validates the complete data model.
There is no support for sh:annotationProperty, which would make reporting harder.

The performance issue could be mitigated by clever target definitions, i.e., using SPARQL for targeting.

Since we store data in GraphDB, we would need to fetch all data to be validated, store it in a Jena model (can be in-memory), then validate.

3.9.2 RDF4J ShaclSail

ShaclSail is implemented in RDF4J and is part of GraphDB. It is native to our database, so we would need no integration layer.

Partial support of core SHACL.
Has a targeting extension mechanism with RSX, which emulates a lot of sh:SPARQLTarget functionality more efficiently.
Better performance.
The validator can be bulk or incremental.
Insertions are always rejected when they contain invalid data.

Since we never want to reject data, and only want to record validation errors, we need to run with the validator toggled off, then do a bulk validation. This can be achieved in one of two ways:

Have SHACL always loaded in the database. Do each insertion twice: first with validation turned on, to produce a report; then with validation turned off, to store the data.
Do not have SHACL loaded in the database. After inserting the data, try inserting the SHACL shapes, which triggers a bulk validation. Store the violation report and, if there are no errors in the whole database, clear the SHACL shapes. If there were errors, the SHACL shapes would not have been persisted.

Of the two, the first option is notably better performance-wise, except for very large files.

3.9.3 Custom SPARQL validations

Custom SPARQL validations are very flexible and offer better performance than SHACL-SPARQL. The downside is that we would need custom logic to implement them. Custom SPARQL validation can also easily be used in conjunction with one of the two SHACL validators.
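
As an illustration of the kind of custom logic meant here, the Python sketch below turns the result of a SPARQL check, in the standard SPARQL 1.1 Query Results JSON format, into flat violation records. It is a sketch under assumptions, not the project's code: the function name, the record layout and the example.org URIs are hypothetical, and the query is assumed to bind the focus node to a variable named this, as the checks in the sections above do.

```python
import json

def violations_from_sparql_json(results_json, rule, severity="Violation"):
    """Convert SPARQL JSON results into a list of violation records."""
    data = json.loads(results_json)
    records = []
    for binding in data["results"]["bindings"]:
        records.append({
            "rule": rule,
            "severity": severity,
            "focusNode": binding["this"]["value"],
            # Keep every other bound variable as extra detail for reporting.
            "details": {var: cell["value"] for var, cell in binding.items() if var != "this"},
        })
    return records

# Hypothetical result of the Outage-controlArea-conform check.
sample = json.dumps({
    "head": {"vars": ["this", "ca", "eca"]},
    "results": {"bindings": [{
        "this": {"type": "uri", "value": "https://example.org/outage/1"},
        "ca": {"type": "uri", "value": "https://example.org/area/A"},
        "eca": {"type": "uri", "value": "https://example.org/area/B"},
    }]},
})
print(violations_from_sparql_json(sample, "Outage-controlArea-conform"))
```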

3.10 DQA Dashboard

The DQA (Data Quality Assessment) Dashboard displays validation results.

The functions (scope) of the DQA dashboard include:

Navigation of rules by applicability (country or area), group (category)
Display validation result counts per area/country, rule, severity (Violation, Warning)
(CANCELED) Display %prevalence (percent of errors compared to all records of that kind)
(CANCELED) Display trends in time
Drilldown to individual violations
- Pagination
- Display enough info for each violation to be able to understand it
- Hyperlink to jump to the RDF data for the violating node, to be able to diagnose in detail
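
The counting and pagination functions can be sketched in Python as follows. This is a hypothetical illustration, not the dashboard implementation: the record keys (area, rule, severity), the function names and the default page size are assumptions.

```python
from collections import Counter

def summarize(violations):
    """Count validation results per (area, rule, severity)."""
    return Counter((v["area"], v["rule"], v["severity"]) for v in violations)

def page(items, page_number, page_size=20):
    """Return one page of a violation list for drilldown (page_number starts at 1)."""
    start = (page_number - 1) * page_size
    return items[start:start + page_size]

violations = [
    {"area": "FR", "rule": "Outage-Unit-exists", "severity": "Violation"},
    {"area": "FR", "rule": "Outage-Unit-exists", "severity": "Violation"},
    {"area": "DE", "rule": "Outage-controlArea-conform", "severity": "Violation"},
]
print(summarize(violations))
print(page(violations, 2, page_size=2))
```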

DQA Mockups are shown in textual form in preceding sections:

Summary Validation Results Mockup
Individual Validation Results Mockup:
- for EIC VAT,
- for Actual Generation Output Per Generation Unit

4 External Data Integration

This section specifies Integrations and/or Validations based on external data to be integrated into the KG. In addition to the external data sources described in subsections, we also considered the following sources:

Wikidata (WD) is a global crowd-sourced knowledge base with encyclopedic coverage
- It has info about 100M items, about 5B claims, which translates to about 16B RDF triples
- It has about 10k descriptive properties, of which 6.5k are links to external databases. WD is therefore a coreferencing hub for integrating different data sources
- WD has about 16k power plants or generators world-wide, of which 8220 are in Europe (see query https://w.wiki/4dqA)
- 7222 of European power plants have geo-coordinates (see query https://w.wiki/4fKq)
- 920 of European power plants have EIC (see query https://w.wiki/4dq8)
- One WD power plant may have several EIC, e.g. the Bellevue NPP (France) has 4 EIC: 2 Production Units and 2 Generation Units. Thus, WD models power plants at a coarser granularity than ENTSOE
- We decided not to use WD because OSM (see below) has deeper geographic info and comparable other info
- In a future project it's certainly worth exploring WD integration because it has excellent additional info, eg administrative areas where the power plant is located, and their populations. WD can be used together with OSM and TEKG by using SPARQL Federation
National data sources
- WD in its role of a coreferencing hub has a property for EIC: P8645 Energy Identification Code
- Its source website for the property characteristic lists about 11 national EIC lists (about half were added by us)
- We could use these to reach national data sources, eg https://opendata.reseaux-energies.fr/ of France
- However, we decided not to use them, in favor of the more specialized Production Unit sources described below

4.1 External VAT Validation

About 10,000 VAT numbers are present in the data. We validate them using the VIES-on-the-Web system. It is a free web service provided by the EC, running on top of the national VAT databases of the EU Member States and Northern Ireland.

The service is a simple SOAP API where two parameters are sent as XML elements: countryCode and vatNumber. The response contains a boolean flag that says whether the VAT number is valid and, if it is, some basic information about the entity it corresponds to.

Example response for VAT IT13433711002:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
            <countryCode>IT</countryCode>
            <vatNumber>13433711002</vatNumber>
            <requestDate>2022-01-12+01:00</requestDate>
            <valid>true</valid>
            <name>ARCADIA ITALIA S.R.L.</name>
            <address>VIA PERUGINO 4 00196 ROMA RM </address>
        </checkVatResponse>
    </soap:Body>
</soap:Envelope>
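
A response like this can be read with the Python standard library, as in the sketch below. It is an illustration only and not the project's script: the function name and the returned dict layout are assumptions, while the namespaces and element names are taken from the example response above.

```python
import xml.etree.ElementTree as ET

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "vies": "urn:ec.europa.eu:taxud:vies:services:checkVat:types",
}

def parse_check_vat_response(xml_text):
    """Extract the fields of a checkVatResponse into a dict; `valid` becomes a bool."""
    root = ET.fromstring(xml_text.strip())
    response = root.find("soap:Body/vies:checkVatResponse", NS)
    fields = {}
    for child in response:
        # ElementTree tags look like "{namespace}name"; keep only the local name.
        name = child.tag.split("}", 1)[-1]
        fields[name] = (child.text or "").strip()
    fields["valid"] = fields.get("valid") == "true"
    return fields

EXAMPLE_RESPONSE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
            <countryCode>IT</countryCode>
            <vatNumber>13433711002</vatNumber>
            <requestDate>2022-01-12+01:00</requestDate>
            <valid>true</valid>
            <name>ARCADIA ITALIA S.R.L.</name>
            <address>VIA PERUGINO 4 00196 ROMA RM </address>
        </checkVatResponse>
    </soap:Body>
</soap:Envelope>
"""
print(parse_check_vat_response(EXAMPLE_RESPONSE))
```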

4.1.1 VATs in ENTSOE Data and VIES Coverage

An important limitation of VIES is that not all countries relevant for ENTSOE are present. A future project should evaluate the possibility of using an additional free service such as VATApp, or of directly using open data dumps provided by the respective countries (UK and NO in particular).

Find all countries in the ENTSOE dataset:

select (count(*) as ?c) ?co {
    ?x tr:vatNumber ?vat
    optional {?x tr:countryCode ?co}
} group by ?co order by desc(?c)

The query returns 57 results. One is blank (" "), which is not a country, so we are left with 56 countries.
VIES does not support the following: GB, CH, UA, MK, RS, GR, AL, NO, BA, MD, XK, ME, US, TR, LI, SG, KY, AE, GE, IS, AD, AR, AU, MA, MY, NC, PR, RU, SM, UK (Total 30)
Therefore VIES supports 26 out of 56 countries, which is 46%
17 of 56 countries have fewer than 9 VAT numbers: TR LI SG KY AE GE IS AD AR AU MA MY NC PR RU SM UK. They will be ignored for VAT format analysis (see below)
Others have 11 or more

VAT Number Statistics: Out of 9,919 VAT numbers

50 start with 1 letter, e.g. K42101801N, country AL
5,472 start with 2 letters
3,733 start with 3 letters
There's 1 that starts with 4 letters: GREL099790528, country GR.
659 start with numbers, i.e. lack the country prefix
- Some of these are invalid, e.g. have extra or missing digits
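
Statistics of this kind can be reproduced with a few lines of Python. The sketch below is an assumption about how such a count could be made, not the code that produced the numbers above. The function names are hypothetical, and the sample values are VAT numbers quoted elsewhere in this section.

```python
import re
from collections import Counter

def leading_letters(vat):
    """Number of ASCII letters at the start of a VAT number (0 if it starts with a digit)."""
    return len(re.match(r"[A-Za-z]*", vat.strip()).group(0))

def prefix_histogram(vat_numbers):
    """Count VAT numbers by how many letters they start with."""
    return Counter(leading_letters(v) for v in vat_numbers)

sample = ["K42101801N", "DE289523572", "ATU6729404", "GREL099790528", "0711797282"]
print(prefix_histogram(sample))
```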

4.1.2 VIES Validation Statistics

countryCode	total	valid	invalid	names
AT	128	110	18	110
BE	133	107	26	107
BG	187	169	18	169
CY	22	10	12	10
CZ	326	199	127	199
DE	1027	969	58	0
DK	92	85	7	85
EE	57	47	10	47
EL	93	90	3	90
ES	3455	1499	1956	0
FI	229	224	5	224
FR	124	107	17	107
HR	152	106	46	106
HU	109	75	34	75
IE	57	49	8	49
IT	611	349	262	349
LT	88	69	19	69
LU	25	21	4	21
LV	70	52	18	52
MT	11	11	0	11
NL	202	165	37	165
PL	341	235	106	235
PT	102	95	7	95
RO	228	186	42	186
SE	34	31	3	31
SI	117	79	38	79
SK	274	195	79	195
XI	2	1	1	1
TOTAL	8296	5335	2961	2867


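VIES covers the 28 country codes in the table (the 27 EU Member States plus XI for Northern Ireland), which account for 8.3k of the roughly 10k VAT numbers present
The number of invalid VATs is surprisingly high: 2,961 of 8,296 or 35.7%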
DE and ES never report names (and addresses), even for valid VATs

4.1.3 Per Country VAT Format

VAT format was researched on:

Wikipedia for most countries (referred to as WP)
Format and structure of tax identification numbers (TINs) in the EU (referred to as EU-TID)

AL (Albania): 10 characters, first char following the prefix is [JKL], and the last character is a letter. E.g. K99999999L, L99999999G
- 44 out of 50 are valid according to the above format
- Invalid VATs: ALL11731504A, ALJ61820031J, ALL32130008F, M12221008I, ALK11624001V
AT (Austria): WP: 'ATU'+8 digits. E.g. ATU99999999. EU-TID 9 digits.
- 127/130: valid
- Invalid: U50568407, U49637200 and ATU6729404 (7 digits)
BA (Bosnia and Herzegovina)
BE (Belgium): WP: 'BE' + 8 digits + 2 check digits. E.g. BE09999999XX. EU-TID: 10 digits
- 125/138 comply
- 8/138: 9 digits. Invalid examples: GB768506886, 0711797282, 0754605263
BG (Bulgaria): WP: 9-10 digits. E.g. BG999999999. EU-TID : 10 digits
- 187/188 have 9 digits
BY (Belarus): Not present in the dataset
CH (Switzerland): 'CHE' + 9 digits with optional punctuation. E.g. CHE-123.456.788. The last digit is a MOD11 checksum
- 173/292 start with CHE followed by 9 digits
- 92/292 start with CH followed by 6 digits
- 11/292 start with CH followed by 9 digits
- 2/292 start with CH followed by 11 digits
- 7/292 start with CHE followed by 8 digits
- 1/292 start with CHE followed by 7 digits
- As we can see, many CH VATs in the dataset don't follow the format definition
CY (Cyprus): WP: 9 characters. E.g. CY99999999L. EU-TID: the same for individuals but 8 digits for legal entities.
- 21/23 comply with the official format
- 2 Invalid: 10375510G and 10390426G, miss CY prefix
CZ (Czech Republic): WP: 'CZ'+ 8 to 10 digits. EU-TID 8 digits.
- 321/332: 'CZ'+8 digits
- 8/332: 'CZ'+9 digits
- 2/332 are wrong: DE289523572 and DE814987657
DE (Germany): WP: 9 digits. E.g. DE999999999. EU-TID : 11 digits.
- 1029/1044 comply
- 2/1044: 8 digits, e.g. DE29149497 and DE29535215
- 2/1044: 10 digits, e.g. DE4370403223 and DE3503951816
- 9/1044 don't start with DE: 6 of them have 11 digits, 3 of them have 10 digits
DK (Denmark): WP: 8 digits, last digit is a checksum. E.g. DK99999999. EU-TID: 8 digits
- 95/100 comply
- 3/100 miss DK prefix
- 2 are completely wrong: GB684966762 and CZ07292015
EE (Estonia): WP: 9 digits. EU-TID : 8 digits for legal entities and 11 digits for individuals.
- 57/58 comply
- 1 is 14912868, which misses the prefix and has 8 digits instead of 9
ES (Spain)
- Format for companies: either 'ES'+letter+8 digits or 'ES'+letter+7 digits+letter. EU-TID:same. Where the first letter defines the type of company and the following first 2 digits define the province where the company was registered. The last character is a control digit.
- Format for individual people/freelancers: either 'ES'+8 digits+letter (for Spaniards) or 'ES'+letter+7 digits+letter (for foreigners). E.g. ESX9999999R
- 3363/3464 comply with the first format: 'ES'+letter+8 digits
- 45/3464 comply with the second format: 'ES'+letter+7 digits+letter
- 3/3464 miss 1 digit, e.g. ESA0879906, ESB9159561, ESA5840219
- 1/3464 is ESB588111980 (9 digits instead of 8)
- 3 don't start with ES: B95713541, PT980633745 and PT508193117
FI (Finland): WP: FI + 7 digits + check digit. E.g. FI99999999. EU-TID:same
- 230/234 comply
- 4/234 miss FI prefix
FR (France): WP: 'FR'+ 2 digits (as validation key) + 9 digits (as SIREN), the first and/or the second value can also be a character, e.g. FRXX999999999. EU-TID: 9 digits for legal entities and a completely different format for individuals.
- 118/125 comply
- 2/125 are wrong: DE813871435 and 0000000000000
- 2/125 miss one digit: FR5950773519 and FR2783328587
- 2/125 miss two digits: FR572221034 and FR440117620
- 1/125 misses 3 digits: FR69448572
GB (Great Britain): 9 digits, sometimes written with spaces eg 123 4567 89
- 361/395 comply
- 11 miss GB prefix
- 7 miss one digit
- 1 misses 2 digits
- 3 have 10 digits instead of 9
- Several others miss prefix + some digits
GR/EL (Greece). EU-TID: 9 digits.
- 93/99 with format EL + 9 digits
- 5 miss prefix EL
- 1 is GREL099790528
HR (Croatia): WP: 'HR'+ 11 digits. EU-TID: 11 digits.
- 152/156 comply
- 3 miss HR prefix
- 1 misses a digit: HR1642377552
HU (Hungary): WP: 8 digits (the first 8 digits of the national tax number), e.g. HU12345678. EU-TID: 10 digits.
- 105/109 comply
- 1 misses prefix
- 2 miss 1 digit
- 1 is 10728068244 (too many digits)
IE (Ireland)
- Format: WP: Two standards: 'IE'+7 digits+2 letters, e.g. IE1234567FA; or 'IE'+7 digits+1 letter, optionally followed by 'W' for married women, e.g. IE1234567T or IE1234567TW. EU-TID: the same both for legal entities and individuals.
- 26/72 end with two letters (first format)
- 29/72 end with 1 letter (second format)
- 6 miss prefix
- 6 start with GB
- strange occurrence IE9Y66I020
IT (Italy): WP: 11 digits (the first 7 digits are a sequential number, the following 3 indicate the province of residence, the last digit is a checksum). EU-TID: the same.
- 597/742 comply
- 116/742 miss IT prefix
- 22/742 miss 1 digit, e.g. IT1374910113 (10 digits instead of 11)
- 5 miss 1 digit as well as IT prefix, e.g. 2822840605
- 1 wrong: HU24189514
LT (Lithuania): WP: 9 or 12 digits. EU-TID: the same.
- 47/88 with format LT860632610 (9 digits)
- 39/88 with format LT1106284811 (10 digits)
- 1 is with 11 digits: LT10000580981
LU (Luxembourg): WP: 8 digits. EU-TID: 11 digits.
- 25/26 comply
- 1 misses LU prefix
LV (Latvia): WP: 11 digits. EU-TID: the same.
- 69/81
- 12/81 miss LV prefix
MD (Moldova): 7 digits
- 16/17 with format 0203943 (no MD prefix)
- 1/17 is MD05754540655
ME (Montenegro): 8 or 12 digits
- 10/11 have 8 digits, e.g. 02751372 (without ME prefix)
- 1/11 has 11 digits: 40310007516
MK (North Macedonia): 'MK'+13 digits. E.g. MK4032013544513
- 130/132 with format 4080009501086 (without MK prefix)
- 1/132 is MK403000452960 (12 digits after prefix)
- 1/132 is 40430008038555 (14 digits)
NL (The Netherlands): WP: 'NL'+9 digits+B+2 digits. E.g. NL999999999B01. EU-TID: 9 digits.
- 201/207 comply
- 1/207 doesn't have the prefix
- 4 completely wrong: 32117527, 801424250RT000, GB115163840 and IT01831490766
NO (Norway): 9 digits, optionally followed by 'MVA' to indicate VAT registration
- 6/43 comply, e.g. NO989795848MVA
- 8/43 don't have prefix NO
- 6/43 have just 9 digits , e.g. 981355210
- 1 has 7 digits
- 1 has 12 digits
- 1 wrong: GB894770371
PL (Poland): WP: 'PL'+10 digits. EU-TID: the same.
- 339/349 comply
- 8 miss prefix
PT (Portugal): WP: 'PT'+9 digits (last digit is a checksum). EU-TID: the same.
- 99/102 comply
- 2/102 miss prefix
RO (Romania): WP: 'RO' (optional) + 10 digits. EU-TID: the same.
- 188/231 with format RO13328043 (8 digits)
- 33/231 have 7 digits, e.g. RO1092690
- 6/231 have 6 digits, e.g. RO943038
- 3/231 have 7 digits and miss prefix
- 1/231 has 9 digits: RO291111546
RS (Serbia): 9 digits
- 92/113 comply
- 9/113 start with RS, e.g. RS107350223
- Some have SR and SK prefix: SR109027050, SK2022490800, SR105523323, SR107634440, SR104217641, SR104613706
SE (Sweden): WP: 12 digits. EU-TID: 10 digits.
- 33/37 comply
- 3/37 have 10 digits, e.g. 5561085688
SI (Slovenia): WP:'SI'+8 digits. EU-TID: the same.
- 117/117 comply, e.g. SI20874731
SK (Slovakia): WP: 'SK'+10 digits. EU-TID the same.
- 268/279 comply (with the prefix)
- 5/279 miss the prefix
- 1 has 8 digits: 36699624
UA (Ukraine): 12 digits
- 122/236 comply
- 92/236 miss the prefix
- 11/236 have just 8 digits, e.g. 40298595
- 2 have 9 digits
XK (Control Area Kosovo): 9 digits
- 16/16 comply

4.1.4 VAT Format Summary

Most VATs comply with their official definitions. The majority of numbers start with their corresponding country code.

However, some VATs are otherwise valid but lack their country prefix. Further inconsistencies are of several types:

countryCode different from vatNumber prefix, e.g. DE289523572 appears in CZ VATs; GB appears in VATs of countries like NO, NL, IE, DK, BE
Some VATs miss digits, others have additional digits
Strange inconsistencies like IE9Y66I020 where the format doesn't allow for letters between the country code and digits

4.1.5 VAT Validation Python Script

For easier verification of VAT numbers (both format and existence in VIES), a Python script was developed. It:

Accepts a tabular data file and the name of the column that holds the VAT numbers. They should start with the country prefix, e.g. DE289523572.
Performs bulk validation
Outputs a CSV file with all VAT numbers that have a valid format.

It can also accept a single VAT number, validate it, and retrieve all the info from the VIES service.
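
The format check can be sketched in Python as follows. This is a minimal illustration, not the project's script: the patterns cover only a few countries and are transcribed from the format list above, the function names and the default column name vat are hypothetical, and the VIES lookup is left out.

```python
import csv
import io
import re

# Patterns transcribed from the per-country format list above (subset only).
VAT_PATTERNS = {
    "NL": re.compile(r"NL\d{9}B\d{2}"),
    "PL": re.compile(r"PL\d{10}"),
    "PT": re.compile(r"PT\d{9}"),
    "SI": re.compile(r"SI\d{8}"),
    "SK": re.compile(r"SK\d{10}"),
}


def has_valid_format(vat):
    """Return True if the VAT number matches the pattern of its country prefix."""
    vat = vat.strip().upper()
    pattern = VAT_PATTERNS.get(vat[:2])
    return bool(pattern and pattern.fullmatch(vat))


def filter_valid(rows, column="vat"):
    """Bulk check: keep only the rows whose VAT column has a valid format."""
    return [row for row in rows if has_valid_format(row.get(column) or "")]


def validate_csv(text, column="vat"):
    """Read CSV text, keep the rows with a valid VAT format, return CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(filter_valid(reader, column))
    return out.getvalue()
```

With these patterns, SI20874731 is kept, while 36699624 (the SK number listed above without its prefix) is dropped.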

4.2 RDFize VIES Checks

The above script also queries EU VIES for VAT codes in EU+IE and records the results as CSV. The query etl_scripts/VAT-from-VIES.ru RDFizes this data and attaches it to EIC nodes:

tr:viesCheckDate (request date): when the check was made
tr:vatInVies (VAT validity): whether the VAT is found and valid (not expired)
tr:nameInVies (company name): Legal company name as reported by VIES
tr:addressInVies (address): Company address as reported by VIES
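
To illustrate the shape of the resulting triples, here is a small Python sketch. The actual conversion is done by the SPARQL Update mentioned above; the column names (eic, checkDate, valid, name, address) and the datatypes xsd:date and boolean are assumptions made for this example, and the function names are hypothetical.

```python
import json

EIC_BASE = "https://transparency.ontotext.com/resource/eic/"


def turtle_string(value):
    """Quote a plain string; JSON escaping yields a valid Turtle string literal."""
    return json.dumps(value, ensure_ascii=False)


def vies_row_to_turtle(row):
    """Render one VIES check (a dict with hypothetical column names) as Turtle."""
    valid = "true" if row["valid"].strip().lower() == "true" else "false"
    pairs = [
        ("tr:viesCheckDate", turtle_string(row["checkDate"]) + "^^xsd:date"),
        ("tr:vatInVies", valid),
    ]
    # Name and address may be absent, so emit them only when present.
    if row.get("name"):
        pairs.append(("tr:nameInVies", turtle_string(row["name"])))
    if row.get("address"):
        pairs.append(("tr:addressInVies", turtle_string(row["address"])))
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"<{EIC_BASE}{row['eic']}>\n    {body} ."
```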

4.3 Open Street Map

Open Street Map (OSM) is a global crowd-sourced database of geographic information, including power plants and generators. E.g. the screenshot below shows a coal power station and some of the OSM data fields that describe it.

OSM has three element types:

node - represents a specific point on the earth's surface defined by its latitude and longitude. Each node comprises at least an id number and a pair of coordinates.
way - ordered list of between 2 and 2,000 nodes that define a polyline. Ways are used to represent linear features such as rivers and roads.
relation - multi-purpose data structure that documents a relationship between two or more data elements (nodes, ways, and/or other relations)

The following screenshots show Varna Power Plant with its three generators. Note that the generators are of type node and they are part of the relation corresponding to the power plant.

4.3.1 Planned use of OSM

We'll use it to complement ENTSOE Production Unit data with detailed geo-information.

OSM includes detailed data such as:

Coordinates of selected power plants and generators
Detailed outline maps of power plants and generators
Descriptive data such as power output, fuel, technology
EIC identifiers using tag Key:ref:EU:ENTSOE_EIC
WD identifiers and Wikipedia links
- Optionally, extra info such as images can be obtained through these links

We've tried several different services to provide OSM data:

Geo mapping and visualization services
The Overpass query service, and Overpass Turbo as a wizard for constructing queries.
The Sophox SPARQL endpoint that in addition to OSM querying allows federated use together with WD and TEKG ENTSOE data.
- But it turns out that Sophox has significantly less data: 35k Plants and 600k Generators, versus 48k Plants and 1.5M Generators in Overpass. The reason is that the Sophox semantic repository is not updated often enough from OSM data.

Another reason why we chose Overpass over Sophox is that the SPARQL endpoint did not always work properly. E.g. 20k Plants have property osmt:name, but when we tried to download all the Plants along with other properties, only the first 2k records had the osmt:name field.

Although the world-wide coverage of power plants in OSM is very good, its number of EIC ids is not so large. Therefore:

We'll use additional databases (see next) to correlate EIC ids to coordinates
Then match these to OSM plants and generators
Then enrich OSM with EIC ids. This enriched dataset will be published openly (as part of OSM), allowing others to also use our work

4.3.2 Contributing to OSM

In order to contribute to OSM:

First, create an account
Complete the tutorial, which explains clearly how to edit an existing tagged location or create a new one.
Bulk edits of many locations will be done via the API endpoint.

Also there are third party editors which we can use as alternatives. These are the most popular:

4.3.3 Tag Info

OSM Tag Info is a series of dashboards for exploring the distribution of different tags. We used it to explore the distribution of objects with an EIC id (ref:EU:ENTSOE_EIC) and objects tagged as power=plant. The Timelines display the gradual contribution of these types of objects to the OSM database.

Geography and Chronology of tag power=plant (61.5k); plus tag power=generator (1.84M)

Map with objects with tag power=plant and power=generator

Timeline with objects with tag power=plant and power=generator

Geography and Chronology of key ref:EU:ENTSOE_EIC (3667). Our recent contributions are also visible on this timeline.

Map with objects with EIC id in Europe

4.3.4 Overpass API

Data about the Plants has been downloaded in JSON format from Overpass by using the below query:

/*
This has been generated by the overpass-turbo wizard.
*/
[out:json][timeout:3000];
(
  // query part for: “power=plant”
  node["power"="plant"];
  way["power"="plant"];
  relation["power"="plant"];
);
// print results
out body;
>;
out skel qt;

Generators have been downloaded with a wget request to http://overpass-api.de/api/interpreter because the Overpass workbench was crashing due to the large size of the data.

First, create a file generator.osm that contains the following query:

/*
This has been generated by the overpass-turbo wizard.
*/
[out:json];
(
  // query part for: “power=generator”
  way["power"="generator"];
);
// print results
out body;
>;
out skel qt;

Then run the command below:

wget -O generator.json --post-file=generator.osm "http://overpass-api.de/api/interpreter"

Repeat the above steps for node and relation, save the output in separate JSON files, and then merge them into one. This is necessary because of the large size of the generator data. Another option is to download the generators country by country, because OSM does not allow filtering by continent.
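
The merge step can be sketched in Python as follows. This is only an illustration under assumptions: it relies on the Overpass JSON layout (a top-level elements array whose entries carry type and id), and the function names and any file names other than generator.json are hypothetical. Because "out body; >; out skel qt;" can return the same node both with and without tags, the sketch keeps one copy per (type, id) and prefers the tagged one.

```python
import json


def merge_overpass(documents):
    """Merge Overpass JSON documents, keeping one element per (type, id)."""
    merged = {}
    for doc in documents:
        for element in doc.get("elements", []):
            key = (element["type"], element["id"])
            # Prefer the copy that carries tags over a skeleton copy.
            if key not in merged or ("tags" in element and "tags" not in merged[key]):
                merged[key] = element
    return {"elements": list(merged.values())}


def merge_files(paths, out_path):
    """Load the given Overpass JSON files and write the merged result."""
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            documents.append(json.load(handle))
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(merge_overpass(documents), handle)
```

A call such as merge_files(["generator.json", "generator-node.json", "generator-relation.json"], "generators-all.json") would then produce the single merged file (the last three file names are made up for this example).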

Note: Some Plants and Generators have output electricity values of yes or no instead of a number.
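
Such values need to be handled when reading capacities. A minimal Python sketch, assuming the values come from the OSM keys plant:output:electricity and generator:output:electricity and are written as a number followed by a unit (e.g. "2.5 MW"); the function name output_mw is hypothetical. Anything that is not a number with a unit, including yes and no, is returned as None.

```python
import re

# Conversion factors from the unit to watts.
UNIT_TO_WATT = {"W": 1.0, "KW": 1e3, "MW": 1e6, "GW": 1e9}
_VALUE = re.compile(r"\s*(\d+(?:[.,]\d+)?)\s*([kMG]?W)\s*", re.IGNORECASE)


def output_mw(value):
    """Parse an output value such as '300 kW' into MW; return None otherwise."""
    match = _VALUE.fullmatch(value or "")
    if not match:
        return None  # e.g. 'yes', 'no', or a missing tag
    number = float(match.group(1).replace(",", "."))
    watts = number * UNIT_TO_WATT[match.group(2).upper()]
    return watts / 1e6
```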

We've researched how accurate the coordinates of Plants and Generators are in the case of cascades, where the dam/weir and pipeline can be far removed from each other. We have gone through several examples and found that the pinpoints are good.

For example, below is a comparison of the outline and

4.3.5 Comparison of Detailed Coordinates Against Centroid

We have also found an exception: a hydro plant that covers a large area, but even then the point is close to the facility:

Some other useful Overpass queries:

Search by EIC

[out:json][timeout:300];
(
  way["ref:EU:ENTSOE_EIC"~"32W001100100089X"] ;
);
out body;
>;
out skel qt;

Search for centroid

[out:csv(::type,::id,name,::lat,::lon)][timeout:20];
(rel(2865507);) -> .object;
.object out center;

4.3.6 OSM Validation

The following screenshots show some excellent OSM issue/validation reports

A trend with the number of power plant related issues

4.4 External Power Plant Databases

We also investigate a number of other external databases. We analyse them and evaluate the possibility to import the missing generation and production units into Open Street Map.

4.4.1 FRESNA (PowerPlantMatcher)

github

Data fusion of multiple power plant databases: 7 databases, including ENTSO-E Transparency, of which 6 are free (Platts WEPP is paid).

Source

Summary by country

csvtk summary -f id -g Country matched_data_red.csv |csvtk sort -k 2:rn
Country,id:count
Germany,1193
Norway,1009
France,993
Spain,761
Italy,575
Switzerland,528
United Kingdom,464
Portugal,288
Finland,212
Austria,201
Sweden,166
Romania,142
Poland,120
Czech Republic,55
Netherlands,55
Greece,50
Bulgaria,49
Slovenia,46
Belgium,45
Ireland,39
Slovakia,36
Denmark,32
Hungary,30
Croatia,27
"Macedonia, Republic of",12
Estonia,11
Lithuania,5
Latvia,4
Luxembourg,2

Summary by project ID

csvtk cut -f projectID matched_data_red.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
   5159 CARMA
   3455 JRC
   2728 OPSD
   1370 GPD
   1324 ENTSOE
   1197 GEO
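
The same count can be sketched in Python. We assume here that the projectID column holds a Python-style dictionary literal that maps each source dataset to a list of ids, as in the PyPSA-Eur example row shown later in this document; the function name is hypothetical. Counting the dictionary keys corresponds to the Perl pattern above, which matches only quoted all-uppercase tokens.

```python
import ast
import csv
import io
from collections import Counter


def count_project_sources(csv_text):
    """Count in how many records each source dataset appears in projectID."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get("projectID") or ""
        if raw:
            # literal_eval safely parses the dictionary literal.
            counts.update(ast.literal_eval(raw).keys())
    return counts
```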

4.4.2 Global Power Plant Database

WRI GPPD (World Resources Institute, Global Power Plant Database) is a comprehensive, global, open source database of power plants. The database covers approximately 35,000 power plants from 167 countries.

website

Available fields: country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,other_fuel3,commissioning_year,owner,source,url,geolocation_source,wepp_id,year_of_capacity_data,generation_gwh_2013,generation_gwh_2014,generation_gwh_2015,generation_gwh_2016,generation_gwh_2017,generation_gwh_2018,generation_gwh_2019,generation_data_source,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017

The latest version is from June 2021. Approximately 10765 power plants are in ENTSOE countries.

Summary by country

csvtk summary -f gppd_idnr:count -g country global_power_plant_database.csv|csvtk sort -k 2:nr|head -21
country,gppd_idnr:count
USA,9833
CHN,4235
GBR,2751
BRA,2360
FRA,2155
IND,1589
DEU,1309
CAN,1159
ESP,829
RUS,545
JPN,522
AUS,486
PRT,469
CZE,462
ITA,396
CHL,315
NOR,306
MEX,277
VNM,236
ARG,236
THA,196
POL,189

Summary by ENTSOE country. Countries marked with "*" are ones whose relevance for ENTSOE we are not sure of.

csvtk join -f iso3;country data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv |csvtk summary -f gppd_idnr -g iso3
iso3,gppd_idnr:count
ALB,8
AUT,103
BEL,69
BGR,43
BIH,20
BLR,24  (*)
CHE,168
CYP,3
CZE,462
DEU,1309
DNK,47
ESP,829
EST,17
FIN,185
FRA,2155
GBR,2751
GRC,90
HRV,24
HUN,18
IRL,59
ISL,20
ITA,396
LTU,6
LUX,2
LVA,5
MDA,6   (*)
MKD,12
MNE,3
NLD,71
NOR,306
POL,189
PRT,469
ROU,68
RUS,545 (*)
SRB,12
SVK,30
SVN,8
SWE,168
UKR,64

csvtk summary -f capacity_mw:min,capacity_mw:q1,capacity_mw:q2,capacity_mw:median,capacity_mw:q3,capacity_mw:mean,capacity_mw:max,capacity_mw:stdev,capacity_mw:variance global_power_plant_database.csv
min, q1,  q2,   median,q3,   mean,  max,     stdev, variance
1.00,4.90,16.74,16.74, 75.34,163.36,22500.00,489.64,239743.48

csvtk summary -f year_of_capacity_data:min,year_of_capacity_data:max -i global_power_plant_database.csv
min,    max
2000.00,2019.00

Breakdown by all fuels

csvtk cut -f primary_fuel,other_fuel1,other_fuel2,other_fuel3 global_power_plant_database.csv|perl -pe "s{,}{\n}g"|sort|uniq -c|sort -rn
  10718 Solar
   7191 Hydro
   5358 Wind
   4512 Gas
   3568 Oil
   2420 Coal
   1506 Biomass
   1182 Waste
    195 Nuclear
    189 Geothermal
    186 Storage
    130 Other
     48 Cogeneration
     35 Petcoke
     10 Wave and Tidal

Breakdown by primary fuel in ENTSOE countries:

csvtk join -f iso3;country data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv | csvtk cut -f primary_fuel | sort|uniq -c|sort -rn
   3921 Solar
   2329 Wind
   2056 Hydro
    779 Gas
    503 Biomass
    443 Waste
    420 Coal
    125 Oil
     74 Nuclear
     46 Geothermal
     31 Storage
     22 Other
      8 Wave and Tidal
      7 Cogeneration

4.4.3 PyPSA-Eur

PyPSA-Eur, the first open model dataset of the European power system at the transmission network level to cover the full ENTSO-E area, is presented.

Complete European data-set for generation and transmission expansion planning studies from freely available data.
Publication of the composition pipeline from downloaded data to an electricity system model ready for load-flow analyses.
An automatically updatable free power plant data-set covering all European countries using a modern record-matching algorithm.
New methodology to compare geo-referenced network datasets against one another.

A power plant database is presented using a sophisticated algorithm that matches records from a wide range of available sources and includes geo-data

5151 records

Fields: id,Name,Fueltype,Technology,Set,Country,Capacity,Duration,YearCommissioned,Retrofit,lat,lon,File,projectID,bus

Example row: 705,Ec łódź,Hard Coal,Steam Turbine,PP,Poland,403.0,0.0,,, 51.74050670000001,19.440413600000007,, "{'CARMA': ['CARMA25606', 'CARMA25608', 'CARMA25607'], 'ENTSOE': ['19W000000000107C', '19W000000000106E'], 'GEO': ['GEO42495']}",4403

Summary by fuel type

csvtk summary -f id -g Fueltype PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
Hydro,3594
OCGT,406
CCGT,257
Hard Coal,197
Bioenergy,188
Oil,132
Waste,129
Other,79
Lignite,72
Nuclear,62
Geothermal,29
"CCGT, Thermal",2
Storage Technologies,1
Pv,1
Caes,1

Summary by country

csvtk summary -f id -g Country PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
France,830
Spain,734
Norway,581
Switzerland,555
Germany,552
Italy,507
United Kingdom,305
Finland,202
Austria,163
Sweden,145
Portugal,126
Poland,56
Netherlands,48
Slovenia,46
Greece,38
Romania,35
Slovakia,32
Belgium,31
Bulgaria,30
Czech Republic,28
Croatia,24
Ireland,23
Denmark,23
Hungary,20
Lithuania,5
Estonia,5
Latvia,4
Luxembourg,2

Summary by source file

csvtk cut -f File PyPSA-Eur-powerplants.csv|perl -pe 's{\, }{\n}g; s{"}{}g'|sort|uniq -c|sort -rn|head -20
   2232
    727 SEDE
    417 BFE
    400 ENTSOE
    230 IWPDCY.csv
    220 GOV
    198 EnergyAuthority
    147 energy_storage_exchange
    144 Department for Business Energy & Industrial Strategy
    130 https://www.verbund.com/de-at/ueber-verbund/kraftwerke/unsere-kraftwerke
     98 Energias Endogenas de Portugal
     96 RTE
     70 Nordpool
     53 Red Eléctrica de España
     43 Terna
     30 SEAS
     24 Vattenfall
     22 GPI
     15 Tennet_Q4
     15 Energinet DK

Summary by source dataset

csvtk cut -f projectID PyPSA-Eur-powerplants.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
   4072 CARMA
   2734 OPSD
   1730 ENTSOE
    883 GEO
    816 GPD
    230 IWPDCY
    147 ESE

4.4.4 JRC-PPDB-OPEN

github

In 2017 the Joint Research Centre developed a Power Plant Database for energy systems modelling (JRC-PPDB) in order to support the unit activities in energy systems modelling and knowledge management.

Size: Production and Generation units: 7118, of which 3961 unique Production Unit EIC

A mapping between identifiers is provided in JRC_OPEN_LINKAGES.csv.

Unique ID counts

csvtk summary -f eic_p:countunique,eic_g:countunique,eprtr_facilityID:countunique,WRI_id:countunique,GEO_id:countunique,fresna_id:countunique JRC_OPEN_LINKAGES.csv
eic_p, eic_g, eprtr,WRI, GEO, fresna
1967,  3359,  592,  983, 597, 1306

Breakdown of WRI identifiers

csvtk cut -f WRI_id JRC_OPEN_LINKAGES.csv |tr 0-9 d|sort|uniq -c
      4 BRAddddddd
      2 CANddddddd
    213 GBRddddddd
     55 GEODBddddddd
      2 USAddddddd
   2171 WRIddddddd

4.4.5 Summary and EIC overlap

The table summarises the contents of the datasets above: the number of records with EIC identifiers and the number of coordinate pairs in each dataset.

We also count the EIC codes present in each dataset that we also find in Open Street Map and the other external datasets.

SPARQL query for entities with ref:EU:ENTSOE_EIC on OSM.

Data Source	Items with EIC	Distinct EIC ids	Coords Total	OSM Match
OSM TagInfo	3364	-	3364	-
Sophox	3540	3533	3540	-
PyPsa	5061	5049	1975	3541
Open Power System	4277	3944	997	3639
JRC Open Plants	3961	3961	4865	993
JRC Open Generators	6809	6809	4722	59
Wikidata	1267	1267	1120	791
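
The counts in such a table can be derived with simple set operations. The following Python sketch shows one way to do it; how the figures above were actually computed is not specified here, so the function name and the exact definitions (non-empty values, distinct values, intersection with the OSM set) are assumptions.

```python
def eic_overlap(dataset_eics, osm_eics):
    """Count EIC values in a dataset and how many of them also occur in OSM."""
    present = [e.strip() for e in dataset_eics if e and e.strip()]
    distinct = set(present)
    osm = {e.strip() for e in osm_eics if e and e.strip()}
    return {
        "items_with_eic": len(present),
        "distinct_eic": len(distinct),
        "osm_match": len(distinct & osm),
    }
```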

5 Analytics

The following analytics will be provided, using items from data domains EIC, Generation, Load, and Outages.

5.1 Faceted Search for Production and Generation Units

A faceted search will allow searching for production and generation units based on their location and fuel type. The following facets will be included:

5.1.1 Search Parameters

Bidding Zone
Control Area
Country (hierarchical)
- ADM1 administrative subdivision
Fuel Type (hierarchical)
- fossil
  - coal
  - gas...
- renewable
  - solar
  - wind
  - hydro...
- nuclear

5.1.2 Display of Aggregated Values

Aggregated values for number of units and cumulative capacity will be displayed on each element of the search.

The results of the search will be displayed as a list. It is, however, also possible to combine the search with other modalities and display the result on a map or on a chart.
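
The aggregated values for a hierarchical facet can be illustrated with a short Python sketch. In the planned design such aggregations are served from ElasticSearch, so this is only an illustration of the roll-up logic; the fuel hierarchy mirrors the Fuel Type facet listed above, while the field names fuel and capacity_mw and the function name are hypothetical.

```python
from collections import defaultdict

# Child-to-parent links of the Fuel Type facet (top-level values have no entry).
FUEL_PARENT = {
    "coal": "fossil",
    "gas": "fossil",
    "solar": "renewable",
    "wind": "renewable",
    "hydro": "renewable",
}


def facet_totals(units):
    """Return unit count and cumulative capacity for every facet value."""
    totals = defaultdict(lambda: {"units": 0, "capacity_mw": 0.0})
    for unit in units:
        fuel = unit["fuel"]
        # Add the unit to its own fuel type and to every ancestor.
        while fuel is not None:
            totals[fuel]["units"] += 1
            totals[fuel]["capacity_mw"] += unit["capacity_mw"]
            fuel = FUEL_PARENT.get(fuel)
    return dict(totals)
```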

5.2 (Canceled) Actual and Forecasted Load Timeline

A timeline showing all the data from the load domain (actual and projected, 5 individual tables) for a given Control Area, Bidding Zone, Country

Below is a mockup of this chart realized using Google Charts.

It displays data for the month of December for BZA BG
Actual and Day Ahead data are shown as a line chart.
Week/Month/Year-ahead forecasts are shown, possibly as superimposed upper and lower bounds on the timeline.

An interactive version of the chart is available here. N.B. it is not available for mobile browsers.

The mockup is limited by Google Charts' features but shows how the data looks when superimposed. Of particular interest are the occasions when the forecast and actual load are mismatched. These are easily visible on the chart and we will emphasise them in the final version, using the available functionalities of the Vega charting library (e.g. this example).

5.3 Wind and Solar Actual vs Forecasted Generation

A timeline showing day-ahead forecast and actual generation for wind and solar.

The timeline will be analogous to the previous example.

The forecasted data is provided aggregated by the TSOs.
Actual generation will be calculated based on the Actual Generation data and the fuel type.

5.4 Production Units on a Map

Zoomable and navigable map with the production and generation units.

Example of a map showing power plants by capacity and fuel type:

5.4.1 Data Visible on the Map Markers

current generation
installed capacity
fuel type
existence of a future planned outage

5.4.2 Drill-down

Drill-down data is available when interacting with a marker. This can be:

A pop-up or tooltip will display detailed information about the unit, gathered from ENTSOE data and augmented with OpenStreetMap data
Outages: current or future, planned or forced, active or canceled
Links to external data sources (such as Wikidata, Wikipedia, OSM)
Detailed power plant outline on a map (whenever available from OSM)

5.5 Outages on a Map

Outages displayed on a map: current or future, planned or forced, active or canceled.

Shown per Bidding Zone or Control Area
Filterable by time range

5.6 Balancing Energy Timeline

A timeline showing Prices Of Activated Balancing Energy and ActivatedBalancingEnergy for any given area. The diagram consists of 2 vertically symmetrical zones, one for "up" regulation and one for "down" regulation. Each zone superimposes:

4 line charts for the price of each resource type
A stacked histogram for the volume of each activated resource

The following transformations need to be applied

Temporal harmonisation
- All values are converted to hourly or daily
Values aggregation
- Volumes are summed
- Prices are averaged
Currency transformations
- Non EUR currencies are converted to EUR using the daily rate
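
A minimal Python sketch of these transformations, under stated assumptions: records are dictionaries with the hypothetical fields time, volume, price and currency; eur_rates is a hypothetical lookup from (date, currency) to the daily EUR rate; and "averaged" is read as a simple mean rather than a volume-weighted one.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly_eur(records, eur_rates):
    """Harmonise records to hourly values: sum volumes, average EUR prices."""
    buckets = defaultdict(lambda: {"volume": 0.0, "prices": []})
    for rec in records:
        # Temporal harmonisation: truncate the timestamp to the hour.
        hour = rec["time"].replace(minute=0, second=0, microsecond=0)
        # Currency transformation: use the daily rate for non-EUR prices.
        if rec["currency"] == "EUR":
            rate = 1.0
        else:
            rate = eur_rates[(rec["time"].date(), rec["currency"])]
        buckets[hour]["volume"] += rec["volume"]
        buckets[hour]["prices"].append(rec["price"] * rate)
    return {
        hour: {
            "volume": bucket["volume"],
            "price_eur": sum(bucket["prices"]) / len(bucket["prices"]),
        }
        for hour, bucket in sorted(buckets.items())
    }
```

Daily values can be obtained the same way by truncating the timestamp to the date instead of the hour.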

An example of a similar diagram can be seen in this vega example

5.7 Future accepted offers bubble plot timeline

A timeline chart with circular markers showing future accepted offers from the AcceptedAggregatedOffers_17.1.D data item. The chart will display the following variables:

temporal dimension (x-axis)
area concerned by the bid (y-axis): this will create a swimlane effect
Volume: size of the marker
Direction: shape of the marker (a circular marker with a protrusion directed up or down)
Type of asset: color of the marker
a summary of the above variables displayed in the popup

An example of a similar diagram can be seen in this vega-lite example

5.8 Area price/volume bubble plot

A timeline chart combining ActivatedBalancingEnergy_17.1.E and PricesOfActivatedBalancingEnergy_17.1.F.

Similar to the chart above, the price/volume bubble chart will show price instead of time on the x-axis.

price of the product (x-axis)
area concerned by the bid (y-axis): this will create a swimlane effect
Volume: size of the marker
Direction: shape of the marker (a circular marker with a protrusion directed up or down)
Type of asset: color of the marker
a summary of the above variables displayed in the popup

5.9 Analytics Design

Technologies to use for Analytics:

The web application will be built via React or Angular.
The visualisations will be created via Kibana and embedded in the web application.
Kibana dashboards offer great tooling for visualisations. Among the built-in tools it offers rich custom visualisation options via Vega and Vega-lite
Data will be stored in ElasticSearch for quick access and aggregations and will be accessed directly from Kibana for the visualisations and via the Ontotext Platform for the facets.

5.10 Update Process

The data is updated automatically from the ENTSOE SFTP and REST services on a daily basis

6 Semantic Models

The semantic models are in the form of Turtle examples and diagrams of all semantic data areas. They are shown in previous sections:

Semantic model of data quality:
Semantic models of ENTSOE data:
- XML Items and XML Schemas contains models and transformation specifications of the data items ingested from XML files.
- CSV Files contains the models and transformation specifications of the transactional data items ingested from CSV files.

6.1 Basic Semantic Data

"Manual" RDFization

Eg1: doc SFTP Appendix B: Area Naming Convention has the zone codes used on ENTSO portal.
- Eg EIC 10Y1001A1001A869 is BZN|UA-DobTPP (bidding zone Ukraine-Dobrotvirska TPP)
- BZN is a prefix that is displayed for the particular time series, not an attribute of that EIC
- But the EIC file has notation UA-DOB_TPP (different spelling) and functions "Control Area, Market Balance Area, Scheduling Area" but not "Bidding Zone"
Eg2: the "knowledge base" (kb.ttl) describes the Data Items, more details are needed. See section above

6.2 TEKG Ontology

The TEKG ontology is available in tr.ttl and covers the full scope of the semantic models.

The ontology is also available in the Annex of this document.

6.3 TEKG SOML (GraphQL) Schema

7 System Architecture

We have revised and elaborated the conceptual architecture compared to the proposal. It presents the technologies and services that TEKG will use and implement to achieve its objectives:

ETL Application: responsible for the entire process of download, transformation and import of the transparency data.
- Data will be fetched from the ENTSOE transparency platform on a scheduled basis. We'll use XML for master data (EIC, code lists and Installed Capacity), and CSV for transactional (time series) data.
- The RDFization process will be done using GraphDB OntoRefine tool and its Mapping UI to transform the loaded data to RDF.
Validation: provides data validations using a combination of standard SHACL and advanced SHACL-SPARQL rules and integrating external data validations
Semantic Storage: RDF is loaded or updated to a semantic repository in GraphDB. Modest inference is implemented (GraphDB rules and/or SPARQL Updates)
ElasticSearch: RDF data is automatically indexed to ElasticSearch for full-text search, faceting, and analytics.
Elastic Index Monitoring: Kibana is used on top of Elastic to provide easy index management and monitoring.
TEKG Application: provides UI for visualizations and validation reporting on the transparency data that has been ingested and analysed in the different components.
Monitoring: InfluxDB and Grafana are used to monitor the overall infrastructure and performance of the system.

All components will be packaged and deployed in an enterprise-ready fashion using Docker, Kubernetes, and Helm charts.

The programming languages and frameworks used for development of the different components, services and tests are:

Java: used for development of the data processing components (data fetchers, ETL processing, RDF data validation? and import)
- Spring Boot: allows quick building of services. It provides a lot of flexibility and functionalities out of the box
JavaScript: used for the web application and the acceptance tests of the components/services
- Angular: development platform, which includes various tools, libraries and frameworks for building and scaling web applications
- Cypress: framework for end-to-end testing
- Cucumber.js: test framework for behavior-driven development
Python: used for scripting several small functionalities, for example VAT number validation
- AIOHTTP: Asynchronous HTTP client/server framework
- pandas: data analysis and manipulation tool

7.1 Data Fetching

Source data is obtained from ENTSOE transparency platform on a scheduled basis (frequency to be discussed) via:

REST API: master data in XML
SFTP server: transactional (time series) data in tab delimited flat files saved with the *.csv file extension

7.2 Semantic Conversion Service

The service will convert the ingested XMLs and CSVs and produce RDF data. The initial assumption was that we would work only with the XMLs from the REST API, and the main tool that we proposed was XSPARQL. After careful exploration of the data and its sources, we discovered additional data in CSV format that we need.

To achieve a flexible and generic service that can handle the required data, we've considered using additional tools like OntoRefine and TARQL. To measure the performance of the different tools and pick the right one for the service, we've done some experiments. The results are presented in the Conversion Performance Comparison section.

7.2.1 XSPARQL

XSPARQL is a language for transforming data between XML and RDF.

It is built by combining the strengths of two query languages: XQuery for XML, and SPARQL for RDF.

XSPARQL Github contains the implementation of the tools that we are using.

7.2.1.1 XSPARQL Example

Data

<?xml version="1.0" encoding="UTF-8"?>
<Configuration_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0">
    <mRID>8be8471a92f345ce8129102d965c19d7</mRID>
    <type>A95</type>
    <process.processType>A39</process.processType>
    <sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
    <sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
    <receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
    <receiver_MarketParticipant.marketRole.type>A32</receiver_MarketParticipant.marketRole.type>
    <createdDateTime>2022-01-17T12:50:49Z</createdDateTime>
    <TimeSeries>
        <mRID>87546cb0270a4ea8</mRID>
        <businessType>B11</businessType>
        <implementation_DateAndOrTime.date>2021-10-01</implementation_DateAndOrTime.date>
        <biddingZone_Domain.mRID codingScheme="A01">10YUA-WEPS-----0</biddingZone_Domain.mRID>
        <registeredResource.mRID codingScheme="A01">62W875768058757F</registeredResource.mRID>
        <registeredResource.name>KALUSHCHPP</registeredResource.name>
        <registeredResource.location.name>Kalush</registeredResource.location.name>
        <ControlArea_Domain>
            <mRID codingScheme="A01">10YUA-WEPS-----0</mRID>
        </ControlArea_Domain>
        <Provider_MarketParticipant>
            <mRID codingScheme="A01">10X1001C--00001X</mRID>
        </Provider_MarketParticipant>
        <MktPSRType>
            <psrType>B05</psrType>
            <production_PowerSystemResources.highVoltageLimit unit="KVT">110</production_PowerSystemResources.highVoltageLimit>
            <nominalIP_PowerSystemResources.nominalP unit="MAW">200</nominalIP_PowerSystemResources.nominalP>
            <GeneratingUnit_PowerSystemResources>
                <mRID codingScheme="A01">62W2081564720502</mRID>
                <name>KALUSHCHPP-V</name>
                <nominalP unit="MAW">200</nominalP>
                <generatingUnit_PSRType.psrType>B05</generatingUnit_PSRType.psrType>
                <generatingUnit_Location.name>Kalush</generatingUnit_Location.name>
            </GeneratingUnit_PowerSystemResources>
        </MktPSRType>
    </TimeSeries>
</Configuration_MarketDocument>

Script

prefix ns:  <urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0>
prefix tr:  <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

declare variable $input as xs:string external;
declare option saxon:output "method=text";

for $data in doc($input)/ns:Configuration_MarketDocument/ns:TimeSeries
let $BASE := "https://transparency.ontotext.com/resource/"
let $TYPE := fn:concat($BASE,"type/")
let $UNIT := fn:concat($TYPE,"UnitSymbol/") # TODO or "UnitOfMeasure/" ?
let $EIC  := fn:concat($BASE,"eic/")
let $url  := fn:concat($EIC,$data/ns:registeredResource.mRID/text())

construct {
  <{$url}>
    tr:dateImplemented {$data/ns:implementation_DateAndOrTime.date/text()}^^xsd:date;
    tr:notationAlt {$data/ns:registeredResource.name/text()};
    tr:location {$data/ns:registeredResource.location.name/text()};
    tr:assetType <{fn:concat($TYPE,"Asset/",$data/ns:MktPSRType/ns:psrType/text())}>.
    {
      for $x in $data/ns:biddingZone_Domain.mRID/text()                                  # 0-1
        construct {<{$url}> tr:biddingZone <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:ControlArea_Domain/ns:mRID/text()                               # 1-many
        construct {<{$url}> tr:controlArea <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:Provider_MarketParticipant/ns:mRID/text()                       # 1-many
        construct {<{$url}> tr:providerParticipant <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:MktPSRType/ns:production_PowerSystemResources.highVoltageLimit  # 0-1
        construct {
          <{$url}> tr:highVoltageLimit {$x/text()}^^xsd:float
        },
      for $x in $data/ns:MktPSRType/ns:nominalIP_PowerSystemResources.nominalP           # 0-1
        construct {
          <{$url}> tr:installedOutput {$x/text()}^^xsd:float
        },
      for $gen in $data/ns:MktPSRType/ns:GeneratingUnit_PowerSystemResources             # 0-many
        let $url1 := fn:concat($EIC,$gen/ns:mRID/text())
        construct {
          <{$url}> tr:generationUnit <{$url1}>.
          <{$url1}>
            tr:notationAlt {$gen/ns:name/text()};
            tr:assetType <{fn:concat($TYPE,"Asset/",$gen/ns:generatingUnit_PSRType.psrType/text())}>;
            tr:location {$gen/ns:generatingUnit_Location.name/text()};
            tr:installedOutput {$gen/ns:nominalP/text()}^^xsd:float
        }
    }
}

Result

@base <https://transparency.ontotext.com/resource/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tr: <https://transparency.ontotext.com/resource/tr/> .

<eic/62W875768058757F> tr:dateImplemented  "2021-10-01"^^xsd:date .
<eic/62W875768058757F> tr:notationAlt  "KALUSHCHPP" .
<eic/62W875768058757F> tr:location  "Kalush" .
<eic/62W875768058757F> tr:assetType  <type/Asset/B05> .
<eic/62W875768058757F> tr:biddingZone  <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:controlArea  <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:providerParticipant  <eic/10X1001C--00001X> .
<eic/62W875768058757F> tr:highVoltageLimit  "110"^^xsd:float .
<eic/62W875768058757F> tr:installedOutput  "200"^^xsd:float .
<eic/62W875768058757F> tr:generationUnit  <eic/62W2081564720502> .

<eic/62W2081564720502> tr:notationAlt  "KALUSHCHPP-V" .
<eic/62W2081564720502> tr:assetType  <type/Asset/B05> .
<eic/62W2081564720502> tr:location  "Kalush" .
<eic/62W2081564720502> tr:installedOutput  "200"^^xsd:float .

7.2.1.2 XSPARQL Service Implementation

Ontotext has packaged XSPARQL as a web service (WAR file). A long-running web service avoids the Java startup time that the command-line tool incurs on every invocation.

The WAR with the XSPARQL service can be loaded directly into the embedded web server (Tomcat) that Spring Boot uses to boot the main application.
The implementation requires a factory class, which builds and provides a context for the service endpoint.
The factory also acts as an interceptor: when the server starts, it triggers the provisioning of the WAR file.
The service is invoked by providing a dataset in XML format and the transformation query.

As a further optimization, we considered precompiling the various conversions and putting them into a Registry. This would save the transpilation time (from XSPARQL to XQuery) and the compilation time (from XQuery to an executable transformation).
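The Registry idea can be illustrated with a minimal sketch. It is not the project's implementation: the class name `ConversionRegistry` and the stand-in `fake_compile` function are assumptions made for illustration, and the real transpile-and-compile step is only simulated.

```python
from typing import Callable, Dict

Transformation = Callable[[str], str]


class ConversionRegistry:
    """Caches compiled transformations so each query is compiled only once."""

    def __init__(self, compile_query: Callable[[str], Transformation]) -> None:
        self._compile_query = compile_query  # hypothetical transpile + compile step
        self._compiled: Dict[str, Transformation] = {}
        self.compilations = 0  # how often the expensive step actually ran

    def get(self, name: str, query_text: str) -> Transformation:
        if name not in self._compiled:
            self._compiled[name] = self._compile_query(query_text)
            self.compilations += 1
        return self._compiled[name]


def fake_compile(query_text: str) -> Transformation:
    # Stand-in for the real XSPARQL -> XQuery -> executable pipeline.
    return lambda xml: f"converted {len(xml)} chars with {query_text!r}"


registry = ConversionRegistry(fake_compile)
for document in ["<a/>", "<b></b>"]:
    convert = registry.get("production-unit", "construct {...}")
    print(convert(document))
print("compilations:", registry.compilations)
```

Both documents are converted, but the compile step runs only for the first request; later requests reuse the cached transformation.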

7.2.1.3 XSPARQL Issues

Uses log4j, which needs to be updated to the latest version due to security vulnerabilities.
Maintenance of the code will be hard as it was written some years ago and the community looks inactive.
- Hard to improve the code or to extend its functionality.
- Hard to fix issues related to the transformations.
Lacks batch processing/transformation.
Works only with XML file types.

7.2.2 OntoRefine

OntoRefine is a user-friendly tool for cleaning data and converting it to RDF.

It is an adaptation of the popular OpenRefine tool, developed by Ontotext and integrated into GraphDB Workbench.
It allows visual development of data conversions with a Mapping UI, and is therefore suitable for non-programmers.
It can process various file formats, including CSV, Google Sheets, XML and JSON.

Because OntoRefine handles various file formats, including XML, CSV and JSON, it is a good fit for the current project. It is the preferred option because it is developed and maintained by Ontotext and showed the best overall performance in our tests.

Issues:

There are bugs present in the Mapping UI that prevent defining blank nodes (not an issue for this project).
To process a file, the tool creates a project (workspace), which should be cleared afterwards. This adds time and complexity to the process.

Note: the rest of this section describes Reconciliation, which is not used in the current project.

Another big advantage is matching of tabular data to KGs via different reconciliation services that OntoRefine supports. Reconciliation services provide semantic matching functionality.

There are various free reconciliation services that can be used by OntoRefine. The Reconciliation Testbench provides a list of some of these services. We host and support three such services based on a subset of Wikidata:

Ontotext Wikidata People Reconciliation Service: https://reconcile.ontotext.com/people
Ontotext Wikidata Organization Reconciliation Service: https://reconcile.ontotext.com/organizations
Ontotext Wikidata Location Reconciliation Service: https://reconcile.ontotext.com/locations

7.2.2.1 OntoRefine Example

The OntoRefine Mapping UI allows visual creation of semantic transformations. Here's a transformation for the same XML data as in the XSPARQL example:

Using the same data as in the XSPARQL example, it produces a semantically equivalent result.

A conversion script can be exported from the Mapping UI (as JSON) and used as a batch process (see next section). Additionally, the script contains all operations performed over the dataset, including data cleaning and the reconciliation operations.

7.2.2.2 OntoRefine Service Implementation

We developed a conversion service on top of OntoRefine, using a public library called ontorefine-client.

The library exposes a large portion of OntoRefine functionality through an intuitive API, which we use to build and integrate the transformation process.
The process is exposed through a REST endpoint.
The user invokes the service by providing a dataset and a previously saved transformation script (created in OntoRefine Mapping UI).

7.2.3 TARQL

TARQL is a highly performant tool for converting very large CSV/TSV files.

Tarql GitHub Project contains the source code of the tool.
Conversions are written in the form of SPARQL CONSTRUCT queries that iterate over every table row.
One can do limited data cleaning; splitting cell values is also supported.
It is developer-oriented since it requires SPARQL knowledge.

Issues:

It is a third-party tool, which could be a problem if issues need to be fixed quickly.
Lack of batch processing.
No out-of-the-box web service; we would have to create one from scratch.
Works only with CSV or TSV files.

If the project used XSPARQL for conversion of XML files, we could use TARQL for conversion of CSV files.

7.2.3.1 TARQL Example

Data (CSV example from CrunchBase)

permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b

Mapping

PREFIX ex: <http://ex.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?URI a ex:Organization;
    ex:permalink ?permalink;
    ex:name ?company;
    ex:employees ?numEmployees;
    ex:category ?category;
    ex:city ?city;
    ex:state ?state;
    ex:fundingDate ?fundedDate;
    ex:raisedAmt ?amount;
    ex:raisedCurrency ?raisedCurrency;
    ex:round ?round;
}
WHERE {
  BIND (URI(CONCAT('http://ex.org/companies/', ?permalink)) AS ?URI)
  BIND (xsd:integer(?numEmps) AS ?numEmployees)
  BIND (xsd:decimal(?raisedAmt) AS ?amount)
}

Result

<http://ex.org/companies/lifelock>
  a ex:Organization ;
  ex:permalink "lifelock" ;
  ex:name "LifeLock" ;
  ex:category "web" ;
  ex:city "Tempe" ;
  ex:state "AZ" ;
  ex:fundingDate "1-May-07" ;
  ex:raisedAmt "6850000"^^xsd:decimal ;
  ex:raisedCurrency "USD" ;
  ex:round "b" .
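Note that the result has no ex:employees triple: the numEmps cell is empty, so ?numEmployees ends up unbound and the corresponding triple is skipped. The following sketch mimics only this binding logic in Python; it is not how TARQL is implemented, and the helper names `cast_or_none` and `bind_row` are assumptions made for illustration.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

CSV_TEXT = """permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b
"""


def cast_or_none(cast, value):
    """Mimic a SPARQL cast: an invalid lexical form leaves the variable unbound."""
    try:
        return cast(value)
    except (ValueError, InvalidOperation):
        return None


def bind_row(row):
    bindings = {
        "URI": "http://ex.org/companies/" + row["permalink"],
        "numEmployees": cast_or_none(int, row["numEmps"]),
        "amount": cast_or_none(Decimal, row["raisedAmt"]),
    }
    # Unbound variables produce no triple, so drop them.
    return {name: value for name, value in bindings.items() if value is not None}


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
bindings = bind_row(rows[0])
print(bindings)
```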

7.2.3.2 TARQL Service Implementation

TARQL does not have a web service implementation, so we would need to implement one.

The tool can be wrapped in a process that is called when the REST endpoint is invoked.
The process will trigger the execution of a standard TARQL command and provide the required arguments to it.
As with the other proposed solutions, the service would be implemented using Spring Boot so that it can be deployed and distributed easily.
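A minimal sketch of such a wrapper follows. It is written in Python for brevity, whereas the service itself would be a Spring Boot application. It assumes a `tarql` executable on the PATH and uses TARQL's basic command form (query file followed by CSV file); the file names are hypothetical.

```python
import subprocess
from typing import List


def build_tarql_command(query_file: str, csv_file: str, executable: str = "tarql") -> List[str]:
    # Basic TARQL usage: tarql <query.sparql> <table.csv>
    return [executable, query_file, csv_file]


def run_tarql(query_file: str, csv_file: str) -> str:
    """Run TARQL and return the RDF it writes to standard output."""
    completed = subprocess.run(
        build_tarql_command(query_file, csv_file),
        capture_output=True,
        text=True,
        check=True,  # raise if TARQL exits with an error
    )
    return completed.stdout


# Only the command is built here; run_tarql() would be called by the REST handler.
command = build_tarql_command("mapping.sparql", "companies.csv")
print(" ".join(command))
```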

7.2.4 Conversion Performance Comparison

We did some performance testing to ensure that the most suitable tool can be selected. We used prototypes of conversion services to measure their performance.

7.2.4.1 OntoRefine vs XSPARQL

For the comparison we use XML datasets in data/xml/Production_Unit (documents of type Configuration_MarketDocument).

Because the number and size of the data files are not that large yet, we multiplied them in order to measure at scale and reproduce the load of an actual production environment.

The first two columns show the count and total size (MB) of the files; the last two columns show the processing time (seconds) for each of the two tools.

count	MB	XSPARQL	OntoRefine
46	4.1	2	1.6
460	12.6	13.6	10.6
4600	353.9	181.9	156.5
7000	620.2	294.6	264.1
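To make the comparison easier to read, the figures can be turned into throughput and a time ratio. The sketch below only re-computes numbers from the table above; the function name `summarize` is an assumption made for illustration.

```python
# (count, size in MB, XSPARQL seconds, OntoRefine seconds), copied from the table above
MEASUREMENTS = [
    (46, 4.1, 2.0, 1.6),
    (460, 12.6, 13.6, 10.6),
    (4600, 353.9, 181.9, 156.5),
    (7000, 620.2, 294.6, 264.1),
]


def summarize(measurements):
    """Derive files per second for each tool and the XSPARQL/OntoRefine time ratio."""
    rows = []
    for count, _megabytes, xsparql_s, ontorefine_s in measurements:
        rows.append({
            "count": count,
            "xsparql_files_per_s": count / xsparql_s,
            "ontorefine_files_per_s": count / ontorefine_s,
            "time_ratio": xsparql_s / ontorefine_s,
        })
    return rows


summary = summarize(MEASUREMENTS)
for row in summary:
    print(f"{row['count']:>5} files: "
          f"XSPARQL {row['xsparql_files_per_s']:.1f}/s, "
          f"OntoRefine {row['ontorefine_files_per_s']:.1f}/s, "
          f"ratio {row['time_ratio']:.2f}")
```

In every row the ratio is above 1, that is, OntoRefine finishes somewhat faster than XSPARQL (roughly 1.1 to 1.3 times).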

For comparison purposes we made the services work in an identical way and process the datasets one by one. There are a few optimizations possible for each service, but they are not worth doing at the moment.

7.2.4.2 OntoRefine vs TARQL

We compared the performance of TARQL and OntoRefine on a 240 MB CSV file, producing the same RDF data.

OntoRefine processes the file in 13-20 seconds
TARQL is 2-3 times slower

7.2.5 Semantic Conversion Scripts

The semantic conversion scripts are in etl_scripts/OR. They are specialized SPARQL CONSTRUCT queries that run in an OntoRefine instance and map tabular data to a predefined graph pattern.

7.3 Semantic Data Pipeline

The data pipeline is glue code that implements the flow Fetch > Conversion > GraphDB > (Validation, Elastic indexing).

It is a standalone Spring Boot application with the following components:

FTP Resource Downloaders
HTTP Resource Downloaders
Conversion Services
Data Import Services

Simple layout of the application components.

Interaction flow between the services.

TODO M4: Add Import and Validation flow

FTP Resource Downloaders

The service retrieves the specified datasets from the SFTP server and so acts as the data provider for the automatic Conversion Service. Retrieval is done by a process that listens for changes on the server, specifically the upload of a new dataset. When such an event is detected, the service copies the file into a configured dataset store. The trigger can be filtered by providing a matching pattern for the file names.

In addition to the automatic mode, the service supports manual invocation. This is convenient for testing, or when another application or system wants to plug into the processing pipeline.
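The filtering of newly uploaded files by name pattern can be sketched as follows. This is an illustration only: the function name, the glob-style pattern syntax and the file names are assumptions, and the actual service is a Spring Boot component.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Set


def new_matching_files(listing: Iterable[str], already_seen: Set[str], pattern: str) -> List[str]:
    """Return newly uploaded files whose names match the configured pattern."""
    fresh = [name for name in listing
             if name not in already_seen and fnmatch(name, pattern)]
    already_seen.update(fresh)  # remember them so they are not reported twice
    return sorted(fresh)


seen: Set[str] = set()
pattern = "*_ActualGeneration.csv"  # hypothetical file name pattern

# First poll: one matching file and one that the pattern filters out.
first = new_matching_files(["2022_01_ActualGeneration.csv", "readme.txt"], seen, pattern)
# Second poll: the old file is still listed, plus a newly uploaded one.
second = new_matching_files(
    ["2022_01_ActualGeneration.csv", "2022_02_ActualGeneration.csv"], seen, pattern)
print(first, second)
```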

HTTP Resource Downloaders

Like the FTP Resource Downloader, this service provides datasets to the Conversion Service. Unlike the FTP downloader, however, it is not reactive: it retrieves the required datasets by sending HTTP requests to a specific REST API at a configurable fixed rate. The datasets to retrieve are specified by request parameters, which are supplied externally through configuration. This design makes the downloader easy to modify and allows the scope of the data processed by the system to be scaled up or down.

Like the FTP downloader, this service exposes its functionality via a REST endpoint, which can be invoked manually.
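How externally configured request parameters drive the downloads can be sketched as follows. The base URL, the parameter names (`dataItem`, `area`) and the interval are hypothetical placeholders, not the actual configuration of the service or the parameters of the real REST API.

```python
from urllib.parse import urlencode

# Hypothetical external configuration: what to fetch and how often.
CONFIG = {
    "base_url": "https://example.org/api",
    "interval_seconds": 3600,
    "requests": [
        {"dataItem": "generation/AggregatedGenerationPerType", "area": "10YUA-WEPS-----0"},
        {"dataItem": "balancing/AggregatedVolumes", "area": "10YUA-WEPS-----0"},
    ],
}


def build_urls(config: dict) -> list:
    """Build one request URL per configured parameter set."""
    return [f"{config['base_url']}?{urlencode(params)}" for params in config["requests"]]


def run_times(start: int, config: dict, runs: int) -> list:
    """Fixed-rate schedule: each run starts a whole interval after the previous one."""
    return [start + i * config["interval_seconds"] for i in range(runs)]


urls = build_urls(CONFIG)
for url in urls:
    print(url)
print(run_times(0, CONFIG, 3))
```

Changing the scope of the processed data then only requires editing the `requests` list, not the code.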

Conversion Service

The purpose of the Conversion Service is to transform the downloaded datasets to RDF data, which can be imported in GraphDB. Like the downloaders, this service has two aspects:

manual: allows invocation of the transformation on demand by calling a REST endpoint and providing specific parameters along with the dataset that should be transformed.
automatic: the main functionality of the service. It is triggered when a new file is added to the dataset store. A transformation script, which contains the mapping to RDF, is applied to the dataset.

The automatic transformation process begins when the application is started. If there are unprocessed files in the dataset store, they are picked up and the transformations are applied. The transformations are predefined scripts in JSON format. When the conversion is successful, the resulting RDF data is stored in a file, which is later imported into GraphDB.

The transformation itself is done with the OntoRefine tool. Its functionality is invoked by the OntoRefine Service, which contains the steps required to process a single dataset.
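The automatic mode can be sketched as a scan of the dataset store that converts every file without a result yet. This is a simplified illustration: the directory layout, the `.xml`/`.ttl` extensions and the `fake_convert` stand-in for the OntoRefine call are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def convert_pending(store: Path, output: Path, convert: Callable[[str], str]) -> List[str]:
    """Convert every dataset in `store` that has no result file in `output` yet."""
    converted = []
    for dataset in sorted(store.glob("*.xml")):
        target = output / (dataset.stem + ".ttl")
        if target.exists():
            continue  # already processed on an earlier run
        target.write_text(convert(dataset.read_text()))
        converted.append(dataset.name)
    return converted


def fake_convert(xml: str) -> str:
    # Stand-in for applying the saved transformation script via OntoRefine.
    return f"# RDF for {xml}\n"


with tempfile.TemporaryDirectory() as tmp:
    store, output = Path(tmp, "datasets"), Path(tmp, "rdf")
    store.mkdir()
    output.mkdir()
    (store / "unit1.xml").write_text("<doc/>")
    first_run = convert_pending(store, output, fake_convert)
    second_run = convert_pending(store, output, fake_convert)

print(first_run, second_run)
```

The second run finds nothing to do, because the result file from the first run marks the dataset as processed.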

Data Import Service

This service imports the RDF data into GraphDB and triggers the validation. Following the design of the other components, the import service will have manual and automatic aspects. As with the automatic conversion, the import is triggered by the existence of an unprocessed RDF data file. If the import is successful, the file will be marked as imported and removed from the directory.
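One way to import a Turtle file is to POST it to the repository's statements endpoint, which GraphDB offers through the RDF4J REST protocol. The sketch below only builds the request and does not send it; the local repository URL is an assumption, and the project's import service may use a different client or mechanism.

```python
from urllib.request import Request


def build_import_request(repository_url: str, turtle: bytes) -> Request:
    """Build (but do not send) a request that adds Turtle data to a repository."""
    return Request(
        url=repository_url.rstrip("/") + "/statements",
        data=turtle,
        method="POST",  # POST adds statements to the repository
        headers={"Content-Type": "text/turtle"},
    )


# Hypothetical local GraphDB instance and a tiny Turtle payload.
request = build_import_request(
    "http://localhost:7200/repositories/tekg",
    b"<https://example.org/s> <https://example.org/p> <https://example.org/o> .",
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and marking the file as imported on success would complete the flow described above.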

7.4 TEKG Dashboard Application

The Transparency EKG (TEKG) dashboard is a single-page web application with an analytical user interface. It provides visualizations and validation reports over the transparency data that has been ingested, analyzed and validated in GraphDB; see DQA Dashboard.

Transparency EKG uses GraphDB's Elasticsearch connector to synchronize all relevant data into multiple Elasticsearch indices. This enables the dashboard to perform full-text and faceted searches in order to construct visualizations, and to limit data requests to a single data source.

Refer to Elasticsearch GraphDB connector documentation for more information.

7.4.1 Design

TEKG Dashboard application consists of two parts: the static HTML and CSS files and a server part that serves these static files and acts as an API proxy.

The server part acts as a "backend for frontend", which proxies API requests from the web part and constructs the queries that are then sent to Elasticsearch. This server is implemented with NodeJS and the Express framework. Check out the NodeJS and Express documentation for more information.

The web part is implemented with the Angular platform and TypeScript. This is a modern framework stack that helps in designing and building single-page applications (SPA). The source code is organized in web components grouped in Angular modules that are type-safe and reusable throughout the application. The Angular platform comes with its own CLI tool, which makes it easy to generate web components and modules. Check out the Angular documentation for more information.

The web part will proxy all of its requests down to the server part in order to avoid direct communications from the client to the Elasticsearch server. Queries will be constructed in the server part in order to shift away the complexity from the web.
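The query construction done in the server part can be sketched as follows. The sketch is in Python for consistency with the other examples in this document, whereas the actual server is written in NodeJS; the field names (`dataItem`, `area`, `date`) are hypothetical, and only standard Elasticsearch query DSL constructs (`bool`, `filter`, `term`, `range`) are used.

```python
import json
from typing import Optional


def build_search_body(filters: dict, date_from: Optional[str] = None,
                      date_to: Optional[str] = None, size: int = 20) -> dict:
    """Translate simple UI filters into an Elasticsearch query body."""
    clauses = [{"term": {field: value}} for field, value in sorted(filters.items())]
    bounds = {}
    if date_from:
        bounds["gte"] = date_from
    if date_to:
        bounds["lt"] = date_to
    if bounds:
        clauses.append({"range": {"date": bounds}})
    return {"size": size, "query": {"bool": {"filter": clauses}}}


body = build_search_body(
    {"area": "BG", "dataItem": "generation/AggregatedGenerationPerType"},
    date_from="2022-01-01",
    date_to="2022-02-01",
)
print(json.dumps(body, indent=2))
```

The web part only sends the simple filter values; the query DSL stays on the server, which keeps the client code small.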

7.4.1.1 Visualization of Analytics

For analytics visualizations, the TEKG dashboard application makes use of VEGA. This is a visualization grammar with vast options for chart types, transformations and interactions. TEKG Dashboard application will fetch data from ES for each analytic, transform it and pass it to VEGA for rendering. The design of the analytics visualizations is as follows:

VEGA wrapper component with default settings for rendering and responsive layout.
A set of settings that specifies concrete loading options, transformations and visualizations for each analytic. This allows analytics to be added step by step.
Analytics service that uses the set of settings to load and transform the data.
A web page that injects the analytics service and uses it to request data and render the different analytics with VEGA.

The web page will have options for filtering the analytics data which will result in re-fetching it from ES.
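The settings-driven design can be sketched as follows. The sketch is in Python rather than the dashboard's TypeScript, and the analytic name, index name, sample rows and `fake_fetch` function are assumptions made for illustration; rendering with VEGA is left out.

```python
from typing import Callable, Dict, List

Rows = List[dict]

# Hypothetical settings: one entry per analytic, so analytics can be added step by step.
ANALYTICS_SETTINGS: Dict[str, dict] = {
    "generation-per-type": {
        "index": "generation",
        "transform": lambda rows: sorted(rows, key=lambda row: row["date"]),
    },
}


class AnalyticsService:
    """Loads and transforms the data of an analytic according to its settings."""

    def __init__(self, fetch: Callable[[str, dict], Rows], settings: Dict[str, dict]) -> None:
        self._fetch = fetch
        self._settings = settings

    def load(self, analytic: str, filters: dict) -> Rows:
        setting = self._settings[analytic]
        return setting["transform"](self._fetch(setting["index"], filters))


def fake_fetch(index: str, filters: dict) -> Rows:
    # Stand-in for the request to the server part; returns filtered sample rows.
    rows = [
        {"date": "2022-02-01", "area": "BG"},
        {"date": "2022-01-01", "area": "BG"},
        {"date": "2022-01-15", "area": "RO"},
    ]
    return [row for row in rows if all(row[key] == value for key, value in filters.items())]


service = AnalyticsService(fake_fetch, ANALYTICS_SETTINGS)
chart_data = service.load("generation-per-type", {"area": "BG"})
print(chart_data)
```

Changing a filter on the page corresponds to calling `load` again with new filter values, which re-fetches and re-transforms the data.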

7.4.1.2 Visualization of Validation Reports

The TEKG dashboard application will allow the user to browse and analyze validation reports that have been produced by the Semantic Data Validation Service. The validation visualizations will consist of:

Validation table component that renders the content of validation reports. This will be a paginated component with standard options for sorting and filtering.
Validation service that performs API requests to the server with different options for paging, sorting, filtering etc.
A web page embedding the table with different filters and facets to narrow down the fetched validation reports.
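The paging and sorting options can be translated into Elasticsearch request fields (`from`, `size`, `sort`) as in the sketch below. The function name and the `count` sort field are assumptions; the sketch is in Python, while the actual services are written in TypeScript and NodeJS.

```python
def page_request(page: int, page_size: int, sort_field: str, descending: bool = False) -> dict:
    """Translate 1-based table paging and sorting into Elasticsearch request fields."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "from": (page - 1) * page_size,  # offset of the first row on this page
        "size": page_size,
        "sort": [{sort_field: {"order": "desc" if descending else "asc"}}],
    }


# Third page of 25 rows, sorted by a hypothetical 'count' field, largest first.
request_fields = page_request(3, 25, "count", descending=True)
print(request_fields)
```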

7.4.1.3 Visualization of Map Data

For visualizing map data, the TEKG dashboard application will use Leaflet, a library for making interactive maps with OpenStreetMap data. It provides an easy to use API with a lot of options for configurations and extensions.

The dashboard application will have a wrapper component of Leaflet that can be embedded throughout the analytics to provide more context and insight of the data.

7.4.2 Layout

An example layout for the TEKG dashboard application

7.4.3 Packaging

TEKG Dashboard is packaged as a Docker image to achieve portability, ease of deployment and scalability. It can be deployed as a simple Docker container (with Docker Compose, for example) or as a Kubernetes deployment.

7.5 Monitoring

We use Grafana to monitor the overall infrastructure and performance of the system and its services, primarily GraphDB and Ontotext Platform (Semantic Objects service).

Monitoring data is collected with various Telegraf plugins and then stored in the InfluxDB time series database.

8 Energy Knowledge Graph

V1 of the Energy Knowledge Graph is currently available as an RDF graph and a SPARQL endpoint.

The GraphDB Workbench URL is https://tekg.ontotext.com/graphdb
The URL of the SPARQL endpoint is https://tekg.ontotext.com/graphdb/repositories/tekg

The Graph consists of 116 million triples and covers the selected data items for a period of three full months as well as the data from the current month (2022-01 - 2022-04).

The following table summarizes the number of observations (tr:DataObservation) per Data Item:

dataItem	n_observations
generation/ActualGenerationOutputPerGenerationUnit	3969000
generation/AggregatedGenerationPerType	2812002
balancing/AggregatedVolumes	2003930
balancing/AggregatedVolumes_HOURLY	829965
balancing/PricesOfActivatedBalancingEnergy	708636
balancing/PricesOfActivatedBalancingEnergy_HOURLY	347570
generation/CurrentGenerationForecastForWindAndSolar	283136
outages/UnavailabilityOfProductionOrGenerationUnits	79404
balancing/AggregatedVolumes_DAILY	43351
balancing/PricesOfActivatedBalancingEnergy_DAILY	17417
generation/InstalledGenerationCapacityComputed	41
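A summary of this kind can be obtained from the SPARQL endpoint with a grouped count. The query below is a sketch, not necessarily the query used to produce the table: it assumes that each observation carries an explicit (or inferred) tr:DataObservation type and a tr:dataItem link, as described in the ontology. The Python code only builds the request URL and does not contact the endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = "https://tekg.ontotext.com/graphdb/repositories/tekg"

QUERY = """
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT ?dataItem (COUNT(?obs) AS ?n_observations)
WHERE { ?obs a tr:DataObservation ; tr:dataItem ?dataItem }
GROUP BY ?dataItem
ORDER BY DESC(?n_observations)
"""


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL with the query as a URL-encoded parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be requested with an `Accept` header such as `application/sparql-results+json` to obtain the counts in JSON.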

A number of sample queries are available on the GraphDB Workbench home page.

9 Annex

9.1 Full TEKG Ontology

Below is the ontology in Turtle format.

# @prefix trr:  <https://transparency.ontotext.com/resource/> .    # OMIT since this takes over all other prefixes
@prefix tr:   <https://transparency.ontotext.com/resource/tr/> .   # Ontology
@prefix eic:  <https://transparency.ontotext.com/resource/eic/> .  # EnergyResource with EIC
@prefix type: <https://transparency.ontotext.com/resource/type/> . # codelists

@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix vann:   <http://purl.org/vocab/vann/> .

tr: a owl:Ontology;
  rdfs:label "Transparency Energy ontology";
  rdfs:comment "Ontology for data from the ENTSOE Electricity Market Transparency portal";
  rdfs:seeAlso <https://transparency.entsoe.eu/>, <https://transparency.ontotext.com/>;
  dct:creator <https://ontotext.com/>, <mailto:vladimir.alexiev@ontotext.com>;
  dct:created "2021-06-02"^^xsd:date;
  dct:modified "2022-02-21"^^xsd:date;
  owl:versionInfo "1.0";
  vann:preferredNamespaceUri "https://transparency.ontotext.com/resource/tr/";
  vann:preferredNamespacePrefix "tr".

#################### classes

tr:Area a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Area";
  rdfs:comment "Area, as referenced in CSV files, described in REST API documentation and out of which resources are served by the REST API".

tr:CodeList a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Code List";
  rdfs:comment "A code list (eg Message type, UnitOfMeasure, Asset type)".

tr:CodeValue a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Code Value";
  rdfs:comment "Value in a code list".

tr:Country a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Country";
  rdfs:comment "Country (member state)".

tr:DataDomain a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Domain";
  rdfs:comment "Major area of transparency data".

tr:DataItem a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Item";
  rdfs:comment "Data item (time series) of transparency data in a particular domain".

tr:DataObservation a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Observation";
  rdfs:comment "Data Observation, having dataItem, date, dateUpdated and observation-specific fields".

tr:EicTypeValid a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC Type Valid";
  rdfs:comment "EIC types that are valid or invalid with the listed function".

tr:EnergyResource a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Energy Resource";
  rdfs:comment "Energy resource or participant identified with EIC and having a function".

tr:FunctionValid a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Function Valid";
  rdfs:comment "A valid function and a corresponding invalid (misspelt) function".

tr:GenerationUnit a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Generation Unit";
  rdfs:comment "Generation Unit (generator) as described at the lower level of Installed Capacity of Production and Generation Units".

tr:Outage a rdfs:Class;
  rdfs:subClassOf tr:DataObservation;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Outage";
  rdfs:comment "Outage (unavailability) of Production or Generation Unit".

tr:ProductionUnit a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Production Unit";
  rdfs:comment "Production Unit (power plant) as described at the higher level of Installed Capacity of Production and Generation Units".

tr:ValidationCount a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Validation Count";
  rdfs:comment "Validation summary result, characterized by rule (shape), area and count".

#################### properties

tr:acerCode a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ACER code";
  rdfs:comment "Agency for Cooperation of Energy Regulators code of an energy participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:actualConsumption a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "actual consumption";
  rdfs:comment "Actual consumption of Production Unit due to technological consumption (MW)"; # or Area?
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:actualOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "actual output";
  rdfs:comment "Actual power output of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:appliesTo a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "applies to";
  rdfs:comment "Whether this validation rule applies to 'Country' or 'Area' (used for sorting them into tables)";
  rdfs:domain sh:Shape;
  rdfs:range xsd:string.


tr:assetType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "asset type";
  rdfs:comment "Asset type of a Power System Resource";
  schema:domainIncludes tr:EnergyResource, tr:DataObservation;
  rdfs:range tr:CodeValue;
  tr:xpath "MktPSRType/psrType".

tr:availableOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "available output";
  rdfs:comment "Available power output of Production or Generation Unit, reduced due to Outage (MW)";
  rdfs:domain tr:Outage;
  rdfs:range xsd:float.

tr:biddingZone a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "bidding zone";
  rdfs:comment "Bidding Zone of this Energy Resource or Outage";
  schema:domainIncludes tr:EnergyResource, tr:Outage;
  rdfs:range tr:Area;
  tr:xpath "biddingZone_Domain.mRID".

tr:codeList a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "code list";
  rdfs:comment "List this code value is part of";
  rdfs:domain tr:CodeValue;
  rdfs:range tr:CodeList.

tr:controlArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "control area";
  rdfs:comment "Control Area(s) of this Energy Resource or Outage";
  schema:domainIncludes tr:EnergyResource, tr:Outage;
  rdfs:range tr:Area;
  tr:xpath "ControlArea_Domain/mRID".

tr:count a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "count";
  rdfs:comment "Count of violations";
  rdfs:domain tr:ValidationCount;
  rdfs:range xsd:integer.

tr:countryCode  a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "country code";
  rdfs:comment "Country code of an energy resource or participant";
  schema:domainIncludes tr:EnergyResource, sh:ValidationResult, tr:ValidationCount;
  rdfs:range xsd:string;
  tr:xpath "eICCode_MarketParticipant.streetAddress/townDetail/country".

tr:currency a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "currency";
  rdfs:comment "Currency code corresponding to the 'price' field";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:string.

tr:dataDomain a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "data domain";
  rdfs:comment "Domain of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range  tr:DataDomain.

tr:dataItem a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "data item";
  rdfs:comment "Data item(s) that this observation (or validation rule) is (are) about";
  schema:domainIncludes tr:DataObservation, sh:Shape;
  rdfs:range tr:DataItem.

tr:date a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date";
  rdfs:domain tr:DataObservation;
  rdfs:comment "Date of an observation";
  rdfs:range xsd:dateTime.

tr:dateEnd a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date end";
  rdfs:domain tr:Outage;
  rdfs:comment "Ending date of an outage";
  rdfs:range xsd:dateTime.

tr:dateImplemented a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date implemented";
  rdfs:comment "Date when an Energy Resource was implemented";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:date;
  tr:xpath "implementation_DateAndOrTime.date".

tr:dateStart a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date start";
  rdfs:domain tr:Outage;
  rdfs:comment "Starting date of an outage";
  rdfs:range xsd:dateTime.

tr:dateUpdated a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date updated";
  schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource, tr:DataObservation, tr:Outage;
  rdfs:comment "Date when a record was last updated";
  rdfs:range xsd:dateTime;
  tr:xpath "lastRequest_DateAndOrTime.date".

tr:description a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "description";
  rdfs:comment "A description of something";
  schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string.

tr:direction a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "direction";
  rdfs:comment "Direction of energy flow of this balancing volume or price (Up, Down, Up and Down)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:displayArea a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "display area";
  rdfs:comment "Area notation or country code where this validation result or count should be grouped, including the special values 'other' and 'none'";
  schema:domainIncludes sh:ValidationResult, tr:ValidationCount;
  rdfs:range xsd:string.

tr:duration a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "duration";
  rdfs:comment "Duration (time quant) of this data observation";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:duration.

tr:eic a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC";
  rdfs:comment "Energy Identification Code of an energy resource or participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:eicType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type";
  rdfs:comment "Type of Energy resource or participant derived from the third char of its EIC. It's a single-value field and is a 'supertype' of 'function'";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:CodeValue.

tr:eicTypeInvalid a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type invalid";
  rdfs:comment "EIC type that is invalid with the listed function";
  rdfs:domain tr:EicTypeValid;
  rdfs:range tr:CodeValue.

tr:eicTypeValid a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type valid";
  rdfs:comment "EIC type that is valid with the listed function";
  rdfs:domain tr:EicTypeValid;
  rdfs:range tr:CodeValue.

tr:ekgCheckDataQuality a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "TEKG checks data quality";
  rdfs:comment "Whether the TEKG project checks the quality of data of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:ekgImplementAnalytics a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "TEKG implements analytics";
  rdfs:comment "Whether the TEKG project implements analytics over this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:energyResource a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "energy resource";
  rdfs:comment "Energy resource (Production or Generation Unit) reported in this outage";
  rdfs:domain tr:Outage;
  rdfs:range tr:EnergyResource.

tr:fields a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "fields";
  rdfs:comment "Fields that this validation rule is about (listed as a single string)";
  rdfs:domain sh:Shape;
  rdfs:range xsd:string.

tr:fileName a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "file name";
  rdfs:comment "Root file name of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:fileType a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "file type";
  rdfs:comment "File type of this data item as consumed by the TEKG project (XML or CSV)";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:function a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function";
  rdfs:comment "Function(s) of an energy resource or participant, eg Generation Unit, Production Unit, Generation, Load, Connection Point, Internal Line, Tieline, Transformer, Substation, Trade Responsible Party, Balance Responsible Party, Production Responsible party, Consumption Responsible Party...";
  schema:domainIncludes tr:EnergyResource, tr:EicTypeValid, tr:FunctionValid;
  rdfs:range xsd:string.

tr:functionInvalid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function invalid";
  rdfs:comment "Function that is invalid (misspelled)";
  rdfs:domain tr:FunctionValid;
  rdfs:range xsd:string.

tr:functionValid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function valid";
  rdfs:comment "Function that is valid, or allowed for this EIC type";
  schema:domainIncludes tr:CodeValue, tr:FunctionValid;
  rdfs:range xsd:string.

tr:generationUnit a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "generation unit";
  rdfs:comment "Generation Units of this Production Unit (semi-inverse of parentResource)";
  rdfs:domain tr:ProductionUnit;
  rdfs:range tr:GenerationUnit.

tr:hasProdUnits a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "has Production Units";
 rdfs:comment "Whether the area has Production/Generation Units returned from the REST API";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:highVoltageLimit a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "high voltage limit";
  rdfs:comment "High voltage limit of Production Unit";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:float;
  tr:xpath "production_PowerSystemResources.highVoltageLimit".

tr:inAPI a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in API";
 rdfs:comment "Whether the area is returned by the REST API";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inDoc a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in Documentation";
 rdfs:comment "Whether the area is described in the REST API documentation";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inEIC a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in EIC";
 rdfs:comment "Whether the area is described in the EIC file (we've added the missing ones in eic-extra.ttl)";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inVies a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "in VIES";
  rdfs:comment """Whether a Country or a particular Party's VAT Number is present in the EU VAT Information Exchange System (VIES).
No value is recorded for Party if its country is not covered by VIES""";
  schema:domainIncludes tr:EnergyResource, tr:Country;
  rdfs:range xsd:boolean.

tr:installedOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "installed output";
  rdfs:comment "Installed nominal power output of Production or Generation Unit (MW)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:float;
  tr:xpath "nominalP".

tr:isFreeReuse a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "is for free reuse";
  rdfs:comment "Whether the data item can be reused freely";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:isVatValid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "is VAT valid";
  rdfs:comment "Whether the Value Added Tax number is syntactically valid according to per-country patterns";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:boolean.

tr:iso2 a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ISO alpha2";
  rdfs:comment "2-letter alphabetical ISO code of this country, used for linking to external datasets";
  rdfs:domain tr:Country;
  rdfs:range xsd:string.

tr:iso3 a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ISO alpha3";
  rdfs:comment "3-letter alphabetical ISO code of this country, used for linking to external datasets";
  rdfs:domain tr:Country;
  rdfs:range xsd:string.

tr:link a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link";
  rdfs:comment "Link to page with information or direct download page (outside of portal)";
  rdfs:domain tr:DataItem.

tr:linkDescription a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link to description";
  rdfs:comment "Link to detailed Knowledge Base description on portal";
  rdfs:domain tr:DataItem.

tr:linkPortal a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link to portal";
  rdfs:comment "Link to data serving page on portal";
  rdfs:domain tr:DataItem.

tr:location a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "location";
  rdfs:comment "Location of an energy resource (Production Unit)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.location.name", "generatingUnit_Location.name".

tr:marketBalanceArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "market balance area";
  rdfs:comment "Market Balance Area of this balancing volume or price";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:Area.

tr:marketProduct a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "market product";
  rdfs:comment "Type of market product of this balancing volume or price (Standard, Specific, Local)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:mrid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "message id";
  rdfs:comment "Unique message id (mRID), used in the URL";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:name a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "name";
  rdfs:comment "The name of something";
  schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.location.name". # TODO and more

tr:nameAlt a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "name alt";
  rdfs:comment "Alternative name of a code value, as present in CSV files";
  rdfs:domain tr:CodeValue;
  rdfs:range xsd:string.

tr:netOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "net output";
  rdfs:comment "Net power output (actualOutput minus actualConsumption) of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:forecastedOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "forecasted output";
  rdfs:comment "Forecasted output of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:notation a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "notation";
  rdfs:comment """Code of something, eg A01 (a code value), EFET (European Federation of Energy Traders), CB-RO-OP (Control Block Romania Operator).
Single value, coming from EIC or code list master data""";
  schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "long_Names.name".

tr:notationAlt a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "notation alt";
  rdfs:comment """Alternative code for an Energy Resource.
Potentially multiple values, coming from messages (Configuration_MarketDocument)""";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.name".

tr:parentResource a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "parent resource";
  rdfs:comment """Parent of this Energy Resource, eg:
- Control Block   parentResource Coordination Center Zone
- Generation Unit parentResource Production Unit
""";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource.

tr:price a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price";
  rdfs:comment "Price reported in this data observation in 'currency' per MW/h (see also 'priceInEur')";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:priceCategory a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price category";
  rdfs:comment "Price category of this balancing price (Average or Marginal)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:priceInEur a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price in EUR";
  rdfs:comment "Price reported in this data observation in EUR per MW/h (see also 'price')";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:providerParticipant a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "provider participant";
  rdfs:comment "Provider participant(s) of this Energy Resource";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource;
  tr:xpath "Provider_MarketParticipant.mRID".

tr:reason a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reason";
  rdfs:comment "Motivation of an act (in whole Message or individual TimeSeries) in coded form";
  schema:domainIncludes tr:Message, tr:TimeSeries;
  rdfs:range tr:CodeValue.

tr:reasonText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reason text";
  rdfs:comment "Motivation of an act as free text, when `reason` is A95 Complementary information";
  schema:domainIncludes tr:Message, tr:TimeSeries;
  rdfs:range xsd:string.

tr:regArticle a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "regulation article";
  rdfs:comment "Article in Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets that describes the data item";
  rdfs:seeAlso <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0543>;
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:reserveType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reserve type";
  rdfs:comment "Type of reserve resource of this balancing volume or price (FCR, aFRR, mFRR, RR)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:responsibleParticipant a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "responsible participant";
  rdfs:comment "Participant that is responsible for this Energy Resource";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource;
  tr:xpath "eICResponsible_MarketParticipant.mRID".

tr:schedulingArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "scheduling area";
  rdfs:comment "Scheduling Area of this balancing volume or price";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:Area.

tr:statusText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "status text";
  rdfs:comment "Latest status of an Outage: 'Active', 'Withdrawn' or 'Canceled'";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:timeZone a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "time zone";
  rdfs:comment "Time zone code of an Outage";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:typeText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "type text";
  rdfs:comment "Type of an Outage: 'Planned' or 'Forced'";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:vatNumber a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VAT number";
  rdfs:comment "Value Added Tax number of an energy participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:version a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "version";
  rdfs:comment "Version of the message. Only the latest version(s) of an mRID are retained. Used in the URL";
  rdfs:domain tr:Outage;
  rdfs:range xsd:integer.

tr:viesAddress a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES address";
  rdfs:comment "Party address as returned by EU VIES (only if present in VIES)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:viesCheckDate a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES check date";
  rdfs:comment "Datetime when EU VIES check was performed";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:dateTime.

tr:viesName a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES name";
  rdfs:comment "Party name as returned by EU VIES (only if present in VIES)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:volume a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "volume";
  rdfs:comment "Volume offered, accepted, activated or unavailable (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:volumeCategory a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "volume category";
  rdfs:comment "Volume category of this balancing volume (offered, accepted, activated or unavailable)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:xpath a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "xpath";
  rdfs:comment "XPath of the XML element that carries the data for an RDF property. TODO: also need namespace and enclosing elements?";
  schema:domainIncludes owl:ObjectProperty, owl:DatatypeProperty; # rdfs:Class ?
  rdfs:range xsd:string.