Transparency EKG Requirements Specification, Architecture and Semantic Model
Last updated: 28-09-2022
Authors:
Vladimir Alexiev, Viktor Ribchev, Miroslav Chervenski, Nikola Tulechki, Mihail Radkov, Antoniy Kunchev, Radostin Nanov








Developed by:     Ontotext (Sirma AI)
Based on data from:     ENTSO-E Transparency Platform
Powered by:

This project has received funding from the European Union’s Horizon 2020 research and innovation programme
under grant agreement No 824330: INTERRFACE Open Call (cascade funding)


 

0.1 Document Revision History

Version Date Changes Made
M4 2022-06-10 Final Version
M4 2022-04-08 V1 of the TEKG; refinement of validation rules
M3.1 2022-03-23 Started tracking revision history; review comment addressed in installedCapacity-Aggregated-vs-Per-Unit
M3 2022-03-08 M3 Deliverable corresponding to V1 of the TEKG

1 Intro

The ENTSO-E Transparency Platform provides information that is crucial for the efficient and fair operation of the EU energy market. It includes a large number of data items (time series) that are strictly defined in EUreg Transparency and further elaborated in MoP DDD (see Project Glossary on where to find these references).

Knowledge Graphs (KG) have numerous benefits for data integration across enterprises and disciplines. The Energy Identification Code (EIC) is a global identifier of energy resources (objects) and parties (domains/areas, market participants, exchanges, etc).

With this project we hope to make a step in the direction of Energy KGs by creating a Transparency Energy KG (TEKG) from ENTSOE Transparency data. We use GraphDB, the Ontotext Platform, and semantic data integration. We demonstrate the benefits of KG for:

  • Data quality, uncovering a number of Data Quality problems in ENTSOE data
  • Integration of external data
  • Data analytics, showing GraphDB-Elasticsearch-Kibana data flows

This living document specifies the TEKG:

  • M1 (2022-01-14) specifies the project Scope and some draft requirements
  • M2 (2022-02-02) specifies all Business Requirements including mockups
  • M3 (2022-03-02) specifies Semantic Models, Software Architecture (and incorporates a Test Plan)
  • M4 (2022-06-01) specifies the final version of the TEKG

The demonstrator is available at https://transparency.ontotext.com/

1.1 Project Glossary

We have created and will maintain a comprehensive project glossary. Every special term and abbreviation that we encounter is added to the glossary.

It also includes a list of Sources:

  • EC regulations
  • Manual of Procedures (MoP) and its parts, including DDD Detailed Data Descriptions
  • Other ENTSOE documents and pages, amongst them:
    • doc Free Reuse: Data Available for Free Re-Use
    • doc Functions: List of allowed functions for the EIC codes
  • Scientific Papers

1.2 Areas

The constituency of ENTSOE is broken up into a number of Domain/Area "meshes" according to different principles. See glossary#areas for a description of all kinds of Areas.

The following kinds of Areas are most important for Transparency because they are used in Data Items:

  • Bidding Zone, BZN: largest geographical area in which there is a uniform spot price, in which Market Participants can exchange energy without Capacity Allocation.
  • Control Area, CA=CTA: coherent part of the interconnected system, operated by a single system operator and shall include connected physical loads and/or generation units
  • Member State (Country), CTY: EU member state or a neighboring state
  • Market Balance Area, MBA: geographic area in which there is a uniform balancing energy price. Consists of one or more Metering Grid Areas with common market rules for which the settlement responsible party carries out a balance settlement and which has the same price for imbalance. May also be defined due to bottlenecks.
  • Scheduling Area, SCA: same as Bidding Zone, except if there is more than one Responsibility Area within this Bidding Zone. In the latter case, the Scheduling Area equals Responsibility Area or a group of Responsibility Areas.

Resources (Eg Production and Generation Units) of these Areas can be requested from the Transparency portal and are used as key request parameters in the REST API. For example:

The following query finds 198 relevant Areas of the above kinds in the EIC file, and returns them with all functions:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
    values ?fun {"Member State" "Control Area" "Bidding Zone" "Market Balance Area" "Scheduling Area"}
    ?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
    optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)

We get from EIC the 3 critical kinds CTY, CTA, BZN that are of interest to us (111 such Areas):

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
    values ?fun {"Member State" "Control Area" "Bidding Zone"}
    ?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
    optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)

Unfortunately there are discrepancies, see data/areas.tsv that has the following columns (with count shown):

  • name: area name (121)
  • co: country code (49, 29 unique)
  • eic: EIC code (121)
  • funcs: which of the 3 functions BZN, CTA, CTY are listed for the area (121)
  • inEIC: whether it's present in the EIC file (111)
  • inDoc: whether it's present in the documentation REST API Guide#Areas (89)
  • inAPI: whether it's accepted by the REST API request master_data i.e. Installed Capacity Per Production Unit (87)
  • inVIES: whether VAT numbers of that country can be validated in VIES. see External VAT Validation

We have the following combinations:

inEIC inDoc inAPI count
0 1 1 10
1 0 0 31
1 0 1 1
1 1 0 3
1 1 1 76
  • 34 areas are listed in the EIC file but rejected by the API
  • 3 areas are documented but rejected by the API:
notation eic funcs comment
DE 10Y1001A1001A83F Member State Instead, BZN (CZ-DE-SK, DE-AT-LU, DE-LU) or CTA (50hertz, Amprion, Tennet GER, TransnetBW) are used
DK 10Y1001A1001A65H Member State Instead, BZN (DK-1, DK-2) are used
UK 10Y1001A1001A92E Member State Instead, BZN (GB National Grid, IE(SEM)) or CTA (National Grid, NIE) are used
  • 1 area is accepted by the API but not documented:
notation eic funcs comment
GB-NI 10Y1001A1001A016 Control Area NIE?
  • 10 areas are missing from the EIC XML file but are documented and accepted by the API: we added their data to areas.tsv and to a manually crafted turtle/eic-extra.ttl
notation co eic funcs
IT-BRINDISI IT 10Y1001A1001A699 Bidding Zone
IT-FOGGIA IT 10Y1001A1001A72K Bidding Zone
IT-PRIOLO IT 10Y1001A1001A76C Bidding Zone
IT-ROSSANO IT 10Y1001A1001A77A Bidding Zone
BY BY 10Y1001A1001A51S Control Area, Bidding Zone, Market Balance Area
MD MD 10Y1001A1001A990 Control Area, Bidding Zone, Market Balance Area
RU RU 10Y1001A1001A49F Control Area, Bidding Zone, Market Balance Area
KALININGRAD RU 10Y1001A1001A50U Control Area, Bidding Zone, Market Balance Area
PL-CZ 10YDOM-1001A082L Control Area, Bidding Zone
CZ+DE+SK 10YDOM-CZ-DE-SKK Bidding Zone
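
The counts quoted in the bullets above follow from the combinations table. A minimal Python sketch that re-derives them; the counts are copied from that table, while the names `combos` and `count_where` are ours and purely illustrative:

```python
# Combination counts copied from the table above: (inEIC, inDoc, inAPI) -> count
combos = {
    (0, 1, 1): 10,
    (1, 0, 0): 31,
    (1, 0, 1): 1,
    (1, 1, 0): 3,
    (1, 1, 1): 76,
}

def count_where(predicate):
    """Sum the counts of all combinations that satisfy the predicate."""
    return sum(n for flags, n in combos.items() if predicate(*flags))

total = count_where(lambda eic, doc, api: True)
in_eic = count_where(lambda eic, doc, api: eic)
in_doc = count_where(lambda eic, doc, api: doc)
in_api = count_where(lambda eic, doc, api: api)

eic_but_rejected = count_where(lambda eic, doc, api: eic and not api)
documented_but_rejected = count_where(lambda eic, doc, api: doc and not api)
accepted_but_undocumented = count_where(lambda eic, doc, api: api and not doc)
missing_from_eic = count_where(lambda eic, doc, api: not eic and doc and api)

print(total, in_eic, in_doc, in_api)
print(eic_but_rejected, documented_but_rejected,
      accepted_but_undocumented, missing_from_eic)
```

The first line of output reproduces the column counts of areas.tsv (121 rows, 111 inEIC, 89 inDoc, 87 inAPI) and the second line the four bullet counts (34, 3, 1, 10).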

1.3 Countries

We find some interesting discrepancies of "Member State" areas:

  • Many are missing country code (tr:countryCode): BE, CZ, DE, ES, FR, ICELAND, IT, LU, NL, NO, SE, SK, UA, UK
  • All tr:notation values are country codes except "ICELAND", which is a full name
  • LV lists 4 "Member States": "LV" but also "END_USERS_LV", "DISTRIBUTION_LV", "VTP_LV"

In order to join external power plant datasets, we need a list of ENTSOE countries with ISO2 and ISO3 codes.

The following query finds 36 countries that are members of ENTSOE. We use a Federated query to Wikidata:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?eic ?iso2 ?iso3 ?name ?wd_name where {
  ?x tr:function "Member State"; tr:eic ?eic; tr:notation ?n; tr:name ?name.
  bind(if(?n="ICELAND","IS",?n) as ?iso2)
  service <https://query.wikidata.org/sparql> {
    ?y wdt:P297 ?iso2; wdt:P298 ?iso3; rdfs:label ?wd_name
    filter(lang(?wd_name)="en")
  }
} order by ?iso2
  • We replace dynamically "ICELAND" with "IS" which is its proper iso2 code
  • The join to Wikidata by iso2 eliminates the 3 extraneous LV "Member States"
  • In the result data/countries.csv, we merge the two names of NL to one row: "Netherlands; Kingdom of the Netherlands"
  • We resolve a difference of United Kingdom vs Great Britain
  • Finally, we add the 3 countries missing from the EIC file (RU, BY, MD)
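
The first two steps can be illustrated outside SPARQL. The following Python sketch is only an approximation of the federated query: the dictionary `ISO2_TO_ISO3` is a hypothetical stand-in for the Wikidata lookup (P297 to P298) and lists just a few codes, and the function names are ours.

```python
# Hypothetical stand-in for the Wikidata lookup (ISO2 -> ISO3); only a few codes listed
ISO2_TO_ISO3 = {"IS": "ISL", "LV": "LVA", "NL": "NLD", "BG": "BGR"}

def to_iso2(notation):
    """Map a "Member State" notation to ISO2; "ICELAND" is the one known exception."""
    return "IS" if notation == "ICELAND" else notation

def join_countries(notations):
    """Keep only notations whose ISO2 code is known, mimicking the inner join to Wikidata."""
    rows = []
    for notation in notations:
        iso2 = to_iso2(notation)
        if iso2 in ISO2_TO_ISO3:
            rows.append((iso2, ISO2_TO_ISO3[iso2]))
    return sorted(rows)

# Notations taken from the discrepancies listed above
notations = ["LV", "END_USERS_LV", "DISTRIBUTION_LV", "VTP_LV", "ICELAND", "NL"]
countries = join_countries(notations)
print(countries)
```

Because "END_USERS_LV", "DISTRIBUTION_LV" and "VTP_LV" are not ISO2 codes, they drop out of the join, just as they do in the query.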

2 Data Items

The ENTSOE Transparency portal includes about 80-135 data items (depending on how you count). The items cover 7 domains:

  • Load: power consumption forecasts and actuals
  • Generation: production installed capacities (configuration), forecasts and actuals
  • Transmission: power transfers over borders between areas
  • Balancing: regulation energy used to keep the electrical transmission grid in balance: bids (price & volume), capacity, imbalance prices and volume
  • Outages: planned maintenances and unplanned failures inside the electrical grid: transmission, generation, consumption, offshore grid. The most popular domain
  • Congestion Management: actions taken to relieve overloaded parts of the electrical transmission grid
  • System Operations: Operational Agreements (on Synchronous Areas, LFC Blocks), Measurements of frequency quality (PDFs)

Data items are described in various documents:

  • EUreg Transparency: Commission Regulation (EU) No 543/2013
  • data-items-sitemap.txt: page Sitemap: 7 domains, 84 items
  • data-items-kb.txt: page Knowledge Base: 7 domains, 84 items
    • Includes EUreg Transparency item definitions and clause references, as well as more detailed item descriptions, sometimes with illustrations
  • data-items-sftp.txt: page SFTP: 6 domains (excludes System Operations), 100 items
    • Includes EUreg Transparency clause references
    • Includes column descriptions of the 156 fields that appear in these 100 tables. But fields are not always explained well, only examples are provided
    • Has some important omissions, eg ActualGenerationOutputPerGenerationUnit shows PowerSystemResourceName, but the respective CSV file also has GenerationUnitEIC
  • doc Free Reuse: Data Available for Free Re-Use (2019-11).
    • Describes 35 data items that are available for Free Reuse (and are therefore our first target).
    • For data items that are not in the list, one needs to seek the consent of the primary data owner (see Primary Owner of Data for each row of MoP DDD, most often the TSO)

2.1 Data Item Description

We have reconciled the various descriptions of data items and integrated them in this Google Sheet.

From it we generate a semantic description in file data/turtle/small/kb.ttl using the query in etl_scripts/dataItems.ru which includes the following properties (examples given for item <data/load/ActualTotalLoad_6.1.A>):

  • tr:name: item name, eg "Actual Total Load"
  • tr:file: base file name of XML (REST API) or CSV (SFTP), eg "ACTUAL_TOTAL_LOAD" or "ActualTotalLoad"
  • tr:dataDomain: parent data domain, eg <data/load>
  • tr:linkDescription: link to detailed description (see "knowledge base" above), eg Total Load - Day Ahead - Actual
  • tr:linkPortal: link to ENTSOE portal where the item can be viewed/downloaded, eg totalLoadR2/show
  • tr:linkDownload: download link, applies only to "static" files:
  • tr:link: applies only to "external" sources
  • tr:regArticle: article of EUreg Transparency describing the item, eg:
    • 6.1.A for Actual Total Load
    • 12.3.A.d for Explicit Allocations - Auction Revenue (daily)
    • 12.3.A.i for Explicit Allocations - Auction Revenue (intraday)
    • 16.1.B and 16.1.C for Aggregated Generation per Type
  • tr:isFreeReuse: whether the item is available for free reuse
  • tr:ekgCheckDataQuality: whether TEKG will implement Data Validations over the item
  • tr:ekgImplementAnalytics: whether TEKG will implement Analytics over the item

2.2 Data Items to be Integrated

This is the full list of data items that will be integrated. It includes items to be validated (ekgCheckDataQuality) and items to implement analytics for (ekgImplementAnalytics):

  • (Basic) Energy Identification Code file (EIC)
  • (Basic) Codelists
  • (External) Open Street Map (OSM)
  • (External) Other external datasets of power plants and generators
  • (Load) Actual Total Load (Cancelled)
  • (Load) Day-ahead Total Load Forecast (Cancelled)
  • (Load) Month-ahead Total Load Forecast (Cancelled)
  • (Load) Week-ahead Total Load Forecast (Cancelled)
  • (Load) Year-ahead Total Load Forecast (Cancelled)
  • (Generation) Installed Capacity Per Production Unit
  • (Generation) Aggregated Generation per Type
  • (Generation) Current Generation Forecasts for Wind and Solar
  • (Outages) Planned Unavailability and Changes in Actual Availability of Generation Units
  • (Outages) Planned Unavailability and Changes in Actual Availability of Production Units
  • (Balancing) Accepted Aggregated Offers
  • (Balancing) Prices Of Activated Balancing Energy
  • (Balancing) Activated Balancing Energy

The following subsections provide detailed description and analysis of each item:

  • Research the data items
  • Research data availability. We take:
    • The "Basic" items from REST API as XML
    • The "transactional" items (Load, Generation, Outages) from SFTP as CSV
  • Analyze XML schemas XSDs, take and analyze XML and CSV examples
  • Add to Data Validation
  • Create semantic models showing which data fields are mapped to what RDF constructs
  • This will be the basis of semantic conversions

2.3 Historical Data Ingestion

  • Historical data for Balancing is ingested for 6 months in the past.
    • e.g. for ActivatedBalancingEnergy on 2022-03-01, a total of 12 CSV files need to be processed, prefixed from 2022_03 to 2021_02
  • Historical data for Generation is ingested 1 full month in the past
    • e.g. for AggregatedGenerationPerType on 2022-03-01, 2 CSV files need to be processed, prefixed 2022_02 and 2022_01
  • DayAheadGenerationForecastForWindAndSolar is also ingested 1 month in the past
  • All available future data is always ingested and processed.

Exception: DayAheadGenerationForecastForWindAndSolar for CTA 10YAL-KESH-----5 has over 1 year of null forecasts with 0.00 values. For this reason we will limit future data for this data item to 1 month.

2.4 Temporal Aggregation

Temporal aggregation is required for producing analytics where the diagrams require a coarser level of aggregation than the raw data. This section specifies the temporal aspects of the time-series data.

2.4.1 Generation of Aggregated Data

Temporal aggregation is provided by creating synthetic data items where the amounts are aggregated at the desired temporal resolution. Eg the Balancing Energy Timeline requires hourly or daily aggregates of the Prices Of Activated Balancing Energy and Activated Balancing Energy data items.

Depending on the source, these data items are reported at different temporal resolutions from 15 min to 1 h (PT15M, PT30M and PT60M). These values are harmonised at:

  • PT1H (hourly) resolution stored as data item PricesOfActivatedBalancingEnergy_HOURLY
  • P1D (daily) resolution stored as data item PricesOfActivatedBalancingEnergy_DAILY

Similarly, Activated Balancing Energy is aggregated in ActivatedBalancingEnergy_HOURLY and ActivatedBalancingEnergy_DAILY

  • Hourly aggregations are defined by the timestamp at the whole hour preceding the measurement.
  • Daily aggregations are defined by the timestamp at midnight preceding the measurement.
  • These synthetic data items are produced using arithmetical operations in SPARQL update queries.
  • They run on the entire data and overwrite the aggregated values at each execution.

Note: A similar procedure is used for spatial aggregation of individual capacities in a given area, see InstalledGenerationCapacityComputed

2.4.2 Summary Operations

Summary operations differ according to the values being aggregated:

  • PricesOfActivatedBalancingEnergy: the amounts are averaged over the time period
  • ActivatedBalancingEnergy: the amounts are summed over the time period
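
The TEKG computes these aggregates with SPARQL update queries. Purely as an illustration of the bucketing and summary rules, here is a Python sketch; the observations are made-up values, the function names are ours, and we assume that a measurement stamped exactly on the hour falls into that hour's bucket.

```python
from collections import defaultdict
from datetime import datetime

def bucket(ts, resolution):
    """Truncate a timestamp to the whole hour (PT1H) or to midnight (P1D)."""
    if resolution == "PT1H":
        return ts.replace(minute=0, second=0, microsecond=0)
    if resolution == "P1D":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported resolution: {resolution}")

def aggregate(observations, resolution, how):
    """Group (timestamp, amount) pairs by bucket and summarise with 'sum' or 'avg'."""
    groups = defaultdict(list)
    for ts, amount in observations:
        groups[bucket(ts, resolution)].append(amount)
    if how == "sum":
        return {key: sum(values) for key, values in groups.items()}
    if how == "avg":
        return {key: sum(values) / len(values) for key, values in groups.items()}
    raise ValueError(f"unsupported summary operation: {how}")

# Made-up PT15M observations, for illustration only
obs = [
    (datetime(2022, 3, 1, 10, 0), 10.0),
    (datetime(2022, 3, 1, 10, 15), 20.0),
    (datetime(2022, 3, 1, 10, 30), 30.0),
    (datetime(2022, 3, 1, 10, 45), 40.0),
    (datetime(2022, 3, 1, 11, 0), 50.0),
]
hourly_energy = aggregate(obs, "PT1H", "sum")  # summed, as for ActivatedBalancingEnergy
hourly_price = aggregate(obs, "PT1H", "avg")   # averaged, as for PricesOfActivatedBalancingEnergy
daily_energy = aggregate(obs, "P1D", "sum")
```

The sketch uses a plain (unweighted) average; whether sources with different resolutions need weighting is not covered here.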

2.5 Semantic Model Diagrams

We visualize semantic models (RDF mappings) using the rdfpuml tool from https://github.com/VladimirAlexiev/rdf2rml . These are graph models that show:

  • Colored circles to indicate the data item (eg (C)=Codelist, (E)=EIC file, (P)=Production and Generation Units)
  • The XML path or CSV file used to source data for each node, as "..." right after the class name
  • XML or CSV field names in brackets (round brackets in URLs and square brackets in literal values)
  • The datatype used for each literal

2.6 XML Items and XML Schemas

We obtained XML schemas from CIM_xsd_package.zip (and a few others) and saved them to folder xsd.

2.6.1 Codelists

The codelists describe the basic lookups used on the Transparency platform.

We obtained CodelistV80.zip and saved data/code-lists/urn-entsoe-eu-wgedi-codelists.xsd. The codelists are embedded in this XSD. We use only "Standard" TypeLists, eg:

  <xsd:simpleType name="StandardAssetTypeList">
    <xsd:annotation>
      <xsd:documentation>
        <Uid>ET0031</Uid>
        <Definition>The identification of the type of asset.</Definition>
      </xsd:documentation>
    </xsd:annotation>
    <xsd:restriction base="xsd:NMTOKEN">
      <xsd:enumeration value="A01">
        <xsd:annotation>
          <xsd:documentation>
            <CodeDescription>
              <Title>Tieline</Title>
              <Definition>A high voltage line used for cross border energy interconnections.</Definition>
            </CodeDescription>
          </xsd:documentation>
        </xsd:annotation>
      </xsd:enumeration>

2.6.1.1 Codelist Mapping

We convert XML codelists to this simple RDF representation (alternatively, we could use SKOS):

@base <https://transparency.ontotext.com/resource/> .
@prefix tr: <https://transparency.ontotext.com/resource/tr/> .

<type/Asset> a tr:CodeList;
  tr:name "Asset";
  tr:notation "ET0031";
  tr:description "The identification of the type of asset.".

<type/Asset/A01> a tr:CodeValue;
  tr:codeList <type/Asset>;
  tr:name "Tieline";
  tr:notation "A01";
  tr:description "A high voltage line used for cross border energy interconnections." .
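
To make the mapping concrete, here is a minimal Python sketch that turns the XSD fragment shown earlier into the statements above. It is not the project's conversion code. We assume that the documentation elements (Uid, Definition, Title) carry no namespace, that the list name is derived by stripping "Standard" and "TypeList", and we skip escaping of quotes inside literals; the function names are ours.

```python
import re
import xml.etree.ElementTree as ET

# The codelist fragment from above, completed with closing tags so that it parses
XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="StandardAssetTypeList">
    <xsd:annotation><xsd:documentation>
      <Uid>ET0031</Uid>
      <Definition>The identification of the type of asset.</Definition>
    </xsd:documentation></xsd:annotation>
    <xsd:restriction base="xsd:NMTOKEN">
      <xsd:enumeration value="A01">
        <xsd:annotation><xsd:documentation><CodeDescription>
          <Title>Tieline</Title>
          <Definition>A high voltage line used for cross border energy interconnections.</Definition>
        </CodeDescription></xsd:documentation></xsd:annotation>
      </xsd:enumeration>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}
DOC = "xsd:annotation/xsd:documentation/"

def short_name(type_list_name):
    """Assumed naming rule: "StandardAssetTypeList" -> "Asset"."""
    return re.sub(r"^Standard|TypeList$", "", type_list_name)

def codelist_to_turtle(xsd_text):
    """Emit tr:CodeList and tr:CodeValue statements for every simpleType in the XSD."""
    out = []
    for simple_type in ET.fromstring(xsd_text).findall("xsd:simpleType", NS):
        name = short_name(simple_type.get("name"))
        uid = simple_type.findtext(DOC + "Uid", namespaces=NS)
        definition = simple_type.findtext(DOC + "Definition", namespaces=NS)
        out.append(f'<type/{name}> a tr:CodeList;\n'
                   f'  tr:name "{name}";\n'
                   f'  tr:notation "{uid}";\n'
                   f'  tr:description "{definition}".')
        for enum in simple_type.findall("xsd:restriction/xsd:enumeration", NS):
            code = enum.get("value")
            title = enum.findtext(DOC + "CodeDescription/Title", namespaces=NS)
            desc = enum.findtext(DOC + "CodeDescription/Definition", namespaces=NS)
            out.append(f'<type/{name}/{code}> a tr:CodeValue;\n'
                       f'  tr:codeList <type/{name}>;\n'
                       f'  tr:name "{title}";\n'
                       f'  tr:notation "{code}";\n'
                       f'  tr:description "{desc}".')
    return "\n\n".join(out)

turtle = codelist_to_turtle(XSD)
print(turtle)
```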

A general model looks like data/model/codelist.ttl:

In order to match string values in CSV files to the codelists, we add nameAlt to some code values. For example, the code value for "FCR" (a type of balancing reserve) looks like data/turtle/small/codelists-extra.ttl:

To facilitate faceted search/display, we have added a hierarchy to <type/Asset> using the tr:fuelTypeClassification predicate. The different varieties of Hydro powered assets under a generic Hydro asset type are materialized in data/model/codelist-eg.ttl. We also add some matching info in order to match fuel type from other databases to the ENTSOE codelist.

2.6.2 EIC File

The EIC file provides basic information about Energy Resources.

  • EIC was devised by ENTSOE but is also used by ENTSOG.
  • While ENTSOE allocates some EIC codes (in its role as CIO), most are issued by national authorities (LIO) in a distributed way. The important EIC codes are sent back to ENTSOE
  • The third char of EIC determines the kind of resource according to the table shown in (*) below. We populate a field eicType, see Add eicType
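
As an illustration of the third-character rule, here is a small Python sketch. The type names are copied from table (*); the function name `eic_type` and the length check are our own assumptions, and this is not how the TEKG itself populates eicType (see Add eicType).

```python
# Type names copied from table (*): third character of the EIC -> kind of resource
EIC_TYPES = {
    "A": "Substation",
    "T": "Tieline/Transformer",
    "V": "Location",
    "W": "Resource Object",
    "X": "Party",
    "Y": "Area or Domain",
    "Z": "Measurement point",
}

def eic_type(eic):
    """Return the kind of resource for a 16-character EIC, or None if the char is unknown."""
    if len(eic) != 16:
        raise ValueError(f"EIC must have 16 characters: {eic!r}")
    return EIC_TYPES.get(eic[2])

print(eic_type("10YIE-1001A00010"))
print(eic_type("18X0000000000KCL"))
```

The two codes are taken from examples elsewhere in this document: the first is a Control Area, the second a market participant.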

The ENTSOE EIC file is available from several sources:

  • XML allocated-eic-codes.xml (namespace urn:iec62325.351:tc57wg16:451-n:eicdocument:1:0), 2021-12-31, has grown by 3.3% in 7 months
  • CSV: page eic-approved-codes that offers browsing and several CSV downloads:
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/A_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/T_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/V_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/W_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/X_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Y_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Z_eiccodes.csv

Counting the number of records:

  • XML total and breakdown per type:
grep -c "<EICCode_MarketDocument>" allocated-eic-codes.xml
perl -lne 'print $1 if m{<mRID>..(.).............</mRID>}' allocated-eic-codes.xml|sort|uniq -c
  • After transforming XML to RDF and loading to GraphDB we Add eicType

  • CSV total and breakdown per type (need to subtract 1 from each result to account for the header line)

wc -l *.csv

(*) Counts for XML and CSV:

char type XML CSV
"A" "Substation" 2447 2457
"T" "Tieline/Transformer" 9985 10104
"V" "Location" 516 522
"W" "Resource Object" 20116 20195
"X" "Party" 10115 10138
"Y" "Area or Domain" 1140 1143
"Z" "Measurement point" 1841 1842
TOTAL 46160 46401

So the CSV has 241 records more than the XML.

The CSV has field EicStatus and we guessed that maybe the extra resources have status Passive. While trying to get statistics for this field, we found that the CSV is malformed: it is semicolon-separated but includes fields with embedded semicolon and no quoting. For example:

  • X_eiccodes.csv: GASINDUR; S.L.
  • Y_eiccodes.csv: Enson tutkimustehdas; Imatra
csvtk summary -d ';' -f EicCode:count -g EicStatus X_eiccodes.csv
[ERRO] record on line 2731: wrong number of fields

head -2731 X_eiccodes.csv |tail -1
18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;Trade Responsible Party;X

head -1051 Y_eiccodes.csv |tail -1
44Y-00000000246A;FI_EGTU00;Enson tutkimustehdas; Imatra;;44X-00000000100F;Active;;FI;;Metering Grid Area;Y
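
Such rows can also be found without csvtk by counting fields. A minimal Python sketch follows; the expected count of 11 fields is our assumption, taken from a well-formed row of Y_eiccodes.csv, and the function name is ours.

```python
def find_malformed(lines, expected_fields, sep=";"):
    """Return (line_number, field_count) for rows whose field count differs from expected."""
    bad = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if count != expected_fields:
            bad.append((number, count))
    return bad

rows = [
    # a well-formed row (11 fields)
    "46Y000000000007M;CUT_AREA_SE3A;Cut area SE3A;;;Active;;;;Bidding Zone;Y",
    # the two rows shown above, each with an unquoted semicolon inside the long name
    "18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;Trade Responsible Party;X",
    "44Y-00000000246A;FI_EGTU00;Enson tutkimustehdas; Imatra;;44X-00000000100F;Active;;FI;;Metering Grid Area;Y",
]
malformed = find_malformed(rows, expected_fields=11)
print(malformed)
```

Each malformed row has one field too many, which is consistent with a single embedded semicolon.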

We guessed the opposite status is Passive but found no resources with this word:

grep -c Passive *.csv

Judging from the counts, the CSV is a superset of the XML. To confirm, we compared the EIC ids for the critical type "Area or Domain": the CSV has 3 extra records (namely Cut Areas/Corridors):

cut -f 1 -d \; Y_eiccodes.csv | tail -n +2 | sort > eic-areas-csv.txt
perl -lne 'print $1 if m{<mRID>(..Y.............)</mRID>}' allocated-eic-codes.xml|sort>eic-areas-xml.txt
comm -3 eic-areas-csv.txt eic-areas-xml.txt

46Y000000000007M
46Y000000000008K
46Y000000000009I

grep -E "46Y000000000007M|46Y000000000008K|46Y000000000009I" Y_eiccodes.csv
46Y000000000007M;CUT_AREA_SE3A;Cut area SE3A;;;Active;;;;Bidding Zone;Y
46Y000000000009I;CUT_COR_SE3A-SE3;Cut corridor SE3A-SE3;;;Active;;;;Bidding Zone;Y
46Y000000000008K;CUT_AREA_SE3;Cut area SE3;;;Active;;;;Bidding Zone;Y

2.6.2.1 EIC Fields

EIC XML has the following structure shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity:

EIC_MarketDocument =
 element mRID {ID_String},
 element revisionNumber {ESMPVersion_String},
 element type {MessageKind_String},
 element sender_MarketParticipant.mRID {PartyID_String}?,
 element sender_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
 element receiver_MarketParticipant.mRID {PartyID_String}?,
 element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
 element createdDateTime {ESMP_DateTime},
 element EICCode_MarketDocument {EICCode_MarketDocument}*

EICCode_MarketDocument =
 element mRID {EICCode_String}?,
 element status {Action_Status}?,
 element docStatus {Action_Status}?,
 element attributeInstanceComponent.attribute {xsd:string}?,
 element long_Names.name {Characters70_String},
 element display_Names.name {Characters16_String},
 element lastRequest_DateAndOrTime.date {xsd:date},
 element deactivationRequested_DateAndOrTime.date {xsd:date}?,
 element eICContact_MarketParticipant.name {Characters70_String}?,
 element eICContact_MarketParticipant.phone1 {TelephoneNumber}?,
 element eICContact_MarketParticipant.electronicAddress {ElectronicAddress}?,
 element eICCode_MarketParticipant.streetAddress {StreetAddress}?,
 element eICCode_MarketParticipant.aCERCode_Names.name {ACERCode_String}?,
 element eICCode_MarketParticipant.vATCode_Names.name {VATCode_String}?,
 element eICParent_MarketDocument.mRID {EICCode_String}?,
 element eICResponsible_MarketParticipant.mRID {EICCode_String}?,
 element description {Characters700_String}?,
 element Function_Names {Function_Name}*

StreetAddress =
 element streetDetail {StreetDetail}?,
 element postalCode {Characters10_String}?,
 element townDetail {TownDetail}?

StreetDetail =
 element addressGeneral {Characters70_String}?,
 element addressGeneral2 {Characters70_String}?,
 element addressGeneral3 {Characters70_String}?

TownDetail =
 element name {Characters35_String}?,
 element country {Characters2_String}?

We examined actual XML instances and show below the fields that are filled and useful (not constant).

A field comparison between CSV, XML and the resulting RDF properties (which we hope are shorter and easier to understand):

| CSV | XML | RDF | Note |
|---|---|---|---|
| EicCode | mRID | tr:eic | Also used in URL |
| EicDisplayName | display_Names.name | tr:notation | |
| EicLongName | long_Names.name | tr:name | |
| | description | tr:description | Often repeats the Functions |
| EicParent | ns:eICParent_MarketDocument.mRID | tr:parentResource | As EIC URL |
| EicResponsibleParty | eICResponsible_MarketParticipant.mRID | tr:responsibleParticipant | As EIC URL |
| EicStatus | docStatus/value | | Always A05, so omitted |
| MarketParticipantPostalCode | | | Not in XML |
| MarketParticipantIsoCountryCode | eICCode_MarketParticipant.streetAddress/townDetail/country | tr:countryCode | |
| MarketParticipantVatCode | vATCode_Names.name | tr:vatNumber | |
| | aCERCode_Names.name | tr:acerCode | |
| EicTypeFunctionList | Function_Names/name | tr:function | |
| type | | tr:eicType | Generated from EIC 3rd char |
| | lastRequest_DateAndOrTime.date | tr:dateUpdated | |

So each file (XML vs CSV) has some extra fields compared to the other:

  • XML has dateUpdated, which can be quite important in data update scenarios
  • XML has acerCode, which can be important for external data integration with ACER
  • XML has description, which most often repeats the Functions, with some informative exceptions, eg
    • "Entry/Exit Point From A Storage Between Storengy And Grtgaz"
    • "Implementation of common platform for aFRR, as mandated by EB GL."
    • "Domestic exit point"
    • "Albanain LIO office is applying for EIC codes- identifying Kosovo Production and Generation Unit since they do not have LIO office, yet."
    • "Connection With The Distribution System"
  • CSV has PostalCode, but we suspect that many are nonsensical data, eg
    • Azerbaijan: postalCode=1002, countryCode=BE

2.6.2.2 EIC Mapping

For now, we use EIC XML, but later we might decide to replace or complement with EIC CSV. Unfortunately, both of these files are missing some Areas that are returned by the REST API.

The EIC file is mapped to RDF as follows (XML field names are shown in brackets).

All fields are extracted from XML, except eicType (see Add eicType)

2.6.3 Production and Generation Units

We use the Production and Generation Units REST API that returns XML data items having the following structure (shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity). It consists of:

  • one Configuration_MarketDocument header
    • multiple TimeSeries describing Production Units
      • one MktPSRType describing characteristics of the Production Unit
      • multiple nested MktGeneratingUnit describing Generation Units
Configuration_MarketDocument =
 element mRID {ID_String},
 element type {MessageKind_String},
 element process.processType {ProcessKind_String},
 element sender_MarketParticipant.mRID {PartyID_String},
 element sender_MarketParticipant.marketRole.type {MarketRoleKind_String},
 element receiver_MarketParticipant.mRID {PartyID_String},
 element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String},
 element createdDateTime {ESMP_DateTime},
 element TimeSeries {TimeSeries}*

TimeSeries =
 element mRID {ID_String},
 element businessType {BusinessKind_String},
 element implementation_DateAndOrTime.date {xsd:date},
 element biddingZone_Domain.mRID {AreaID_String}?,
 element registeredResource.mRID {ResourceID_String},
 element registeredResource.name {xsd:string},
 element registeredResource.location.name {xsd:string},
 element ControlArea_Domain {ControlArea_Domain}+,
 element Provider_MarketParticipant {Provider_MarketParticipant}+,
 element MktPSRType {MktPSRType}

MktPSRType =
 element psrType {PsrType_String},
 element production_PowerSystemResources.highVoltageLimit {ESMP_Voltage}?,
 element nominalIP_PowerSystemResources.nominalP {ESMP_ActivePower}?,
 element GeneratingUnit_PowerSystemResources {MktGeneratingUnit}*

MktGeneratingUnit =
 element mRID {ResourceID_String},
 element name {xsd:string},
 element nominalP {ESMP_ActivePower},
 element generatingUnit_PSRType.psrType {PsrType_String},
 element generatingUnit_Location.name {xsd:string}

ESMP_ActivePower-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_ActivePower = ESMP_ActivePower-base, attribute unit {UnitSymbol}

ESMP_Voltage-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_Voltage = ESMP_Voltage-base, attribute unit {UnitSymbol}

2.6.3.1 Production and Generation Unit Mapping

We map the Production and Generation Unit data item to RDF as follows:

Notes:

  • We omit header data since the item is nearly-static ("configuration" or "master") data, and we retain only the latest version
  • We assign RDF types tr:ProductionUnit and tr:GenerationUnit to the higher and lower level resources, since we need them for Data Corrections later
  • We omit units of measure for simplicity, since all resources use the same units: MAW for output (nominalP=installedOutput, actualOutput, availableOutput) and KVT for highVoltageLimit

2.6.4 Combined Mapping

The following diagram shows how the semantic data from the previous 3 sections comes together (EIC file, Codelist, Production and Generation Units).

It uses the example of Bulgaria's NPP Kozloduy power plant and related entities (two generators; Bulgaria, the BG TSO "ESO", the "NPP Kozloduy" responsibleParticipant, etc). We use color coding to show which part of the data comes from which data item.

The diagram is adapted from our proposal. In particular, we added eicType (see Add eicType).

2.7 CSV Files

There's no schema for the CSV files, but field names are pretty clear, and we can match them to MADES UML models.

We also do some field value investigations using the csvtk tool (see csvtk#177 for proposed enhancements); equivalent results can be obtained easily with Python Pandas. For example:

# distribution of ResolutionCode
csvtk -t freq -f ResolutionCode -k 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
ResolutionCode  frequency
PT15M   15144
PT30M   9456
PT60M   87606

# analyze correlation of ActualGenerationOutput and ActualConsumption
cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn
# see below

Investigations are based on 2022_01 files, some obtained on 2022-01-05 and others on 2022-01-19 (therefore incomplete month data).

WARNINGS:

  • Although file names are .csv, the files are tab-separated (TSV)
  • The files are UTF8 encoded with BOM (Byte Order Mark), which may cause problems in some tools.
    • See in particular issue tarql#94
    • The following Octal Dump (od) command shows that the first 3 bytes of a CSV file are the BOM, followed by the first column name and a tab.
    od -c -N 100 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
    0000000 357 273 277   D   a   t   e   T   i   m   e  \t
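
Both warnings can be handled with the Python standard library. A minimal sketch, where the byte string is a shortened, made-up sample (BOM, two column names and one row) and the function name is ours:

```python
import csv
import io

# Shortened, made-up sample: BOM, then a tab-separated header and one data row
raw = b"\xef\xbb\xbfDateTime\tResolutionCode\n2022-01-01 00:00:00.000\tPT60M\n"

def read_tsv(data):
    """Decode UTF-8 (dropping a leading BOM if present) and parse tab-separated rows."""
    text = data.decode("utf-8-sig")  # "utf-8-sig" strips the BOM, plain "utf-8" keeps it
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

rows = read_tsv(raw)
print(rows[0]["DateTime"], rows[0]["ResolutionCode"])

# For comparison: with plain "utf-8" the BOM sticks to the first column name
first_column_naive = raw.decode("utf-8").split("\t")[0]
print(repr(first_column_naive))
```

With plain "utf-8" decoding, the first column would be named "\ufeffDateTime", so a lookup by "DateTime" would fail; this is the kind of problem the BOM causes in some tools.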

2.7.1 InstalledGenerationCapacityAggregated_14.1.A

857 samples.

Field Example RDF Comment
tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>
DateTime 2022-01-01 00:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format (" " -> "T")
ResolutionCode P1Y tr:duration always "P1Y"^^xsd:duration
AreaCode 10YIE-1001A00010 tr:biddingZone,tr:controlArea,tr:country depending on AreaTypeCode (BZN, CTA, CTY)
AreaTypeCode CTA Values BZN, CTA, CTY used to map corresponding relations
AreaName IE
MapCode CTA IE
ProductionType Geothermal tr:assetType match to tr:name and tr:nameAlt of the code list
AggregatedInstalledCapacity 17.00 tr:installedOutput
DeletedFlag 0 checked csv for 2021 - always 0
UpdateTime 2021-07-27 20:56:08

Examples of ProductionType values with no match in the code lists:

  • Hydro Pumped Storage
  • Hydro Run-of-river and poundage
  • Hydro Water Reservoir

We have created tr:altNames in the corresponding code lists; see codeliests-extra.ttl.

2.7.1.1 InstalledGenerationCapacityAggregated_14.1.A Model

See InstalledGenerationCapacityAggregated.ttl

RDF URL and fixed data (where the space in (DateTime) is replaced with T):

<dataObs/generation/InstalledGenerationCapacityAggregated/(AreaTypeCode)/(AreaCode)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
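
The URL template and the DateTime fix can be illustrated with a short Python sketch. The project's conversion itself is done with TARQL; this is only an illustration. The names `AREA_PROPERTY`, `to_xsd_datetime` and `observation_url` are hypothetical.

```python
from datetime import datetime

# Which RDF property links an observation to its area, per AreaTypeCode
AREA_PROPERTY = {
    "BZN": "tr:biddingZone",
    "CTA": "tr:controlArea",
    "CTY": "tr:country",
}


def to_xsd_datetime(value):
    """Turn '2022-01-01 00:00:00.000' into '2022-01-01T00:00:00.000'."""
    # strptime raises ValueError on malformed input, so bad rows fail loudly
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return value.replace(" ", "T")


def observation_url(row):
    """Fill in the URL template of InstalledGenerationCapacityAggregated."""
    return "dataObs/generation/InstalledGenerationCapacityAggregated/{}/{}/{}".format(
        row["AreaTypeCode"], row["AreaCode"], to_xsd_datetime(row["DateTime"])
    )
```

The returned URL is relative, as in the template above; resolving it against a base namespace is left out of the sketch.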

2.7.2 InstalledGenerationCapacityComputed

This is a "synthetic" data item that holds computed totals.

We compute aggregate tr:ProductionUnit capacities (tr:installedOutput) from generation/ProductionAndGenerationUnits in order for rule installedCapacity-Aggregated-vs-Per-Unit to compare it to generation/InstalledGenerationCapacityAggregated (which reports aggregated volumes per area and asset type).

  • Totaled over all areas in which the Production Unit is reported (controlArea, biddingZone).
  • The latest reported capacities are totaled
  • Marked with time "now" and duration of validity "1 hour"
<dataObs/generation/InstalledGenerationCapacityComputed/(AreaTypeCode)/(AreaCode)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;

Model: see InstalledGenerationCapacityComputed.ttl

Example RDF Comment
tr:dataItem <data/generation/InstalledGenerationCapacityComputed>
2022-01-01T00:00:00 tr:date now() as datatype xsd:dateTime
PT1H tr:duration Validity duration as datatype xsd:duration
<eic/10YIE-1001A00010> tr:biddingZone,tr:controlArea From the individual units
tr:assetType tr:assetType of the individual units
100.0 tr:installedOutput Computed as a sum from the individual units
130.00 tr:installedOutputHigh +30% of the value in tr:installedOutput

The computation is done by InstalledGenerationCapacityAggregated.ru
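
The actual computation is a SPARQL Update. As a rough Python sketch of the same arithmetic (sum per area and asset type, then +30% for tr:installedOutputHigh), assuming the input already holds only the latest reported capacity of each unit, and using a hypothetical input structure:

```python
from collections import defaultdict


def computed_capacity(units, high_margin=0.30):
    """Sum installedOutput per (area property, area, asset type).

    `units` is an iterable of dicts with the (hypothetical) keys:
      areas:           list of (property, area EIC) pairs the unit is reported in
      assetType:       asset type of the unit
      installedOutput: latest reported capacity of the unit
    """
    totals = defaultdict(float)
    for unit in units:
        for prop, area in unit["areas"]:
            totals[(prop, area, unit["assetType"])] += unit["installedOutput"]
    return {
        key: {
            "tr:installedOutput": round(total, 2),
            # upper bound: +30% of the computed total
            "tr:installedOutputHigh": round(total * (1 + high_margin), 2),
        }
        for key, total in totals.items()
    }
```

A unit reported in both its controlArea and its biddingZone contributes to both totals, in line with the first bullet above.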

2.7.3 ActualGenerationOutputPerGenerationUnit_16.1.A

112207 samples.

Field Example RDF Comment
DateTime 2022-01-01 11:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT60M tr:duration Convert to datatype xsd:duration. Values PT15M PT30M PT60M
AreaCode 10YGR-HTSO-----Y tr:controlArea Must match the controlArea of the Generation Unit: ActualGenerationOutputPerGenerationUnit-controlArea-conform
AreaTypeCode CTA Always "CTA" (control area)
AreaName GR CTA Matches notation of AreaCode, plus AreaTypeCode
MapCode GR Matches notation of AreaCode, checked 4. Some variations: this file vs EIC, eg: "DE(TransnetBW)" vs "DE-TRANSNETBW", "DE(TenneT DE)" vs "DE-TENNET_DE"
GenerationUnitEIC 29WGU-YISPAOOU-5 tr:generationUnit
PowerSystemResourceName P_AOOU Matches notationAlt of GenerationUnitEIC, checked 3.
ProductionType Hydro Water Reservoir Matches assetType of GenerationUnitEIC, checked 4.
ActualGenerationOutput 0.00 tr:actualOutput Convert to datatype xsd:float. 51% 0.00, 4.4% missing (*). Must be <= installedOutput: ActualGenerationOutputPerGenerationUnit-actualOutput-LTE-installedOutput
ActualConsumption tr:actualConsumption Convert to datatype xsd:float. 14.8% 0.00, 80% missing (that's the normal case) (*)
tr:netOutput Compute as difference ActualGeneration-ActualConsumption, treat missing as zero, convert to xsd:float (*)
InstalledGenCapacity 210.00 tr:installedOutput Convert to datatype xsd:float. Must match the declared installedOutput of the Generation Unit: ActualGenerationOutputPerGenerationUnit-installedOutput-conform
UpdateTime 2022-01-02 10:30:54 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

RDF URL and fixed data (where the space in (DateTime) is replaced with T):

<dataObs/generation/ActualGenerationOutputPerGenerationUnit/(GenerationUnitEIC)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;

(*) ActualConsumption is energy consumed by the generator for technological purposes. We analyze the correlation of ActualGenerationOutput and ActualConsumption:

cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn
cnt    ActualGenerationOutput  ActualConsumption
46183  zero                    (missing)
44082  NUM                     (missing)
10138  zero                    zero
5296   NUM                     zero
3974   (missing)               NUM
1267   zero                    NUM
1227   (missing)               zero
39     NUM                     NUM

There is a difference between missing and zero:

  • A missing value means "no data or inapplicable", whereas zero means "the generator did not produce output" (for actualOutput) or "did not consume anything" (for actualConsumption)
    • Missing actualConsumption is legitimate since there are generators that don't consume anything
    • We choose not to validate that actualOutput is provided in each row
  • For the purpose of computing netOutput as the difference, we treat "missing" the same as "zero"

It is possible to have ActualConsumption without ActualGeneration (thus negative netOutput), eg:

  • The 18WMUE4B-12345-D "MUELA 4B" IBERDROLA GENERACION S.A.U. plant (Hydro Pumped Storage) was consuming 209.10 MW on 2022-01-01 at 03:00 while pumping water upward into its reservoir
  • The 62W373474960449Q "SEVTECCHPP-V" Severodonetsk Combined Heat and Power Plant (Fossil Gas) was consuming 2.54 MW on 2022-01-03 at 17:00 while outputting no electricity
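
The netOutput rule can be sketched in Python as follows. The helper names are illustrative; the sketch keeps the missing/zero distinction when parsing, and only collapses it when computing the difference.

```python
def parse_mw(value):
    """Parse a CSV cell to float; an empty (missing) cell yields None."""
    value = value.strip()
    return float(value) if value else None


def net_output(actual_generation, actual_consumption):
    """netOutput = generation - consumption, with missing treated as zero."""
    generation = parse_mw(actual_generation) or 0.0
    consumption = parse_mw(actual_consumption) or 0.0
    return generation - consumption
```

For the MUELA 4B row above (no ActualGenerationOutput, ActualConsumption 209.10), `net_output("", "209.10")` is negative.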

2.7.3.1 ActualGenerationOutputPerGenerationUnit_16.1.A Model

The semantic mapping of this CSV is shown below.

Note: the ActualGenerationOutputPerGenerationUnit conversion should produce only the large node. The figure shows RDF type & EIC code in other nodes just to see the colored circles, but these should not be generated by this conversion.

2.7.4 AggregatedGenerationPerType_16.1.B_C

Field Sample RDF Comment
DateTime 2022-01-01 09:15:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT15M tr:duration Convert to datatype xsd:duration. Values PT15M PT30M PT60M
AreaCode 10YNL----------L tr:biddingZone, tr:controlArea or tr:country
AreaTypeCode CTA Use this field to determine property for AreaCode
AreaName NL CTA
MapCode NL
ProductionType Solar tr:assetType match to tr:name and tr:nameAlt of <type/Asset> code list
ActualGenerationOutput 10.94 tr:actualOutput Convert to datatype xsd:float.
ActualConsumption 0.00 tr:actualConsumption Convert to datatype xsd:float.
UpdateTime 2022-01-29 11:18:30
Net Output tr:netOutput Difference between output and consumption. Performed at conversion

2.7.4.1 AggregatedGenerationPerType Model

<dataObs/generation/AggregatedGenerationPerType/(AreaTypeCode)/(AreaCode)/(ProductionType)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/AggregatedGenerationPerType>;

2.7.5 CurrentGenerationForecastForWindAndSolar_14.1.D

Month 2022_02, 150949 records

Field Example RDF Comment
DateTime 2022-02-05 06:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT60M tr:duration
AreaCode 10YLT-1001A0008Q tr:biddingZone, tr:controlArea or tr:country
AreaTypeCode BZN Use this field to determine property for AreaCode
AreaName LT BZN
MapCode LT
ProductionType Wind Onshore tr:assetType <type/Asset/> Match label
AggregatedGenerationForecast 351.99 tr:forecastedOutput
UpdateTime 2022-02-05 09:20:49 tr:dateUpdated

Distribution of ProductionType:

ProductionType Frequency
Wind Offshore 10752
Wind Onshore 72018
Solar 68178

Distribution of AreaTypeCode:

AreaTypeCode Frequency
CTY 35468
BZN 53464
CTA 62016

2.7.5.1 CurrentGenerationForecastForWindAndSolar_14.1.D Model

<dataObs/generation/CurrentGenerationForecastForWindAndSolar/(AreaTypeCode)/(AreaCode)/match(ProductionType)/(DateTime)>
  a tr:DataObservation;
  tr:dataItem <data/generation/CurrentGenerationForecastForWindAndSolar>;

2.7.6 AcceptedAggregatedOffers_17.1.D

Month 2022_01, 109263 records.

Field Example RDF Comment
DateTime 2022-01-02 23:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT15M tr:duration Convert to datatype xsd:duration. Values PT15M PT30M PT60M
AreaCode 10YCH-SWISSGRIDZ tr:marketBalanceArea In namespace <eic/>
AreaTypeCode MBA Always "MBA"
AreaName CH MBA
MapCode CH
ReserveType Frequency Containment Reserve (FCR) tr:reserveType <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
DeletedFlag 0 Always 0
UpdateTime 2022-01-02 09:45:51 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

This and the other Balancing items (next 3 items) include a number of related (denormalized) Volume/Price fields that we normalize using the following extra fields (dimensions) and their respective code values (in parentheses is the word as it appears in the field name).

  • tr:direction: <type/Direction/>: A01 "UP", A02 "DOWN", A03 "UP and DOWN" (Symmetric)
  • tr:volumeCategory: <type/Business/>: A31 "Offered Capacity" (Offered), B95 "Procured capacity" (Accepted), A45 "Schedule activated reserves" (Activated)
  • tr:assetType: <type/Asset/>: A04 "Generation", A05 "Load", B20 "Other unspecified" (NotSpecified)

Each of the numeric fields is emitted as tr:volume with datatype xsd:float and the following dimension values:

Field tr:direction tr:volumeCategory tr:assetType
LoadUpAcceptedVolume A01 "UP" B95 "Accepted" A05 "Load"
LoadDownAcceptedVolume A02 "DOWN" B95 "Accepted" A05 "Load"
LoadUpOfferedVolume A01 "UP" A31 "Offered" A05 "Load"
LoadDownOfferedVolume A02 "DOWN" A31 "Offered" A05 "Load"
LoadAcceptedVolumeSymmetric A03 "UP and DOWN" B95 "Accepted" A05 "Load"
LoadOfferedVolumeSymmetric A03 "UP and DOWN" A31 "Offered" A05 "Load"
GenerationUpAcceptedVolume A01 "UP" B95 "Accepted" A04 "Generation"
GenerationDownAcceptedVolume A02 "DOWN" B95 "Accepted" A04 "Generation"
GenerationUpOfferedVolume A01 "UP" A31 "Offered" A04 "Generation"
GenerationDownOfferedVolume A02 "DOWN" A31 "Offered" A04 "Generation"
GenerationAcceptedVolumeSymmetric A03 "UP and DOWN" B95 "Accepted" A04 "Generation"
GenerationOfferedVolumeSymmetric A03 "UP and DOWN" A31 "Offered" A04 "Generation"
NotSpecifiedUpAcceptedVolume A01 "UP" B95 "Accepted" B20 "Other unspecified"
NotSpecifiedDownAcceptedVolume A02 "DOWN" B95 "Accepted" B20 "Other unspecified"
NotSpecifiedUpOfferedVolume A01 "UP" A31 "Offered" B20 "Other unspecified"
NotSpecifiedDownOfferedVolume A02 "DOWN" A31 "Offered" B20 "Other unspecified"
NotSpecifiedAcceptedVolumeSymmetric A03 "UP and DOWN" B95 "Accepted" B20 "Other unspecified"
NotSpecifiedOfferedVolumeSymmetric A03 "UP and DOWN" A31 "Offered" B20 "Other unspecified"
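
Since the dimension values are encoded regularly in the field names, the table above can also be derived mechanically. The following Python sketch does that with a regular expression. The names are illustrative, and the assumption that "Activated" fields follow the same naming pattern is ours.

```python
import re

DIRECTION = {"Up": "A01", "Down": "A02", "Symmetric": "A03"}
VOLUME_CATEGORY = {"Offered": "A31", "Accepted": "B95", "Activated": "A45"}
ASSET_TYPE = {"Generation": "A04", "Load": "A05", "NotSpecified": "B20"}

# Two spellings occur: <Asset><Up|Down><Category>Volume
# and <Asset><Category>VolumeSymmetric
FIELD_RE = re.compile(
    r"^(?P<asset>Load|Generation|NotSpecified)"
    r"(?P<direction>Up|Down)?"
    r"(?P<category>Offered|Accepted|Activated)"
    r"Volume(?P<symmetric>Symmetric)?$"
)


def decode_volume_field(name):
    """Return (direction, volumeCategory, assetType) codes, or None."""
    m = FIELD_RE.match(name)
    # Exactly one of Up/Down or the Symmetric suffix must be present
    if not m or bool(m["direction"]) == bool(m["symmetric"]):
        return None
    direction = m["direction"] or "Symmetric"
    return (
        DIRECTION[direction],
        VOLUME_CATEGORY[m["category"]],
        ASSET_TYPE[m["asset"]],
    )
```

Fields that are not volume fields (eg UpdateTime) yield None, so the function can be applied to all column names of a file.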

2.7.6.1 AcceptedAggregatedOffers_17.1.D Model

The semantic mapping of this CSV is shown below.

  • Since this and the next two items talk about the same thing (balancing Volumes), add a new "synthetic" Data Item <data/balancing/AggregatedVolumes>.
    • Thus we unify the data for all 3 items in a unified namespace.
  • Add tr:unit "MW" to this data item

RDF URL and fixed data:

<dataObs/balancing/AggregatedVolumes/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(volumeCategory)/(assetType)>
  a tr:DataObservation;
  tr:dataItem <data/balancing/AggregatedVolumes>;
  • IMPORTANT: If a volume field is empty, emit no triples about it (no URL should be formed for its DataObservation)
  • We use the dimension values in the URL (ANY for the missing/sum/total)
  • The space in (DateTime) is replaced with T
  • "match()" indicates that the field value (string) should be matched to the respective code list value. See etl_scripts/tarql/match.h.rq for such matching implemented with a VALUES clause.
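
The URL formation and the "emit nothing for empty volumes" rule can be sketched in Python as below. This is an illustration only (the real conversion is done with TARQL); the function name and the shape of `decoded_fields` are hypothetical, and the reserve type is assumed to be already matched to its code.

```python
def volume_observations(row, decoded_fields, reserve_type):
    """Yield (url, volume) for every non-empty volume field of one CSV row.

    decoded_fields maps a field name to its (direction, volumeCategory,
    assetType) codes; a missing dimension (None) is written as ANY.
    """
    date_time = row["DateTime"].replace(" ", "T")
    for field, dims in decoded_fields.items():
        value = (row.get(field) or "").strip()
        if not value:
            continue  # empty volume: emit nothing, form no URL
        parts = [row["AreaTypeCode"], row["AreaCode"], date_time, reserve_type]
        parts += [dim or "ANY" for dim in dims]
        yield "dataObs/balancing/AggregatedVolumes/" + "/".join(parts), float(value)
```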

See data/model/AcceptedAggregatedOffers.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.7 ActivatedBalancingEnergy_17.1.E

Month 2022_01, 106828 samples. This table has the same common fields, which are mapped in exactly the same way as the previous section (AcceptedAggregatedOffers_17.1.D):

Field Example RDF Comment
DateTime 2022-01-01 00:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT60M tr:duration Convert to datatype xsd:duration. Values PT15M PT30M PT60M
AreaCode 10YCS-CG-TSO---S tr:marketBalanceArea In namespace <eic/>
AreaTypeCode MBA Always "MBA"
AreaName ME MBA
MapCode ME
ReserveType Automatic Frequency Restoration Reserve (aFRR) tr:reserveType <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
UpdateTime 2021-12-30 14:31:00 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

Instead of Offered/Accepted, it has Activated amounts. They are mapped in exactly the same way:

2.7.8 YearAheadTotalLoadForecast_6.1.E

The RDF mapping is exactly the same as in the previous section. We use the same kind of URLs, and the same data item.

2.7.8.1 ActivatedBalancingEnergy_17.1.E Model

See data/model/ActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.9 AggregatedBalancingEnergyBids_12.3.E

Month 2022_01, 294943 samples.

This is very similar to the previous two sections, except:

  • It is for area type SCA rather than MBA
  • Has an extra volumeCategory "Unavailable"
  • Direction is a separate field, rather than being encoded in the Volume field names
  • There's marketProduct but no assetType
Field Example RDF Comment
DateTime 2022-01-02 12:45:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT15M tr:duration Convert to datatype xsd:duration. Values PT15M PT30M PT60M
AreaCode 10Y1001A1001A71M tr:schedulingArea In namespace <eic/>
AreaTypeCode SCA Always "SCA"
AreaName IT-Centre-South SCA
MapCode IT-CSOUTH
ReserveType Replacement reserve (RR) tr:reserveType <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR (*)
TypeOfProduct Standard tr:marketProduct <type/MarketProduct/>: match A01 Standard, A02 Specific, A04 Local
Direction Up tr:direction <type/Direction/>: A01 "UP", A02 "DOWN"
UpdateTime 2022-01-02 12:31:10 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

(*) WARNING: the values in this data item are spelled in Lowercase (all other tables are in Capital Case):

csvtk -t freq -f ReserveType 2022_01_AggregatedBalancingEnergyBids_12.3.E.csv
Replacement reserve (RR)                       126822
Manual frequency restoration reserve (mFRR)     75114
Automatic frequency restoration reserve (aFRR)  93006

So we use the macro match_reserveType_lcase() for this item, and match_reserveType() for all others.
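
For illustration, here is a Python sketch of a single case-insensitive lookup that would cover both spellings. Note that this is a different technique from the two TARQL macros mentioned above, and the names used are hypothetical.

```python
# Labels as spelled in the other balancing items, with their codes
RESERVE_TYPE = {
    "Frequency Containment Reserve (FCR)": "A95",
    "Automatic Frequency Restoration Reserve (aFRR)": "A96",
    "Manual Frequency Restoration Reserve (mFRR)": "A97",
    "Replacement Reserve (RR)": "A98",
}

# Index by case-folded label so that "Replacement reserve (RR)" also matches
_BY_FOLDED_LABEL = {label.casefold(): code for label, code in RESERVE_TYPE.items()}


def match_reserve_type(label):
    """Return the code for a ReserveType label, ignoring letter case."""
    return _BY_FOLDED_LABEL.get(label.strip().casefold())
```

Unknown labels return None, which lets the caller decide whether to skip the row or report a data quality issue.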

Map the following fields to tr:volume with datatype xsd:float, and the following dimension values:

Field tr:volumeCategory
OfferedBidVolume A31 (Offered)
ActivatedBidVolume A45 (Activated)
UnavailableBidVolume Z99 (Unavailable)

2.7.9.1 AggregatedBalancingEnergyBids_12.3.E Model

We use the same RDF model as before. Again, we use the same URLs and data item.

See data/model/AggregatedBalancingEnergyBids.ttl.

  • The diagram is not very elucidating since all these records are correlated by their values, not by links.
  • IMPORTANT: if a Volume field is missing, do not emit any triples about it

2.7.10 PricesOfActivatedBalancingEnergy_17.1.F

Month 2022_01, 158455 samples.

Field Example RDF Comment
DateTime 2022-01-14 02:00:00.000 tr:date Convert to datatype xsd:dateTime and valid format
ResolutionCode PT30M tr:duration Convert to datatype xsd:duration
AreaCode 10YFR-RTE------C tr:schedulingArea or tr:marketBalanceArea Depending on AreaTypeCode
AreaTypeCode SCA SCA or MBA. Use to select the specific relation
AreaName FR SCA
MapCode FR
RegisterItemTypeName Automatic Frequency Restoration Reserve (aFRR) tr:reserveType <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR
TypeOfProduct A01 tr:marketProduct <type/MarketProduct/>: straight A01 Standard, A02 Specific, A04 Local
PriceType AVERAGE tr:priceCategory <type/PriceCategory/>: match A06 "Average bid price" (AVERAGE), A07 "Single marginal bid price" (MARGINAL)
Currency EUR tr:currency Values: EUR (10x more popular than all the rest), BAM, CZK, HUF, PLN, RON, UAH
UpdateTime 2022-01-14 03:46:00 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

Emit each of the following price fields as tr:price with datatype xsd:float and the following dimension values:

Field tr:direction tr:assetType
LoadUpPrice A01 "UP" A05 "Load"
LoadDownPrice A02 "DOWN" A05 "Load"
GenerationUpPrice A01 "UP" A04 "Generation"
GenerationDownPrice A02 "DOWN" A04 "Generation"
NotSpecifiedUpPrice A01 "UP" B20 "Other unspecified"
NotSpecifiedDownPrice A02 "DOWN" B20 "Other unspecified"

We determine the minimal set of independent fields with experiments like this:

# UNIQUE:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d

# Remove AreaTypeCode: DUPS:
csvtk cut -t -f DateTime,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-22 18:30:00.000.*10YFR-RTE------C.*Replacement Reserve (RR)" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-22 18:30:00.000 PT15M  10YFR-RTE------C  SCA  FR    SCA  FR     Replacement Reserve (RR)                        247.00  247.00  A01  AVERAGE  EUR  2022-01-22 18:31:13
2022-01-22 18:30:00.000 PT30M  10YFR-RTE------C  MBA  FR    MBA  FR     Replacement Reserve (RR)                        245.27  245.27       AVERAGE  EUR  2022-01-22 20:31:11
# The same area "FR" is reported as SCA and as MBA

# Remove RegisterItemTypeName: DUPS:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-05 00:15:00.000.*10Y1001A1001A82H.*MBA" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-05 00:15:00.000 PT15M  10Y1001A1001A82H  MBA  DE-LU MBA  DE_LU  Manual Frequency Restoration Reserve (mFRR)     0.00    0.00         AVERAGE  EUR  2022-01-04 00:30:55
2022-01-05 00:15:00.000 PT15M  10Y1001A1001A82H  MBA  DE-LU MBA  DE_LU  Automatic Frequency Restoration Reserve (aFRR)  224.91  47.71        AVERAGE  EUR  2022-01-05 02:00:56

The minimal set is AreaTypeCode,AreaCode,DateTime,RegisterItemTypeName to which we must add the dimensions direction,assetType
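
The same uniqueness experiments can be done in Python. The sketch below is an equivalent of `sort | uniq -d` over a chosen set of columns; the function name is illustrative, and `rows` is any iterable of dicts (eg from `csv.DictReader`).

```python
from collections import Counter


def duplicate_keys(rows, fields):
    """Return the key tuples that occur more than once for the given fields."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

An empty result means the fields form a unique key; a non-empty result lists the combinations that need an extra field to be told apart.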

We add a computed field tr:priceInEUR, based on the current conversion rate of Currency to EUR

2.7.10.1 PricesOfActivatedBalancingEnergy_17.1.F Model

RDF URL and fixed data:

<dataObs/balancing/PricesOfActivatedBalancingEnergy/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(assetType)>
  a tr:DataObservation;
  tr:dataItem <data/balancing/PricesOfActivatedBalancingEnergy>;

See data/model/PricesOfActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:

2.7.11 UnavailabilityOfGenerationUnits_15.1.A_B

4366 samples.

  • Each unavailability is identified by MRID.
  • There can be multiple versions (tr:version) of each unavailability. We've shown several examples to illustrate these versions.
    • In this particular case, only the Status is changed
    • We retain only the latest version
Field Example1 Example2 RDF Comment
StartTS 2022-01-28 19:00:00.000 2022-01-28 19:00:00.000 Ignored (*)
EndTS 2022-01-31 07:00:00.000 2022-01-31 07:00:00.000 Ignored (*)
TimeZone WET WET tr:timeZone String: "WET, CET, EET"
MRID zzGVOR7oEd5SOJnhsAiapw zzGVOR7oEd5SOJnhsAiapw tr:ident Also use in URL. Separate field to allow matching subsidiary table
Type Planned Planned tr:typeText String: "Planned, Forced"
Status Active Cancelled tr:statusText String: "Active, Withdrawn, Cancelled"
AreaCode 10YGB----------A 10YGB----------A tr:controlArea or tr:biddingZone Depending on AreaTypeCode. Must match declared zone/area of the energy resource: Outage-GenerationUnit-area-conform
AreaTypeCode CTA CTA "CTA, BZN" (**). Reflected in the selection of the previous link
AreaName UK(National Grid) CTA UK(National Grid) CTA Matches name of AreaCode
MapCode GB GB Matches notation of AreaCode
PowerResourceEIC 48W000000DIDCB5C 48W000000DIDCB5C tr:energyResource Must exist in Production and Generation Units: Outage-ProductionUnit-exists
UnitName DIDCB5 DIDCB5 Matches notation of PowerResourceEIC
ProductionType Fossil Gas Fossil Gas Matches assetType of PowerResourceEIC
InstalledCapacity 780.00 780.00 tr:installedOutput Convert to datatype xsd:float. Must match the declared installedCapacity of the resource: Outage-GenerationUnit-installedCapacity-conform
AvailableCapacity 370.00 370.00 tr:availableOutput Convert to datatype xsd:float. Must be less than installedCapacity: Outage-GenerationUnit-LT-installedCapacity
Version 1 2 tr:version Retain only the latest version. See next section
Reason Foreseen Maintenance Foreseen Maintenance Ignored (*)
UpdateTime 2018-10-02 14:29:59 2018-10-02 17:26:11

2.7.11.1 UnavailabilityOfGenerationUnitsReasons_15.1.A_B

4996 samples.

Field Example1 Example2 RDF Comment
StartTS 2022-01-28 19:00:00.000 2022-01-28 19:00:00.000 tr:dateStart Convert to datatype xsd:dateTime and valid format
EndTS 2022-01-31 07:00:00.000 2022-01-31 07:00:00.000 tr:dateEnd Convert to datatype xsd:dateTime and valid format
MRID zzGVOR7oEd5SOJnhsAiapw zzGVOR7oEd5SOJnhsAiapw tr:mrid Use in URL.
version 2 2 tr:version Convert to datatype xsd:integer. Separate field to allow picking latest version: retain only the latest version
ReasonCode A95 B19 tr:reason URL in <type/ReasonCode/>
Reason Complementary Information Foreseen Maintenance tr:reasonText Matches the name of codelist value "ReasonCode". Skip "Complementary Information"
ReasonText Outage tr:reasonText Could include long, even bilingual text, not very well formatted
UpdateTime 2018-10-02 17:26:11 2018-10-02 17:26:11 tr:dateUpdated Convert to datatype xsd:dateTime and valid format

2.7.11.2 Unavailability Model

We use the same "synthetic" data item UnavailabilityOfProductionOrGenerationUnits for both this, and UnavailabilityOfProductionUnits (see next).

  • This is possible since the link tr:energyResource is the same in both cases, and that resource should know whether it's a Production or Generation Unit (which is a non-trivial question, given the confusion between the two)
  • It is useful since the app only needs to consult one item when displaying Outages on a map of Production and Generation Units
  • It also obviates the need to duplicate Validation Rules for the two data items

RDF URL and fixed data:

<outage/UnavailabilityOfProductionOrGenerationUnits/(MRID)/(Version)>
  a tr:Outage;
  tr:dataItem <data/outages/UnavailabilityOfProductionOrGenerationUnits>

The RDF model is shown below, but please read subsequent sections regarding intricacies of the conversion process.

data/model/Unavailability.ttl:

2.7.11.3 Unavailability Redundant Records

Each unavailability is reported twice: for the controlArea ("CTA") and the biddingZone ("BZN") of the generator. An example with a generator in Bulgaria's Maritsa Iztok 2 TPP:

Field Example1 Example2
StartTS 2022-01-03 16:57:00.000 2022-01-03 16:57:00.000
EndTS 2022-01-03 18:30:00.000 2022-01-03 18:30:00.000
TimeZone CET CET
MRID 7jf8VaSweKQI27w73v8p8w dcadb3Ls6XlBSYhhQxvItQ
Status Active Active
Type Forced Forced
AreaCode 10YCA-BULGARIA-R 10YCA-BULGARIA-R
AreaTypeCode CTA BZN
AreaName BG CTA BG BZN
MapCode BG BG
PowerResourceEIC 32W001100100045G 32W001100100045G
UnitName TPP_MI2_G5 TPP_MI2_G5
ProductionType Fossil Brown coal/Lignite Fossil Brown coal/Lignite
InstalledCapacity 230.00 230.00
AvailableCapacity 0.00 0.00
Version 1 1
Reason Failure Failure
UpdateTime 2022-01-04 09:15:58 2022-01-04 09:15:58

As you can see, the two unavailabilities are exactly the same, except for MRID, AreaCode, AreaTypeCode (and MapCode, AreaName derived from them). So each unavailability is reported twice:

  • With different MRID but same Version, UpdateTime
  • Even when the two areas are co-extensive, eg the above is reported against two different roles of 10YCA-BULGARIA-R Bulgaria: as "CTA" and as "BZN"

Optionally, merge the records (so we'll have one record with two outgoing links: both controlArea and biddingZone):

  • Identify the two records by equality of the data fields: StartTS, EndTS, TimeZone, Status, Type, PowerResourceEIC, InstalledCapacity, AvailableCapacity, Version, Reason, UpdateTime
  • Discard one of the MRIDs (eg the one with AreaTypeCode="CTA") and all its data except AreaCode, AreaTypeCode
  • Record its controlArea or biddingZone link (computed from AreaCode, AreaTypeCode) against the URL of the other record

This is non-trivial but will help with displaying Outage data.
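
The merge steps above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation; the function name, the `areas` key of the result and the choice to keep the "BZN" record (discarding the "CTA" one, as in the example above) are assumptions.

```python
AREA_PROPERTY = {"CTA": "tr:controlArea", "BZN": "tr:biddingZone"}

# Data fields whose equality identifies two records as the same unavailability
MATCH_FIELDS = (
    "StartTS", "EndTS", "TimeZone", "Status", "Type", "PowerResourceEIC",
    "InstalledCapacity", "AvailableCapacity", "Version", "Reason", "UpdateTime",
)


def merge_redundant(records, keep="BZN"):
    """Merge records that differ only in MRID and area fields.

    Returns one dict per group: the data of the kept record plus an
    "areas" dict holding the area links of all records in the group.
    """
    groups = {}
    for rec in records:
        key = tuple(rec[f] for f in MATCH_FIELDS)
        groups.setdefault(key, []).append(rec)
    merged = []
    for recs in groups.values():
        # Prefer the record of the `keep` area type; fall back to the first one
        kept = next((r for r in recs if r["AreaTypeCode"] == keep), recs[0])
        areas = {AREA_PROPERTY[r["AreaTypeCode"]]: r["AreaCode"] for r in recs}
        merged.append({**kept, "areas": areas})
    return merged
```

Records without a redundant twin pass through unchanged, with a single entry in `areas`.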

2.7.11.4 UnavailabilityReasons Subsidiary Table

This table should be "joined" to the main table by "MRID" (which can be accomplished by using consistent URLs when RDFizing). Examining data for 2022_01 (taken on 2022-01-05):

  • There are "MRID" in the subsidiary table that don't match "MRID" in the main table.
    • Such subsidiary records are useless since they don't mention the "PowerResourceEIC"
    • It is possible the problem is due to data being split across months, so we should get outage data for several years to increase the time window
    • Examples:
      • Outage 0pXGWG97HoHWd2NzlbSmmw (2 versions) is missing in the main table
      • Outage 5TmlidNqpxU_LYlWfJ5bMg (9 versions) is missing in the main table
  • Some outages have more versions in the subsidiary table than the main table:
    • The main table never has more versions than the subsidiary table
    • We checked a few records, and the times reported in the initial matching versions also match.
    • Example: outage 1F67oMiU54aDdqPoUMdJGg has only 1 version in the main table, but 4 in the subsidiary table.
    • As you can see, subsidiary records iteratively refine the "StartTS", "EndTS" fields.
    • In this case the "Reason" fields remain the same, but in other cases they could change
Field main subsidiary1 subsidiary2 subsidiary3 subsidiary4
StartTS 2022-01-05 00:00:00.000 2022-01-05 00:00:00.000 2022-01-05 07:00:00.000 2022-01-05 07:00:00.000 2022-01-05 06:00:00.000
EndTS 2022-01-06 00:00:00.000 2022-01-06 00:00:00.000 2022-01-05 09:00:00.000 2022-01-05 09:00:00.000 2022-01-05 07:00:00.000
TimeZone CET
MRID 1F67oMiU54aDdqPoUMdJGg 1F67oMiU54aDdqPoUMdJGg 1F67oMiU54aDdqPoUMdJGg 1F67oMiU54aDdqPoUMdJGg 1F67oMiU54aDdqPoUMdJGg
Type Forced
Status Active
AreaCode 10YCZ-CEPS-----N
AreaTypeCode CTA
AreaName CZ CTA
MapCode CZ
PowerResourceEIC 27W-GU-EPVR-B1-L
UnitName EPVR.B1
ProductionType Fossil Gas
InstalledCapacity 200.00
AvailableCapacity 0.00
Version 1 1 2 3 4
ReasonCode B18 B18 B18 B18
Reason Failure Failure Failure Failure Failure
ReasonText
UpdateTime 2022-01-05 07:00:48 2022-01-05 07:00:48 2022-01-05 08:00:57 2022-01-05 08:00:59 2022-01-05 08:00:59
  • The subsidiary table carries more detailed "Reason" info than the main table:
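
The two checks described in this section (subsidiary MRIDs missing from the main table, and extra versions in the subsidiary table) can be sketched in Python as follows. The function names are illustrative; rows are dicts keyed by the CSV column names.

```python
from collections import defaultdict


def versions_by_mrid(rows, version_field):
    """Map each MRID to the set of version numbers reported for it."""
    versions = defaultdict(set)
    for row in rows:
        versions[row["MRID"]].add(int(row[version_field]))
    return versions


def compare_tables(main_rows, reason_rows):
    """Return (orphan MRIDs, {MRID: versions present only in the subsidiary})."""
    main = versions_by_mrid(main_rows, "Version")       # capitalized in main
    reasons = versions_by_mrid(reason_rows, "version")  # lowercase in subsidiary
    orphans = sorted(set(reasons) - set(main))
    extra = {
        mrid: sorted(reasons[mrid] - main[mrid])
        for mrid in main
        if mrid in reasons and reasons[mrid] - main[mrid]
    }
    return orphans, extra
```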

2.7.11.5 Retaining the Latest Unavailability Version

For each MRID of the main (UnavailabilityOfGenerationUnits_15.1.A_B) and subsidiary (UnavailabilityOfGenerationUnitsReasons_15.1.A_B) tables, we want to retain only the latest Version.

  • Both MRID and Version are used in the URL and also represented as separate fields.
  • Version (and UpdateTime) is correlated between the tables

That's non-trivial since:

  • Normal RDFization would accumulate all attributes against that URL but we need to remove values from the older version
  • Data fields are split between the two tables
  • The records in both tables need to be sorted (sorting by UpdateTime or by Version produces the same result)
  • The same field is spelled differently between the two tables: Version in main, version in subsidiary
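
A minimal Python sketch of the "retain only the latest version" step, applied to the rows of either table before RDFization. It is an illustration under our own assumptions: the function name is hypothetical, and all rows of the latest version are kept, since the subsidiary table can carry several reason rows for one MRID and version.

```python
def latest_versions(rows):
    """Keep, for each MRID, only the rows carrying its highest version."""
    latest = {}  # MRID -> (version, [rows of that version])
    for row in rows:
        # The column is "Version" in the main table, "version" in the subsidiary
        version = int(row["Version"] if "Version" in row else row["version"])
        best_version, best_rows = latest.get(row["MRID"], (-1, []))
        if version > best_version:
            latest[row["MRID"]] = (version, [row])
        elif version == best_version:
            best_rows.append(row)
    return {mrid: found for mrid, (_, found) in latest.items()}
```

Filtering before conversion avoids having to remove values of older versions from the repository afterwards.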

2.7.12 UnavailabilityOfProductionUnits_15.1.C_D

This data item is mapped in exactly the same way as UnavailabilityOfGenerationUnits_15.1.A_B, and using the same synthetic data item URLs. The same special processing applies.

Field Example RDF Comment
StartTS 2022-01-01 00:00:00.000 Use value from the Reasons subsidiary table
EndTS 2023-01-01 00:00:00.000 Use value from the Reasons subsidiary table
TimeZone CET tr:timeZone
MRID ROgezRGFNz5CJzUSUkx2-Q tr:ident Also use in URL
Status Active tr:statusText
Type Planned tr:typeText
AreaCode 10YHU-MAVIR----U tr:controlArea or tr:biddingZone Depending on AreaTypeCode
AreaTypeCode BZN Reflected in the previous link
AreaName HU BZN Matches name of AreaCode
MapCode HU Matches notation of AreaCode
PowerResourceEIC 15WVERTES----PPX tr:energyResource
UnitName Oroszlányi Eromu Matches notation or notationAlt of PowerResourceEIC
ProductionType Fossil Brown coal/Lignite Matches assetType of PowerResourceEIC
Version 1 tr:version Retain only the latest version
VoltageConnectionLevel 120.00 Matches highVoltageLimit of PowerResourceEIC
InstalledCapacity 220.00 tr:installedOutput
AvailableCapacity 0.00 tr:availableOutput
Reason Shutdown Use value from the Reasons subsidiary table
UpdateTime 2021-12-14 10:01:32 Use value from the Reasons subsidiary table

UnavailabilityOfProductionUnitsReasons_15.1.C_D fields:

Field Example RDF Comment
StartTS 2022-01-01 00:00:00.000 tr:dateStart Convert to xsd:dateTime and correct format
EndTS 2023-01-01 00:00:00.000 tr:dateEnd Convert to xsd:dateTime and correct format
MRID BGaTG2bh6VYl7K4w2RyHmw tr:mrid Use in URL
version 1 tr:version
ReasonCode B20 tr:reason URL in <type/ReasonCode/>
Reason Shutdown tr:reasonText Skip "Complementary Information"
ReasonText tr:reasonText
UpdateTime 2021-12-14 10:01:36 tr:dateUpdated

3 Data Validation

Validating Transparency data is the most important objective of the project. We'll elaborate up to 40 data validation and quality criteria over various data items.

Based on them we will provide:

  • A DQA Dashboard (Data Quality Assessment) to display the count of data issues per rule and area, drill down to individual issues, (optionally) show trends over time
  • Data quality recommendations that may be used to recommend regulatory changes.

Improving data quality will have positive long-term effects on the energy market. Furthermore, by having more accurate master data, it will provide a foundation for a better Energy KG in the future.

3.1 Describing Validation Rules

We describe validation rules in a strict way, which allows us to extract them from this document and use them as the basis for implementation. Rules are expressed in a semantic way using the SHACL ontology (W3C standard), which allows us to use a number of existing validators. Each rule is represented as a sh:NodeShape and has the following fields:

  • Rule URL: from the heading name in this document
  • Named graph: each rule is emitted in its own graph to be passed to the SHACL engine one by one (using sh:shapesGraph)
  • Name (sh:name): derived from the rule URL by replacing dashes with spaces (eg "parentResource semiInverse generationUnit")
  • Order (sh:order): order of rule execution
    • IMPORTANT: in the UI, sort groups and rules alphabetically not by sh:order
  • Applies to (tr:appliesTo): kind of area the rule applies to, used for grouping (see next section)
  • Rule Group (sh:group): used as second level of grouping (for categorization and better UI)
  • Description (sh:description): detailed description in the form of a "should" statement
  • Message (sh:message): a template with SPARQL variables, in case additional details should be provided in the validation results
  • Data Items (tr:dataItem): data item(s) being validated (converted to several URLs as per kb.ttl)
  • Fields (tr:fields): CSV or XML field(s) being validated (using XPath notation for XML) (a single string)
  • Severity (sh:severity):
    • Violation: hard constraints, eg PowerUnit and its GenerationUnits should be in the same country
    • Warning: soft constraints, eg actuals should not deviate from forecasts by more than 15%
  • Implementation as SHACL triples, possibly including "owned" sh:PropertyShape nodes and blank nodes
  • Correction (tr:sparqlUpdate): the Data Correction to apply, described in the subsubsection that follows the rule

Taking the rule parentResource-semiInverse-generationUnit as an example, here is an RDF model for representing rules. It also shows the implementation (sh:property triples and blank nodes).

See data/model/ValidationRule.ttl, though this is emitted as .trig in graph <graph/shape/parentResource-semiInverse-generationUnit>
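
The derivation of the rule name and named graph from the rule URL can be sketched as follows. This is only an illustration: the function names are hypothetical, and we assume the shape graphs resolve against the same base URL that the SPARQL updates in this document declare.

```python
# Assumption: same base as the `base` declaration used in the SPARQL updates of this document
BASE = "https://transparency.ontotext.com/resource/"

def rule_name(rule_slug: str) -> str:
    """Derive sh:name from the rule slug (the heading name) by turning dashes into spaces."""
    return rule_slug.replace("-", " ")

def shape_graph(rule_slug: str) -> str:
    """Named graph that holds the shape of a single rule (hypothetical helper)."""
    return f"{BASE}graph/shape/{rule_slug}"

slug = "parentResource-semiInverse-generationUnit"
print(rule_name(slug))
print(shape_graph(slug))
```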

3.2 Rule Applicability

ENTSOE data is "indexed" by Area and/or Country Code (see sections Areas and Countries for details about these entities).

We'd like each validation result to point to the Area or Country related to it, in order to have a better summary of errors per Area/Country. Examples:

  • Production Units are reported in duplicate, against both controlArea and biddingZone
  • Each unavailability (outage) is reported in duplicate, against both controlArea and biddingZone
  • The EIC File specifies only countryCode for each resource. In particular, when validating Trader VAT numbers, we can only link to country code.

In order to deal with the variety of areas/countries and with missing values, validation results will have a field tr:displayArea (always populated).

Each rule specifies tr:appliesTo, which is one of tr:biddingZone, tr:controlArea, tr:country, tr:countryCode (there can be multiple values).

  • If a resource is reported in two areas in duplicate, we use only one of them to avoid reporting the same error twice.
  • But installedCapacity-Aggregated-vs-Per-Unit is checked in both tr:biddingZone, tr:controlArea

3.2.1 AppliesTo CountryCode

Counts:

  • 67 country codes are used in EIC data. This applies mostly to VAT checking rules (see below).
  • The file countries.csv has 42 countries with power resources in ENTSOE (plus "SEM" Ireland and Northern Ireland, which is not really a country)
  • We consider the 35 country codes without power resources to be a "long tail".

There are many Trader countries outside of the ENTSOE jurisdiction.

We populate tr:displayArea of validation results as follows, dealing with both missing country codes and the "long tail":

  • Get Node.countryCode where Node is sh:focusNode (the node that caused the error)
  • If countryCode is missing: "none"
  • If countryCode is not found in countries.csv: "other"
  • Otherwise: countryCode
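
The fallback steps above can be sketched as a small function. The function name and the set of known codes are illustrative assumptions; the real list comes from countries.csv.

```python
from typing import Optional, Set

def display_area_for_country(country_code: Optional[str], known_codes: Set[str]) -> str:
    """Compute tr:displayArea for rules that apply to countryCode."""
    if not country_code:
        return "none"    # countryCode is missing on the focus node
    if country_code not in known_codes:
        return "other"   # "long tail": country not listed in countries.csv
    return country_code

# Illustrative subset only, not the real content of countries.csv
known = {"BG", "DE", "RS"}
print(display_area_for_country("BG", known))
print(display_area_for_country("US", known))
print(display_area_for_country(None, known))
```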

3.2.2 AppliesTo Area

The Areas that data is related to are controlArea, biddingZone, country (others listed below are not yet being validated):

  • Production Units are reported in duplicate: one tr:controlArea and one tr:biddingZone (though the data model permits multiple bidding zones). We have checked this with the following query:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?CTA ?BZN (count(*) as ?c) {
  {select (count(?cta) as ?CTA) (count(?bzn) as ?BZN) {
    ?x a tr:ProductionUnit
    optional {?x tr:controlArea ?cta}
    optional {?x tr:biddingZone ?bzn}
  } group by ?x}
} group by ?CTA ?BZN
  • Generation Units don't have direct links to areas, but the areas can be reached through their parent (^tr:generationUnit)
  • Unavailabilities (Outages) are reported in duplicate, against both tr:controlArea and tr:biddingZone
  • AcceptedAggregatedOffers, ActivatedBalancingEnergy are reported in tr:marketBalanceArea
  • AggregatedBalancingEnergyBids are reported in tr:schedulingArea
  • ActualGenerationOutputPerGenerationUnit are reported in tr:controlArea
  • CurrentGenerationForecastForWindAndSolar are reported in tr:controlArea, tr:biddingZone and tr:country
  • InstalledGenerationCapacityAggregated is reported in tr:biddingZone, tr:controlArea, tr:country.
    • Its respective InstalledGenerationCapacityComputed in tr:biddingZone, tr:controlArea.
    • Although the base Production Units capacity numbers are in duplicate, summing them across tr:biddingZone, tr:controlArea does not produce duplicate numbers
    • Therefore the rule installedCapacity-Aggregated-vs-Per-Unit is checked in both tr:biddingZone, tr:controlArea

Counts:

Populating tr:displayArea of tr:ValidationResult:

  • Get sh:sourceShape/tr:appliesTo as ?areaProp. There can be multiple values
  • Get sh:focusNode as ?node (the node that caused the error)
  • If ?node is tr:GenerationUnit, get its parent Production Unit (^tr:generationUnit) because Generation Units are not directly attached to areas
  • Use ?areaProp/tr:notation of ?node
  • If no such value exists, use "none"
  • Save ?areaProp as ValidationCount.appliesTo
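
A minimal sketch of these steps, assuming a simplified in-memory view where each node is a dict that maps an area property to the notation of the linked area. The dict layout and the function name are hypothetical; the actual implementation works on the graph.

```python
from typing import Dict, List, Optional

def display_areas(node: Dict[str, str], applies_to: List[str],
                  parent: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return {areaProp: displayArea} for one validation result."""
    # Generation Units are not directly attached to areas: use the parent Production Unit
    if node.get("type") == "GenerationUnit" and parent is not None:
        node = parent
    # Use the notation of the linked area, or "none" if the link is missing
    return {prop: node.get(prop) or "none" for prop in applies_to}

prod = {"type": "ProductionUnit", "biddingZone": "BG"}
gen = {"type": "GenerationUnit"}
print(display_areas(gen, ["biddingZone"], parent=prod))
print(display_areas(prod, ["controlArea"]))
```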

3.3 Summary Validation Results

The Summary Results are counts of validation results that enable

  • Grouping per applicability, group and rule
  • Breakdown and Totals per rule and area
  • Indication of severity (Violation vs Warning)
  • (CANCELED): Calculating prevalence (percentage of errors compared to total records)
  • Drill-down to individual validation results

Summaries are represented as tr:ValidationCount and have the following fields (another option would be to use the Data Quality Vocabulary (DQV)):

  • Rule (sh:sourceShape): validation rule (resource, from which the full rule description can be obtained, including Definition and Severity)
  • Area (tr:displayArea): country/zone/area (string). See section Rule Applicability
  • Count (tr:count): count of errors/warnings (integer)
  • Date (tr:date): when the counting was done (full xsd:dateTime). Please note that we retain only one set of validation results
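
To illustrate how such summaries relate to individual results, here is a sketch that counts results per rule and display area. The dict keys mirror the field names above; the function itself and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

def summarize(results, counted_at=None):
    """Aggregate individual validation results into ValidationCount-like records."""
    counted_at = counted_at or datetime.now(timezone.utc).isoformat()
    counts = Counter((r["sourceShape"], r["displayArea"]) for r in results)
    return [
        {"sourceShape": rule, "displayArea": area, "count": n, "date": counted_at}
        for (rule, area), n in sorted(counts.items())
    ]

# Made-up sample results, for illustration only
sample = [
    {"sourceShape": "function-not-null", "displayArea": "BG"},
    {"sourceShape": "function-not-null", "displayArea": "BG"},
    {"sourceShape": "function-not-null", "displayArea": "other"},
]
for row in summarize(sample):
    print(row["sourceShape"], row["displayArea"], row["count"])
```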

An RDF model of summary results is in data/model/ValidationCount.ttl and the following diagram:

3.3.1 Summary Validation Results Mockup

Rules per Country Code BG DE .. RS other none Total
EIC
.. function not null (i) 5 3 2 10
.. function spelling (i) 3 3 1 7
.. function specific
.. function compatible with EIC hard
.. function compatible with EIC soft
VAT
.. VAT country prefix 5 4 9
.. VAT per country syntax 8 8
.. VAT country exists 10 10
.. VAT country conform
.. VAT per country exists
TOTAL 8 6 .. 1 23 6 44
Rules per Control Area BG CA-DENMARK DE-50HERTZ DE-AMPRION-SCHED .. UA-IPS none Total
ProdUnits
.. ProductionUnit cannot be GenerationUnit 4 4
.. parentResource semiInverse generatingUnit 2 2
.. ProductionUnits and GenerationUnits in EIC 5 100 5
.. EIC ProductionUnits GenerationUnits single
.. EIC ProductionUnits GenerationUnits assetType 5 3 8
.. EIC ProductionUnits nominalP highVoltageLimit
.. EIC GenerationUnits nominalP
.. ProductionUnit highVoltageLimit not zero 3 3
.. ProductionUnit nominalP not zero
.. only ProductionUnit or GenerationUnit 12 12
.. no GenerationUnit at top level
.. ProductionUnit and GenerationUnit same responsibleParticipant
.. ProductionUnit and GenerationUnit same country
.. ProductionUnit Zone or Area same country
.. generatingUnit function ProductionUnit
.. generatingUnit function GenerationUnit
.. location informative 23 23
.. ProductionUnit GenerationUnit capacity
Transactions
.. installedCapacity Aggregated vs Per Unit
.. actualOutput vs nominalP 10
TOTAL 18 6 8 121 .. 23 100 157

Notes:

  • Rules are shown grouped by Applies To, then by Group
    • Applies To and Groups are sorted alphabetically
  • Rules are sorted alphabetically by name
  • (i) indicates an icon: red for Violation, orange for Warning
    • On hover over the name or icon, show the rule Description
    • On click over the name or icon, jump to the respective section in this document (open in another window)
  • Table columns are tr:displayArea sorted alphabetically, but "other" and "none" come last
  • Cells show the count with a hyperlink
  • Clicking on a count displays the individual validation results for that rule and country/area (see next)
  • Totals are computed for each row and column
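
The column ordering and totals described in these notes can be sketched as follows. The function names and the sample counts are assumptions for illustration, not part of the UI implementation.

```python
def sort_areas(areas):
    """Sort tr:displayArea values alphabetically, with "other" and "none" last."""
    tail = ["other", "none"]
    present = set(areas)
    head = sorted(a for a in present if a not in tail)
    return head + [t for t in tail if t in present]

def totals(cells):
    """cells: {(rule, area): count} -> (row totals, column totals, grand total)."""
    rows, cols = {}, {}
    for (rule, area), n in cells.items():
        rows[rule] = rows.get(rule, 0) + n
        cols[area] = cols.get(area, 0) + n
    return rows, cols, sum(cells.values())

print(sort_areas(["none", "RS", "other", "BG", "DE"]))
# Made-up counts, for illustration only
print(totals({("function not null", "BG"): 5, ("function not null", "DE"): 3,
              ("function spelling", "BG"): 3}))
```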

3.4 Individual Validation Results

Individual results (exceptions) are represented as sh:ValidationResult and include the following fields.

We'll use this example: consider the rightmost parentResource relation in this diagram, which is wrong (should be inverse of generationUnit):

  • Rule (sh:sourceShape): rule that was violated
  • Node (sh:focusNode): node that caused the violation (eg EIC of "NPP_KOZLODUY_G10")
  • Value (sh:value): erroneous value (eg EIC of "TPP_MI_2", the object of parentResource)
  • (CANCELED) Expected: expected value, if any (eg EIC of "NPP_KOZLODUY", the subject of generationUnit)
  • Display Area (tr:displayArea): country/zone/area where the violation occurred (eg "BG"). Computed according to section Rule Applicability, can be "none" or "other"
  • Country (tr:countryCode): only for rules that apply to Country, provides extra detail if displayArea is "other"
  • Severity (sh:resultSeverity): severity level of the violation: Violation or Warning (copied from the respective rule)
  • Message (sh:resultMessage): additional details, use only if the source shape has sh:message because the standard messages generated by the SHACL engine are most often not useful.
    • Use this check: if <result>/sh:sourceShape/sh:sparql?/sh:message then use <result>/sh:resultMessage

An RDF model of individual results is in data/model/ValidationResult.ttl and the following diagram:

For Node, Value (and CANCELED: Expected) we print:

  • If resource:
    • If tr:EnergyResource: eic, and also notation, name to ease comprehension
    • Otherwise: the last 2 components of the URL, eg 32W001100100017L/2022-01-01T11:00:00.000
      • Also fetch notation, name of the linked tr:EnergyResource
    • Include a link to GDB Workbench so the user can examine the data: https://transparency.ontotext.com/graphdb/resource?uri=<node>
  • If literal (string or number): just the literal
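
A sketch of the "last 2 components" label and the Workbench link. The sample resource URL (in particular the "dataObs" path segment) is hypothetical, and percent-encoding the uri parameter is our assumption, since the text above only gives the link template.

```python
from urllib.parse import quote

WORKBENCH = "https://transparency.ontotext.com/graphdb/resource?uri="

def short_label(uri: str) -> str:
    """Return the last 2 path components of a resource URL."""
    return "/".join(uri.rstrip("/").split("/")[-2:])

def workbench_link(uri: str) -> str:
    """Build a GDB Workbench link (encoding of the parameter is an assumption)."""
    return WORKBENCH + quote(uri, safe="")

# Hypothetical resource URL
node = "https://transparency.ontotext.com/resource/dataObs/32W001100100017L/2022-01-01T11:00:00.000"
print(short_label(node))
print(workbench_link(node))
```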

3.4.1 Individual Validation Results Mockup for EIC VAT

Rule: EIC-VAT: VAT country conform [back]

  • Description: The first two chars of VAT must equal the country code (except "GR" which is spelled "EL" in VAT codes)
  • Data Items: EIC file (allocated-eic-codes XML)
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
  • Area: other
  • Count: 2 Violations (as of 2022-02-27T10:23:34)
Resource Notation Name Value Area
59XREALPETROL11F REALPETROL REAL PETROL HOLDING KFT HU24189514 IT
22X20110811----W BE_INEOS_CV_LVM INEOS CHLORVINYLS LIMITED GB768506886 BE

<< < 1 of 5 > >>

Notes: given a tr:ValidationCount, shows all individual results with that Rule (sh:sourceShape) and tr:displayArea. Header:

  • Show rule Group
  • Show rule Name in bold, and as a hyperlink to the respective section in this document (open in another window)
  • Show link [back] to return to the summary results
  • Show Description
  • Show each Data Item, with hyperlinks (see above, and the next subsection)
  • Show the string Fields
  • Show displayArea
  • Show count
  • Show severity in bold and colored icon: red (Violations), orange (Warnings)
  • Show date

Table:

The columns depend on the kind of item being validated (EIC, ProductionAndGenerationUnits, Data Observations, Outages).

  • The next subsection shows another example.
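
To make the rule used in this mockup concrete, the check "the first two chars of VAT must equal the country code, except GR which is spelled EL" can be sketched as below. The function name is hypothetical; the two failing examples are the rows of the mockup table, with the country taken from its last column.

```python
def vat_country_conforms(vat: str, country_code: str) -> bool:
    """True if the VAT prefix matches the country code ("GR" is spelled "EL" in VAT codes)."""
    expected = "EL" if country_code == "GR" else country_code
    return vat[:2] == expected

print(vat_country_conforms("HU24189514", "IT"))   # row 1 of the mockup
print(vat_country_conforms("GB768506886", "BE"))  # row 2 of the mockup
```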

3.4.2 Individual Validation Results Mockup for Actual Generation Output Per Generation Unit

Rule: Arithmetics: ActualGenerationOutputPerGenerationUnit actualOutput LTE installedOutput [back]

  • Description: ActualGenerationOutput of each Generation Unit should not be greater than InstalledGenCapacity for each date
  • Data Items: Actual Generation Output per Generation Unit (ActualGenerationOutputPerGenerationUnit_16.1.A CSV), portal, description
  • Fields: ActualGenerationOutput, InstalledGenCapacity
  • Area: BG
  • Count: 2 Violations (as of 2022-02-27T10:23:34)
Resource Value Area Message
32W001100100017L/2022-01-01T11:00:00.000 1001 Should be less than 1000
32W001100100048A/2022-01-01T09:00:00.000 231 Should be less than 230

<< < 1 of 5 > >>

The same notes apply as in the previous section, except data columns:

  • The node to blame (sh:focusNode) corresponds to a dataObs and the hyperlink shows only the "suffix" (last 2 URL components): EIC and dateTime
  • Notation and Name come from the energy resource linked to the node (sh:focusNode/tr:energyResource)
  • Value is the wrong value (sh:value): to illustrate, we've shown Actual Output that exceeds Installed Capacity by 1 MW
  • Area is the displayArea.
    • Since this rule appliesTo controlArea, it comes from the controlArea linked to the node (sh:focusNode/tr:controlArea/tr:notation)
  • Message: sh:resultMessage, but used only if the shape has sh:message

3.5 Data Correction

We use some inference (SPARQL updates) to:

  • Improve the structure of data by making implicit info explicit (eg eicCode)
  • Make validation easier by making fields explicit and deriving extra fields
  • Correct some key data so that:
    • Subsequent validation rules don't report "false positives", i.e. errors that have the same root cause as already reported errors
    • All subsequent rules are triggered, so we don't miss exceptions ("false negatives")

Further subsections define and implement data corrections as SPARQL Updates, and the sequence and interleaving of validation rules and corrections.

  • Each correction is described in a subsection after the respective rule, and attached as tr:sparqlUpdate to it. We do not use SHACL Rules (part of SHACL Advanced Features) because these are limited to only invalid nodes.
  • All validations and corrections are run after initial data loading, and after each data update
  • ValidationResults capture the original wrong value in sh:value, and corrections don't overwrite this captured value, so it can be reported in the DQA Dashboard.
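
The interleaving of rules and corrections can be illustrated with a small in-memory sketch. Everything here is a simplifying assumption: the data is a plain dict from resource to function, the rule records and order numbers are made up, and the real pipeline runs SHACL shapes and SPARQL updates against GraphDB.

```python
# Hypothetical stand-in for the function correction table
FIXES = {"Production Plant": "Production Unit", "Generation": "Generation Unit"}

def check_spelling(data):
    return [(node, fun) for node, fun in data.items() if fun in FIXES]

def correct_spelling(data):
    for node, fun in list(data.items()):
        data[node] = FIXES.get(fun, fun)

def check_is_production_unit(data):
    return [(node, fun) for node, fun in data.items() if fun != "Production Unit"]

RULES = [
    {"name": "function-spelling", "order": 1,
     "check": check_spelling, "correct": correct_spelling},
    {"name": "Top-Level-must-have-only-function-ProductionUnit", "order": 2,
     "check": check_is_production_unit},
]

def run_pipeline(data, rules):
    """Run rules by order; after each rule, apply its correction (if any)."""
    results = []
    for rule in sorted(rules, key=lambda r: r["order"]):
        for node, value in rule["check"](data):
            # The original wrong value is captured before the correction runs
            results.append({"rule": rule["name"], "focusNode": node, "value": value})
        if rule.get("correct"):
            rule["correct"](data)
    return results

data = {"unit1": "Production Plant"}
print(run_pipeline(data, RULES))
print(data)
```

In this sketch the misspelling is reported once by the first rule, and the second rule no longer reports the same root cause because it sees the corrected value.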

3.6 Validation Rules

This section describes precisely all validation rules implemented by TEKG. The semantic definition and SHACL implementation of each rule are extracted from this section.

3.6.1 function-not-null

  • Rule Group: EIC-function
  • Description: Each EIC resource should have a non-null "function" ("Valid EIC Function needed" is effectively null)
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
  • Severity: Violation
  • Applies to: countryCode
sh:targetClass tr:EnergyResource;
sh:property [
  sh:path tr:function;
  sh:minCount 1;
  sh:not [sh:hasValue "Valid EIC Function needed"]].

SPARQL check:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * where {
    ?s a tr:EnergyResource .
    {
        FILTER NOT EXISTS {
            ?s tr:function []
        }
    } UNION {
        ?s tr:function "Valid EIC Function needed"
    }
}

3.6.2 function-spelling

  • Rule Group: EIC-function
  • Description: Functions of EIC resources should be spelled consistently
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
  • Severity: Violation
  • Applies to: countryCode

According to the following correction table (data/turtle/small/function-valid.ttl):

functionInvalid functionValid
balance group Balance Group
It-System IT-system
LNG terminal LNG Terminal
Generation Generation Unit
Production Plant Production Unit

Notes about the first 3 lines (case normalization):

  • There is no lower/upper case sensitivity required for LIOs to upload EIC data
  • The Transparency Portal UI always capitalizes each word of the function
  • We think that this case normalization should be done by the data storage layer, not at the UI

Notes about the last 2 lines:

  • doc Functions p4 lists both variants
  • However, ENTSOE communicated to us: "these functions are under review, and it is foreseen to have only Generation Unit and Production Unit at the end"
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select $this ?s2 {
        $this a tr:EnergyResource; tr:function ?invalid.
        ?s2 a tr:FunctionValid; tr:functionInvalid ?invalid}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "Will be corrected to {?valid}";
  sh:select """
    select $this (tr:function as ?path) (?invalid as ?value) ?valid {
      $this tr:function ?invalid.
      [] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid}"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?this {
      ?this a tr:EnergyResource; tr:function ?invalid.
      [] a tr:FunctionValid; tr:functionInvalid ?invalid}

3.6.2.1 correct-function-spelling

Misspellings of functions (eg "Production Plant", "Generation") are corrected to enable further checks. We use an RDF mapping table that incorporates correct and misspelled functions, with rows like this:

[] a tr:FunctionValid; tr:functionInvalid "Production Plant"; tr:functionValid "Production Unit".
[] a tr:FunctionValid; tr:functionInvalid "Generation";       tr:functionValid "Generation Unit".

The spelling correction is done by this SPARQL update:

base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

delete {graph <graph/allocated-eic-codes> {?x tr:function ?invalid}}
insert {graph <graph/allocated-eic-codes> {?x tr:function ?valid}}
where {
  ?x a tr:EnergyResource; tr:function ?invalid.
  [] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid
}

3.6.3 function-specific

  • Rule Group: EIC-function
  • Description: An EIC resource with a specific function doesn't also need "Resource Object" because that's unspecific, so it should be elided
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
  • Severity: Warning
  • Applies to: countryCode

This query finds 11 "Resource Objects" that have a more specific function:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x tr:function "Resource Object", ?fun
  filter(?fun != "Resource Object")
}

Examples:

  • 30W-CEE-COGEA--T: "Generation Unit", "Resource Capacity Market Unit": elide "Resource Object"
  • 45W000000000141O: "Production Unit", "Load": elide "Resource Object"
sh:targetClass tr:EnergyResource;
sh:or (
  [sh:path tr:function; sh:maxCount 1]
  [sh:path tr:function; sh:not [sh:hasValue "Resource Object"]]).

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
    {
        SELECT ?this (COUNT(?fun) as ?cnt) {
            ?this a tr:EnergyResource;
                  tr:function ?fun .
        } GROUP BY ?this
    }
    ?this tr:function ?fun .
    FILTER (?cnt > 1 && ?fun = "Resource Object")
}
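
The logic of this rule can be restated as a short sketch, assuming the functions of a resource are given as a set of strings. The helper name is hypothetical, and the first example assumes that the resource carries "Resource Object" in addition to the two functions listed above.

```python
def redundant_resource_object(functions) -> bool:
    """True if "Resource Object" appears together with a more specific function."""
    return len(functions) > 1 and "Resource Object" in functions

# 30W-CEE-COGEA--T, assuming it also carries "Resource Object"
print(redundant_resource_object(
    {"Resource Object", "Generation Unit", "Resource Capacity Market Unit"}))
# A resource whose only function is "Resource Object" is fine
print(redundant_resource_object({"Resource Object"}))
```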

3.6.4 Top-Level-must-have-only-function-ProductionUnit

  • Rule Group: ProductionUnit-Structure
  • Description: The top level resources in Production and Generation Units must have function "Production Unit", and not any other function
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
  • Severity: Violation
  • Applies to: biddingZone

Production and Generation Units data is supposed to have Production Units at the top level, and Generation Units at the bottom level. In practice, there are many "Production Units" mislabeled with function "Generation Unit" and vice versa.

This query counts all invalid situations:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select
  (count(?prodNotProd) as ?prodNotProd1)
  (count(?prodIsGen)   as ?prodIsGen1)
  (count(?genNotGen)   as ?genNotGen1)
  (count(?genIsProd)   as ?genIsProd1)
{
  {?prodNotProd a tr:ProductionUnit filter not exists{?x tr:function "Production Unit"}} union
  {?prodIsGen   a tr:ProductionUnit filter     exists{?x tr:function "Generation Unit"}} union
  {?genNotGen   a tr:GenerationUnit filter not exists{?x tr:function "Generation Unit"}} union
  {?genIsProd   a tr:GenerationUnit filter     exists{?x tr:function "Production Unit"}}
}
prodNotProd1 prodIsGen1 genNotGen1 genIsProd1
0 2499 0 3140
sh:targetClass tr:ProductionUnit;
sh:property [
  sh:path tr:function;
  sh:maxCount 1;
  sh:hasValue "Production Unit"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct ?x ?cc {
    {
        SELECT (COUNT(?function) as ?count) ?x {
            ?x a tr:ProductionUnit ;
               tr:biddingZone/tr:notation ?cc .
            OPTIONAL {
                ?x tr:function ?function .
            }
        } GROUP BY ?x
    }
    filter (not exists {
            ?x tr:function "Production Unit"
        } || ?count > 1)
} limit 1000

3.6.5 Bottom-Level-must-have-only-function-GenerationUnit

  • Rule Group: ProductionUnit-Structure
  • Description: The bottom level resources in Production and Generation Units must have function "Generation Unit", and not any other function
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names
  • Severity: Violation
  • Applies to: biddingZone
sh:targetClass tr:GenerationUnit;
sh:property [
  sh:path tr:function;
  sh:maxCount 1;
  sh:hasValue "Generation Unit"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit
  optional {?x tr:function ?fun filter (?fun != "Generation Unit")}
  filter (not exists {?x tr:function "Generation Unit"}
        || bound(?fun))
} limit 100

3.6.6 parentResource-semiInverse-generationUnit

  • Rule Group: ProductionUnit-Structure
  • Description: The relation parentResource (in EIC) should be "semi-inverse" of generationUnit (in Production and Generation Units), i.e. GeneratingUnit.parentResource should be inverse of generationUnit
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICParent_MarketDocument.mRID, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
  • Severity: Violation
  • Applies to: biddingZone
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select $this ?s2 {
    $this a tr:GenerationUnit ;
       ^tr:generationUnit ?s2 ;
        tr:parentResource ?parent2 .
      FILTER (?s2 != ?parent2)
      ?s2 a tr:ProductionUnit .
  }
  """];
  sh:sparql [a sh:SPARQLConstraint;
    sh:prefixes tr: ;
    sh:select """
      select $this ?value {
        $this a tr:GenerationUnit ;
           ^tr:generationUnit ?value ;
            tr:parentResource ?parent2 .
          FILTER (?value != ?parent2)
          ?value a tr:ProductionUnit .
      }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit ;
     ^tr:generationUnit ?parent ;
      tr:parentResource ?parent2 .
    FILTER (?parent != ?parent2)
    ?parent a tr:ProductionUnit .
}

3.6.7 ProductionUnits-and-GenerationUnits-in-EIC

  • Rule Group: ProductionUnit-Structure
  • Description: All Production Units and Generation Units must be described in the master EIC file, thus have EIC code, name, notation, function.
  • Data Items: generation/ProductionAndGenerationUnits, basic/allocated-eic-codes
  • Fields: Configuration_MarketDocument/TimeSeries/mRID, EIC_MarketDocument/EICCode_MarketDocument/mRID
  • Severity: Violation
  • Applies to: biddingZone

There are 938 power units (Production or Generation Units) that are missing from the EIC file:

base  <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  values ?type {tr:ProductionUnit tr:GenerationUnit}
  ?x a ?type
  filter not exists {
    {graph <graph/allocated-eic-codes> {?x tr:eic []}} 
  }
}

Eg 47W000000000318I has assetType, biddingZone, controlArea, providerParticipant, generatorUnit, highVoltageLimit, installedOutput, location, notationAlt but not EIC data.

sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property [
  sh:path tr:eic;
  sh:minCount 1].

Notes:

  • Multiple SHACL targets are allowed: "union of terms produced by the individual targets that are declared by the shape"
  • Note: this below is no longer relevant, because tr:ProductionUnit, tr:GenerationUnit are disjoint

SPARQL check:

base  <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  values ?type {tr:ProductionUnit tr:GenerationUnit}
  ?x a ?type
   filter not exists {
     {graph <graph/allocated-eic-codes> {?x tr:eic []}} # exists in <graph/correction/prodUnit-add-basic-data-to-EIC>
   }
}

3.6.7.1 prodUnit-add-basic-data-to-EIC

For Power Units missing from the EIC file, we add the following basic EIC fields:

  • rdf:type tr:EnergyResource
  • function from the subclass ProductionUnit or GenerationUnit. Note: The Production and Generation Units conversion emits one of these subclasses of tr:EnergyResource:
    • top level: tr:ProductionUnit
    • bottom level: tr:GenerationUnit
  • eic from the URL (and next section calculates eicType)
  • notation from notationAlt
  • (CANCEL: no such field: countryCode in biddingZone or controlArea)
base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

clear silent graph <graph/correction/prodUnit-add-basic-data-to-EIC>;
insert {graph <graph/correction/prodUnit-add-basic-data-to-EIC> {
  ?x a tr:EnergyResource;
    tr:function ?func;
    tr:eic ?eic;
    tr:notation ?notation;
}} where {
  values (?type ?func) {
    (tr:ProductionUnit "Production Unit")
    (tr:GenerationUnit "Generation Unit")
  }
  ?x a ?type
  filter not exists {?x tr:eic []}
  bind((replace(str(?x),".*/","")) as ?eic)
  optional {?x tr:notationAlt ?notation}
}
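
The bind in this update takes the EIC from the last component of the resource URL. An equivalent sketch in Python is shown below; the "eic/" path segment in the sample URL is a hypothetical placeholder.

```python
import re

def eic_from_url(url: str) -> str:
    """Strip everything up to and including the last slash, like replace(str(?x), ".*/", "")."""
    return re.sub(r".*/", "", url)

# Hypothetical resource URL for the unit mentioned in the previous section
print(eic_from_url("https://transparency.ontotext.com/resource/eic/47W000000000318I"))
```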

3.6.8 ProductionUnits-GenerationUnits-described-once

  • Rule Group: ProductionUnit-Structure
  • Description: Production and Generation Units should be described only once across their applicable areas, or if multiple times then all fields should be reported consistently
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP
  • Severity: Violation
  • Applies to: biddingZone

For example, the following Units are reported with different fields in Bidding Zone vs Control Area

  • 18WEGREEN-1234-3: different installedOutput
  • 47W000000000355C: different installedOutput
  • 47W000000000356A: different installedOutput
  • 18WEGREEN-1234-3: different dateImplemented
  • 11W0-0000-0026-Y: different location
  • 49W0000000000342: different notationAlt

Note: highVoltageLimit, assetType, providerParticipant are always consistent. We checked with a query like this:

select * {
  ?x tr:highVoltageLimit ?y1,?y2
  filter(str(?y1)<str(?y2))
}
sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property <shape/property/100>, <shape/property/101>, <shape/property/102>, <shape/property/103>.

<shape/property/100> a sh:PropertyShape; sh:path tr:installedOutput; sh:maxCount 1.
<shape/property/101> a sh:PropertyShape; sh:path tr:dateImplemented; sh:maxCount 1.
<shape/property/102> a sh:PropertyShape; sh:path tr:location;        sh:maxCount 1.
<shape/property/103> a sh:PropertyShape; sh:path tr:notationAlt;     sh:maxCount 1.

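SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT * WHERE {
    {
        # Count distinct values per field; optional so that units missing a field are still checked
        select ?x (COUNT(distinct ?installed) as ?installedCount) (COUNT(distinct ?date) as ?dateCount) (COUNT(distinct ?loc) as ?locationCount) (COUNT(distinct ?not) as ?notationCount) {
            # The two classes are disjoint, so match either one (not both)
            values ?type {tr:ProductionUnit tr:GenerationUnit}
            ?x a ?type .
            optional {?x tr:installedOutput ?installed}
            optional {?x tr:dateImplemented ?date}
            optional {?x tr:location ?loc}
            optional {?x tr:notationAlt ?not}
        } GROUP BY ?x
    }
    FILTER(?installedCount > 1 || ?dateCount > 1 || ?locationCount > 1 || ?notationCount > 1)
}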
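
The idea behind this rule (a unit reported in several areas should carry one value per field) can be sketched over a flat list of reported values. The tuple layout, the function name and the sample values are illustrative assumptions.

```python
from collections import defaultdict

FIELDS = ("installedOutput", "dateImplemented", "location", "notationAlt")

def inconsistent_fields(reports):
    """reports: iterable of (unit, field, value) -> sorted (unit, field) pairs with more than 1 distinct value."""
    seen = defaultdict(set)
    for unit, field, value in reports:
        if field in FIELDS:
            seen[(unit, field)].add(value)
    return sorted(key for key, values in seen.items() if len(values) > 1)

# Made-up values: the same unit reported with different installedOutput in two areas
reports = [
    ("18WEGREEN-1234-3", "installedOutput", 10),
    ("18WEGREEN-1234-3", "installedOutput", 12),
    ("11W0-0000-0026-Y", "location", "A"),
    ("11W0-0000-0026-Y", "location", "A"),
]
print(inconsistent_fields(reports))
```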

3.6.9 EIC-in-ProductionUnit-data

  • Rule Group: ProductionUnit-Structure
  • Description: EIC resources with functions "Production Unit" and "Generation Unit" should be described in the Production and Generation Units data item
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, Configuration_MarketDocument/TimeSeries/MktPSRType/psrType, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/generatingUnit_PSRType.psrType
  • Severity: Violation
  • Applies to: countryCode
sh:targetClass tr:EnergyResource;
sh:or (
  [sh:not [sh:path tr:function; dash:hasValueIn ("Production Unit" "Generation Unit")]]
  [        sh:path rdf:type;    dash:hasValueIn (tr:ProductionUnit tr:GenerationUnit)]).

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dash: <http://datashapes.org/dash#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
  ?x a tr:EnergyResource; tr:function ?fun.
  filter(?fun in ("Production Unit", "Generation Unit"))
  filter not exists {
    ?x a ?type
    filter(?type in (tr:ProductionUnit, tr:GenerationUnit))
  }
}

3.6.9.1 add-eicType

This correction adds field eicType based on the third char of eic.

  • It connects each EnergyResource to codelist <type/Eic/> (where notation is the char, name is the type).
  • For example, <type/Eic/W> is "Resource Object"
  • This can be seen in the models EIC Mapping and Combined Mapping.
base       <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>

clear silent graph <graph/correction/eicType>;
insert {graph <graph/correction/eicType> {
  ?x tr:eicType ?type
}} where {
  ?x tr:eic ?eic
  bind(substr(?eic,3,1) as ?notation)
  ?type tr:codeList <type/Eic>; tr:notation ?notation
}

(There is no particular reason to run this right after the previous validation rule.)
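
As an illustration of the derivation: SPARQL substr is 1-based, so substr(?eic,3,1) corresponds to index 2 in Python. The lookup table below lists only the EIC types named in this document and is not the full <type/Eic/> codelist; the function name is hypothetical.

```python
# Only the types mentioned in this document, not the complete codelist
EIC_TYPES = {"W": "Resource Object", "X": "Party", "Y": "Area or Domain"}

def eic_type(eic: str) -> str:
    """Return the third character of the EIC (SPARQL substr(?eic,3,1))."""
    return eic[2]

for eic in ("32W001100100017L", "59XREALPETROL11F"):
    notation = eic_type(eic)
    print(eic, notation, EIC_TYPES.get(notation))
```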

3.6.10 EIC-compatible-with-function

  • Rule Group: EIC-function
  • Description: Each EIC type (third char of EIC) should be compatible with the function(s) of the resource as per table below. To fix this, the function or EIC of these resources would need to be changed. But changing EIC is not a good idea (nor is it a good idea to embed information in identifiers)
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, EIC_MarketDocument/EICCode_MarketDocument/mRID
  • Severity: Violation
  • Applies to: countryCode

According to the following table (data/turtle/small/eicType-valid.ttl):

function eicTypeInvalid eicTypeValid
System Operator W Resource Object X Party
Control Block X Party Y Area or Domain
Market Area X Party Y Area or Domain

For the implementation we use SPARQL-based Constraints.

  • We first find all offending nodes using one query (sh:SPARQLTarget)
  • Then a second query (a sh:SPARQLConstraint) is run for each offending node. This "double-query" approach reduces execution time because the offenders are a small subset of all tr:EnergyResource
  • Note: the inline-bind (tr:function as ?path) doesn't work in GDB (GDB-6713)
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select distinct $this ?s2 {
        $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
        ?s2 a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.} """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:function as ?path) (sample(?func) as ?value) {
      $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
      [] a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.
    } group by $this ?path"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct $this ?s2 {
        $this a tr:EnergyResource;    tr:eicType ?type; tr:function ?func.
        ?s2 a tr:EicTypeValid;  tr:eicTypeInvalid ?type; tr:function ?func.}
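
The table lookup behind this rule can be sketched in a few lines. The set below restates the three rows of the table above as (function, invalid EIC type) pairs; the function name and the sample EIC codes are hypothetical.

```python
# (function, eicTypeInvalid) pairs from the table in this section
INVALID = {("System Operator", "W"), ("Control Block", "X"), ("Market Area", "X")}

def incompatible_functions(eic: str, functions):
    """Return the functions of a resource that clash with its EIC type (third char)."""
    eic_type = eic[2]
    return sorted(f for f in functions if (f, eic_type) in INVALID)

# Hypothetical EIC codes, for illustration only
print(incompatible_functions("00W-EXAMPLE----0", {"System Operator"}))
print(incompatible_functions("00X-EXAMPLE----0", {"System Operator"}))
```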

3.6.11 function-compatible-with-EIC

  • Rule Group: EIC-function
  • Description: Each function of a resource should be compatible with its EIC type (third char of EIC) as per "List of allowed functions for the EIC codes". Misspellings are not listed here. This is a soft constraint.
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/Function_Names, EIC_MarketDocument/EICCode_MarketDocument/mRID
  • Severity: Warning
  • Applies to: countryCode

The allowed combinations come from turtle/small/eicType-function.ttl, which is RDFized from docs/eicType-function-allowed.tsv, which in turn is extracted from "List of allowed functions for the EIC codes".

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
      $this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
      filter not exists {?s2 tr:functionValid ?func}} """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:function as ?path) (sample(?func) as ?value) {
      $this tr:eicType ?type; tr:function ?func
      filter not exists {?type tr:functionValid ?func}
    } group by $this ?path"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct $this ?s2 {
  $this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
  filter not exists {?s2 tr:functionValid ?func}}

3.6.12 ProductionUnits-installedOutput-highVoltageLimit

  • Rule Group: ProductionUnit-Structure
  • Description: Production Units should have installedOutput and highVoltageLimit. This is a soft constraint
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/production_PowerSystemResources.highVoltageLimit
  • Severity: Warning
  • Applies to: biddingZone
sh:targetClass tr:ProductionUnit;
sh:property <shape/property/104>, <shape/property/105>.

<shape/property/104> a sh:PropertyShape; sh:path tr:installedOutput;  sh:minCount 1.
<shape/property/105> a sh:PropertyShape; sh:path tr:highVoltageLimit; sh:minCount 1.

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:ProductionUnit
  filter (not exists {?x tr:installedOutput []}
       || not exists {?x tr:highVoltageLimit []})
}

3.6.13 GenerationUnits-installedOutput

  • Rule Group: ProductionUnit-Structure
  • Description: Generation Units should have installedOutput. This is a soft constraint
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
  • Severity: Warning
  • Applies to: biddingZone
sh:targetClass tr:GenerationUnit;
sh:property [sh:path tr:installedOutput;  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?x a tr:GenerationUnit
  filter not exists {?x tr:installedOutput []}
}

3.6.14 ProductionUnit-highVoltageLimit-not-zero

  • Rule Group: ProductionUnit-Data
  • Description: highVoltageLimit should not be zero: such data should be omitted
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/production_PowerSystemResources.highVoltageLimit
  • Severity: Violation
  • Applies to: biddingZone
sh:targetClass tr:ProductionUnit;
sh:not [sh:path tr:highVoltageLimit; sh:hasValue "0"^^xsd:float].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:highVoltageLimit ?hvl .
    FILTER (?hvl = "0"^^xsd:float)
}

3.6.15 ProductionUnit-installedOutput-not-zero

  • Rule Group: ProductionUnit-Data
  • Description: installedOutput should not be zero, except in Production Units that are offline (perhaps kept as "cold reserve")
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
  • Severity: Warning
  • Applies to: biddingZone
sh:targetSubjectsOf tr:installedOutput;
sh:not [sh:path tr:installedOutput; sh:hasValue "0"^^xsd:float].

SPARQL check:

BASE         <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    GRAPH <graph/ProductionAndGenerationUnits> {
        $this tr:installedOutput ?io .
        FILTER (?io = "0"^^xsd:float)
    }
}

3.6.16 ProductionUnit-and-GenerationUnit-same-responsibleParticipant

  • Rule Group: ProductionUnit-Structure
  • Description: A Production Unit and all its Generation Units should have the same Responsible Participant. This is a soft constraint since exceptions are possible
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICResponsible_MarketParticipant.mRID, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
  • Severity: Warning
  • Applies to: biddingZone
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
      $this a tr:ProductionUnit ;
            tr:generationUnit/tr:responsibleParticipant ?genRP ;
                             tr:responsibleParticipant ?RP .
      FILTER (?genRP != ?RP)
      $this tr:generationUnit ?s2
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?value {
      $this a tr:ProductionUnit ;
            tr:generationUnit/tr:responsibleParticipant ?value ;
                             tr:responsibleParticipant ?RP .
      FILTER (?value != ?RP)
    }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:generationUnit/tr:responsibleParticipant ?genRP ;
                           tr:responsibleParticipant ?RP .
    FILTER (?genRP != ?RP)
}
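
As an illustration of the comparison made by the queries above, here is a minimal Python sketch. It assumes each unit has at most one Responsible Participant and that the data has already been loaded into plain Python values; the function name and the input shape are hypothetical.

```python
from typing import Dict, List, Optional

def mismatched_generation_units(prod_rp: Optional[str],
                                gen_rps: Dict[str, Optional[str]]) -> List[str]:
    """Return the Generation Units whose Responsible Participant differs
    from that of their Production Unit (mirrors FILTER (?genRP != ?RP)).

    Units without a Responsible Participant are skipped, because the
    SPARQL pattern only matches when both values are present.
    """
    if prod_rp is None:
        return []
    return sorted(gen for gen, rp in gen_rps.items()
                  if rp is not None and rp != prod_rp)
```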

3.6.17 ProductionUnit-and-GenerationUnit-same-country

  • Rule Group: ProductionUnit-Structure
  • Description: A Production Unit and all its Generation Units should have the same country code
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources
  • Severity: Violation
  • Applies to: countryCode
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
        $this a tr:ProductionUnit ;
              tr:generationUnit ?s2 ;
              tr:countryCode ?RP .
        ?s2 tr:countryCode ?genRP .
        FILTER (?genRP != ?RP)
    }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?value {
        $this a tr:ProductionUnit ;
              tr:generationUnit ?s2 ;
              tr:countryCode ?RP .
        ?s2 tr:countryCode ?value .
        FILTER (?value != ?RP)
    }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          tr:generationUnit/tr:countryCode ?genRP ;
                           tr:countryCode ?RP .
    FILTER (?genRP != ?RP)
}

3.6.18 ProductionUnit-Zone-or-Area-same-country

  • Rule Group: ProductionUnit-Structure
  • Description: The country code of a Production Unit and all its Bidding Zones and Control Areas should be the same (when present). This is a soft constraint since areas and zones are not co-extensive with countries
  • Data Items: basic/allocated-eic-codes, generation/ProductionAndGenerationUnits
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country, Configuration_MarketDocument/TimeSeries/biddingZone_Domain.mRID, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
  • Severity: Warning
  • Applies to: countryCode
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
        $this a tr:ProductionUnit ;
              (tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
                                            tr:countryCode ?RP .
        FILTER (?genRP != ?RP)
        $this (tr:biddingZone | tr:controlArea) ?s2 .
        ?s2 tr:countryCode ?genRP .
    }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?value {
        $this a tr:ProductionUnit ;
              (tr:biddingZone|tr:controlArea)/tr:countryCode ?value ;
                                            tr:countryCode ?RP .
        FILTER (?value != ?RP)
    }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this a tr:ProductionUnit ;
          (tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
                           tr:countryCode ?RP .
    FILTER (?genRP != ?RP)
}

3.6.19 location-informative

  • Rule Group: ProductionUnit-Data
  • Description: Locations should carry informative place names (e.g. city, region, country name/code), and should not be digits only, an EIC, "intra_zonal", or "locName" (but "internal" is ok)
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/registeredResource.location.name
  • Severity: Warning
  • Applies to: countryCode

The uninformative values were discovered by tallying the location strings in the data:

cd data/turtle/prodUnit
perl -lne 'm{:location +"(.*)"} and do {$_=$1; s{^\d{2}[A-Z][A-Z0-9-]{13}$}{EIC}; s{^\d+$}{digits}; print}' *|sort|uniq -c|sort -rn|less
sh:targetSubjectsOf tr:location ;
sh:property [
  sh:path tr:location;
  sh:not [sh:pattern "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"]].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
    $this tr:location ?loc .
    FILTER (REGEX(?loc, "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"))
}
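
The pattern can be tried on individual strings with a few lines of Python. This is a minimal sketch that reuses the regular expression from the shape above unchanged; the names UNINFORMATIVE and is_informative are hypothetical.

```python
import re

# Same pattern as in the SHACL shape above.
UNINFORMATIVE = re.compile(
    r"^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"
)

def is_informative(location: str) -> bool:
    """True unless the location is digits only, an EIC, or a placeholder."""
    return UNINFORMATIVE.match(location) is None
```

For example, is_informative("France") and is_informative("internal") hold, whereas "12345", "17W100P100P0352E" and "intra_zonal" are rejected.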

3.6.20 ProductionUnit-capacity-GTE-GenerationUnit-capacity

  • Rule Group: ProductionUnit-Data
  • Description: The capacity (Nominal Power) of a Production Unit should equal the sum of the capacities of its Generation Units; or should be greater (in case some Generation Units are not described)
  • Data Items: generation/ProductionAndGenerationUnits
  • Fields: Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
  • Severity: Warning
  • Applies to: biddingZone

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  $this a tr:ProductionUnit; tr:installedOutput ?value
  {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
  filter(?value<?value2)
}

Implementation:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
      select $this ?s2 {
        $this a tr:ProductionUnit; tr:installedOutput ?value ; tr:generationUnit ?s2 .
        {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
        filter(?value<?value2)}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:message "Should be greater than or equal to {?value2}";
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:installedOutput as ?path) ?value ?value2 {
      $this a tr:ProductionUnit; tr:installedOutput ?value
      {select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
      filter(?value<?value2)}"""].
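
The arithmetic behind this rule is small enough to sketch in Python. The sketch below assumes the capacities are already available as numbers; the function name capacity_violation is hypothetical.

```python
from typing import List, Optional

def capacity_violation(prod_output: float,
                       gen_outputs: List[float]) -> Optional[float]:
    """Return the summed Generation Unit capacity if it exceeds the
    Production Unit capacity (a violation), otherwise None."""
    total = sum(gen_outputs)  # corresponds to sum(?value1) as ?value2
    return total if prod_output < total else None
```

A Production Unit without described Generation Units yields a sum of 0 and is therefore never reported, just as the SPARQL subquery returns no row for it.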

3.6.21 VAT-country-prefix

  • Rule Group: EIC-VAT
  • Description: VAT numbers of market participants should start with a country code, not digits. This is a soft constraint since the country code (if present) can be prepended to the VAT
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
  • Severity: Warning
  • Applies to: countryCode

Example: 6326035O is invalid (IE6326035O would be valid)

sh:targetSubjectsOf tr:vatNumber;
sh:property [
  sh:path tr:vatNumber;
  sh:pattern "^[A-Z][A-Z]"].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this ?vat {
    $this tr:vatNumber ?vat .
    FILTER (!REGEX(?vat, "^[A-Z][A-Z]"))
}

3.6.21.1 VAT-add-country-prefix

This correction normalizes VAT codes: those lacking the country prefix (typically starting with a digit) are prefixed with the country code, enabling the VAT-per-country-syntax check and the VAT-exists-in-VIES check.

base        <https://transparency.ontotext.com/resource/>
prefix tr:  <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

delete {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?old}}
insert {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?new}}
where {
  values (?co ?co1 ?regex) {
    ("AL" "AL"  "^[JKLM][0-9]"      )
    ("AR" "AR"  "^[0-9]"            )
    ("AT" "AT"  "^U[0-9]"           )
    ("BA" "BA"  "^[0-9]"            )
    ("BE" "BE"  "^[0-9]"            )
    ("BG" "BG"  "^[0-9]"            )
    ("CH" "CHE" "^(CH)?[0-9]"       )
    ("CY" "CY"  "^[0-9]"            )
    ("CZ" "CZ"  "^[0-9]"            )
    ("DE" "DE"  "^[0-9]"            )
    ("DK" "DK"  "^[0-9]"            )
    ("EE" "EE"  "^[0-9]"            )
    ("ES" "ES"  "^[A-Z][0-9]"       )
    ("FI" "FI"  "^[0-9]"            )
    ("FR" "FR"  "^[0-9]"            )
    ("GB" "GB"  "^[0-9]"            )
    ("GE" "GE"  "^[0-9]"            )
    ("GR" "EL"  "^(GR|GREL)?[0-9]"  )
    ("HR" "HR"  "^[0-9]"            )
    ("HU" "HU"  "^[0-9]"            )
    ("IE" "IE"  "^[0-9]"            )
    ("IT" "IT"  "^[0-9]"            )
    ("IS" "IS"  "^[0-9]"            )
    ("KY" "KY"  "^[0-9]"            )
    ("LI" "LI"  "^[0-9]"            )
    ("LT" "LT"  "^[0-9]"            )
    ("LU" "LU"  "^[0-9]"            )
    ("LV" "LV"  "^[0-9]"            )
    ("MD" "MD"  "^[0-9]"            )
    ("ME" "ME"  "^[0-9]"            )
    ("MK" "MK"  "^[0-9]"            )
    ("MT" "MT"  "^[0-9]"            )
    ("NL" "NL"  "^[0-9]"            )
    ("NO" "NO"  "^[0-9]"            )
    ("PL" "PL"  "^[0-9]"            )
    ("PT" "PT"  "^[0-9]"            )
    ("RO" "RO"  "^[0-9]"            )
    ("RS" "RS"  "^[0-9]"            )
    ("RU" "RU"  "^[0-9]"            )
    ("SE" "SE"  "^[0-9]"            )
    ("SG" "SG"  "^[0-9]"            )
    ("SI" "SI"  "^[0-9]"            )
    ("SK" "SK"  "^[0-9]"            )
    ("TR" "TR"  "^[0-9]"            )
    ("UA" "UA"  "^[0-9]"            )
    ("US" "US"  "^[0-9]"            )
    ("XK" "XK"  "^[0-9]"            )
  }
  ?x tr:countryCode ?co; tr:vatNumber ?old.
  filter(regex(?old,?regex))
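  # GREL is listed before GR: the first matching alternative wins, so "GR" first would leave a stray "EL"
  bind(replace(?old,"^(CH|GREL|GR)","") as ?vat1)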
  bind(concat(?co1,?vat1) as ?new)
}
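
The logic of the update can be followed step by step in a minimal Python sketch. It assumes only four rows of the values table above; the names RULES and add_country_prefix are hypothetical.

```python
import re

# A few rows of the values table above:
# country code -> (VAT prefix, regex matching a VAT that lacks the prefix)
RULES = {
    "IE": ("IE", r"^[0-9]"),
    "ES": ("ES", r"^[A-Z][0-9]"),
    "CH": ("CHE", r"^(CH)?[0-9]"),
    "GR": ("EL", r"^(GR|GREL)?[0-9]"),
}

def add_country_prefix(country: str, vat: str) -> str:
    """Return the VAT with its country prefix added, or unchanged if the
    country has no rule or the VAT does not match the rule's regex."""
    rule = RULES.get(country)
    if rule is None or re.search(rule[1], vat) is None:
        return vat
    prefix = rule[0]
    # Strip a wrongly spelled prefix first; the longer alternative GREL
    # comes before GR so that it is removed whole.
    stripped = re.sub(r"^(CH|GREL|GR)", "", vat)
    return prefix + stripped
```

With these rules, the VAT 6326035O of an IE party becomes IE6326035O, and a GR party's GR123456789 becomes EL123456789.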

3.6.22 VAT-per-country-syntax

  • Rule Group: EIC-VAT
  • Description: VAT numbers of market participants should be syntactically valid, according to specific rules per country-code prefix. Prefixes GBP, UK, LEI, NONE are invalid.
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
  • Severity: Violation
  • Applies to: countryCode

Examples:

  • IE8F52100V is valid syntax
  • ES20470001 is invalid syntax (ESA20470001 is valid)
sh:targetSubjectsOf tr:vatNumber;
sh:path tr:vatNumber ;
sh:or (
  [sh:pattern "^ADU\\d{6}[A-Z]$"                 ]
  [sh:pattern "^AL[JKLM]\\d{8}[A-Z]$"            ] 
  [sh:pattern "^AR\\d{14}$"                      ]
  [sh:pattern "^ATU\\d{8}$"                      ]
  [sh:pattern "^AU\\d{11}$"                      ]
  [sh:pattern "^BA\\d{12,13}$"                   ]
  [sh:pattern "^BE\\d{10}$"                      ]
  [sh:pattern "^BG\\d{9,10}$"                    ]
  [sh:pattern "^CHE\\d{9}$"                      ]
  [sh:pattern "^CY\\d{8}[A-Z]$"                  ]
  [sh:pattern "^CZ\\d{8,10}$"                    ]
  [sh:pattern "^DE\\d{9}$"                       ]
  [sh:pattern "^DK\\d{8}$"                       ]
  [sh:pattern "^EE\\d{9}$"                       ]
  [sh:pattern "^EL\\d{9}$"                       ]
  [sh:pattern "^ES[A-Z]\\d{7}[\\dA-Z]$"          ]
  [sh:pattern "^FI\\d{8}$"                       ]
  [sh:pattern "^FL\\d{11}$"                      ]
  [sh:pattern "^FR\\d{11}$"                      ]
  [sh:pattern "^GB\\d{9}$"                       ]
  [sh:pattern "^HR\\d{11}$"                      ]
  [sh:pattern "^HU\\d{8}$"                       ]
  [sh:pattern "^IE\\d[\\dA-Z]\\d{5}[A-Z]{1,2}$"  ]
  [sh:pattern "^IS\\d{5}$"                       ]
  [sh:pattern "^IT\\d{10,11}$"                   ]
  [sh:pattern "^JE\\d{10}$"                      ]
  [sh:pattern "^KY\\d{6}$"                       ]
  [sh:pattern "^LI\\d{5}$"                       ]
  [sh:pattern "^LT(\\d{9}|\\d{12})$"             ]
  [sh:pattern "^LU\\d{8}$"                       ]
  [sh:pattern "^LV\\d{11}$"                      ]
  [sh:pattern "^MA\\d{7}$"                       ]
  [sh:pattern "^MD\\d{7}$"                       ]
  [sh:pattern "^ME(\\d{8}|\\d{12})$"             ]
  [sh:pattern "^MK\\d{13}$"                      ]
  [sh:pattern "^MR\\d{8}$"                       ]
  [sh:pattern "^MT\\d{8}$"                       ]
  [sh:pattern "^NL\\d{9}B\\d{1,2}$"              ]
  [sh:pattern "^NO\\d{9}(M|MVA)?$"               ]
  [sh:pattern "^PL\\d{10}$"                      ]
  [sh:pattern "^PT\\d{9}$"                       ]
  [sh:pattern "^RO\\d{7,8}$"                     ]
  [sh:pattern "^RS\\d{9}$"                       ]
  [sh:pattern "^RU\\d{10}$"                      ]
  [sh:pattern "^SE\\d{12}$"                      ]
  [sh:pattern "^SG[A-Z]?\\d{9}[A-Z]$"            ]
  [sh:pattern "^SI\\d{8}$"                       ]
  [sh:pattern "^SK\\d{10}$"                      ]
  [sh:pattern "^SM\\d{5}$"                       ]
  [sh:pattern "^TR\\d{10}$"                      ]
  [sh:pattern "^UA\\d{8,12}$"                    ]
  [sh:pattern "^US\\d{9}([A-Z]{2}\\d)?$"         ]
  [sh:pattern "^XK\\d{9}$"                       ]
).
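
To try individual VAT numbers against these patterns, a minimal Python sketch follows. It copies only four of the patterns above, written with single backslashes because Python raw strings need no Turtle escaping; the names PATTERNS and has_valid_syntax are hypothetical.

```python
import re

# A subset of the sh:or patterns above.
PATTERNS = [
    r"^ES[A-Z]\d{7}[\dA-Z]$",
    r"^GB\d{9}$",
    r"^HU\d{8}$",
    r"^IE\d[\dA-Z]\d{5}[A-Z]{1,2}$",
]

def has_valid_syntax(vat: str) -> bool:
    """True if the VAT matches at least one per-country pattern (sh:or)."""
    return any(re.match(p, vat) for p in PATTERNS)
```

This agrees with the examples given for this rule: IE8F52100V and ESA20470001 pass, ES20470001 does not.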

3.6.23 VAT-country-exists

  • Rule Group: EIC-VAT
  • Description: If a VAT is present then country code should also be present (so the VAT can be checked against that country)
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
  • Severity: Violation
  • Applies to: countryCode
sh:targetSubjectsOf tr:vatNumber;
sh:property [
  sh:path tr:countryCode;
  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?this tr:vatNumber [] .
  FILTER NOT EXISTS {?this tr:countryCode ?cc}
} limit 10

3.6.24 VAT-country-conforms

  • Rule Group: EIC-VAT
  • Description: The first two chars of VAT must equal the country code (except "GR" which is spelled "EL" in VAT codes, and "CH" which is spelled "CHE")
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
  • Severity: Violation
  • Applies to: countryCode

Examples:

  • 59XREALPETROL11F "REAL PETROL HOLDING KFT" with VAT "HU24189514": country "IT" is wrong
  • 22X20110811----W "INEOS CHLORVINYLS LIMITED" with VAT "GB768506886": country "BE" is wrong

More examples, for traders in AE (United Arab Emirates), in particular the Dubai DMCC:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?eic ?co ?vat ?name ?notation ?function ?descr {
  ?x tr:countryCode "AE"
  optional {?x tr:eic ?eic}
  optional {?x tr:countryCode ?co}
  optional {?x tr:name ?name}
  optional {?x tr:notation ?notation}
  optional {?x tr:function ?function}
  optional {?x tr:vatNumber ?vat}
  optional {?x tr:description ?descr}
}

eic | co | vat | name | notation | function | descr
48X000000000255O | AE | | LUZIRA DMCC | BUGOLOBI | Interconnection Trade Responsible | A VAT number is not available for this company, so we are providing the Legal Entity Identifier (LEI) company registration number which is 984500O3EFBA8613AA78.
48X0000000000432 | AE | GB383911772 | COBBLESTONE ENERGY DMCC | COBBLESTONEDMCC | Balance Responsible Party | UK VAT Code not available. Value in above field is the registered company number.
11X0-0000-0554-Q | AE | NONE | ENERGETECH TRADING DMCC | ENERGETECH | Balance Responsible Party |
53XPL000000ININY | AE | | Infusion International INC | INFUSION_INTL | Network User | The company registered in UAE. According to local (UAE) regulations they are treated as offshore company and they function in so called free zone. No possibility for them to get the VAT code.
59XVORTICES--017 | AE | | Vortices Energy Ltd. | VORTICESENERGY | Balance Responsible Party | UAE Company; EU Value not inserted because non-european company.

This indicates some trouble regarding the filling of VAT information for non-European parties. Going row by row:

  • "LEI": it's better to extend the EIC File and data collection systems to be able to carry company identifiers other than vatNumber, including LEI
  • "UK VAT not available": it's unclear why the field vatNumber has the prefix "GB" given that it's an AE company. What forced the data entry user to enter this misleading value?
  • "NONE": it's better to leave a field null than to enter such a vacuous value. What forced the data entry user to enter it?
  • "free zone, No possibility for them to get VAT code": surely such companies have a registered company number, and it should be possible to specify it
  • "EU Value not inserted because non-european company": surely all countries have registered company numbers, and it should be possible to specify them
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this tr:countryCode ?co; tr:vatNumber ?vat
      bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
      filter(!strstarts(?vat,?co1))}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "Country code is {?co}";
  sh:select """
    select $this (tr:vatNumber as ?path) (?vat as ?value) ?co {
      $this tr:countryCode ?co; tr:vatNumber ?vat}"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select $this {
  $this tr:countryCode ?co; tr:vatNumber ?vat
  bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
  filter(!strstarts(?vat,?co1))}
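
The prefix comparison can be replayed on the examples of this section with a minimal Python sketch; the function names vat_prefix and vat_conforms are hypothetical.

```python
def vat_prefix(country: str) -> str:
    """VAT prefix expected for a country code (CH -> CHE, GR -> EL)."""
    return {"CH": "CHE", "GR": "EL"}.get(country, country)

def vat_conforms(country: str, vat: str) -> bool:
    """True if the VAT starts with the prefix expected for the country."""
    return vat.startswith(vat_prefix(country))
```

For instance, VAT "HU24189514" conforms to country "HU" but not to "IT", and neither "GB383911772" nor "NONE" conforms to "AE".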

3.6.25 VAT-exists-in-VIES

  • Rule Group: EIC-VAT
  • Description: VAT numbers should exist when checked in external sources (EU VIES), or the market participant should have Deactivation Requested Date, or Status "Passive"
  • Data Items: basic/allocated-eic-codes
  • Fields: EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name
  • Severity: Violation
  • Applies to: countryCode

A Python script queries VIES in bulk, and the step RDFize VIES Checks then records the results as RDF.

  • Currently we check in EU VIES only for EU and IE
  • A future enhancement could use NO and UK services (or their open trade register data) to check those important countries
  • We don't check for Northern Ireland since VIES uses the country code XI but most such companies in EIC data are recorded with code GB (except 2)
  • VIES reports many ES VAT numbers as non-existent, perhaps the respective companies are not registered for VAT. An example is 18XFERL-12345--K Ferloga, SL (VAT ESB24049272)
    • Can be found in OpenCorporates as es/24049272
    • Can be found in Kompass as ES CIF B24049272, VAT ESB24049272, Kompass ES1074724
    • Can be found in Registradores de Espana business registry (enter company name "Ferloga" and Business Registry Office "Ourense") as NIF B24049272
    • But cannot be found in VIES using either of B24049272, 24049272, A24049272
  • Some countries may not have open data or checking service available, e.g. RS
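
Recording the check results as RDF could look roughly like the following Python sketch. Only the tr: namespace and the tr:vatInVies property are taken from this document; the input shape (a mapping from node IRI to a boolean), the function name and the example IRIs are assumptions made for illustration, not the actual script.

```python
from typing import Dict

TR = "https://transparency.ontotext.com/resource/tr/"

def vies_checks_to_turtle(results: Dict[str, bool]) -> str:
    """Serialize {node IRI: VAT exists in VIES} as Turtle triples that use
    the tr:vatInVies property (hypothetical input shape)."""
    lines = ["@prefix tr: <%s> ." % TR]
    for node, exists in sorted(results.items()):
        literal = "true" if exists else "false"
        lines.append("<%s> tr:vatInVies %s ." % (node, literal))
    return "\n".join(lines)
```

Keeping the outcome as a plain boolean on the node is what lets the shape below select offenders with the single triple pattern tr:vatInVies false.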

We use SHACL-SPARQL in order to put the wrong VAT number in ?value:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this tr:vatInVies false}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:vatNumber as ?path) ?value {
      $this tr:vatInVies false; tr:vatNumber ?value}"""].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select $this {
  $this tr:vatInVies false}

3.6.26 installedCapacity-Aggregated-vs-Per-Unit

  • Rule Group: Arithmetics
  • Description: Aggregated capacity per area and asset (production) type should be at least the sum of the installed capacities of the individual Production Units in that area, and at most 30% above that sum
  • Data Items: generation/AggregatedGenerationPerType, generation/InstalledGenerationCapacityComputed, generation/InstalledGenerationCapacityAggregated
  • Fields: AggregatedInstalledCapacity, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
  • Severity: Warning
  • Applies to: biddingZone, controlArea

Notes:

  • The two values are not expected to be equal due to differences in capture requirements:
    • ProductionAndGenerationUnits data (16.1.A) is expected to report installed capacity only for units greater than 100 MW
    • AggregatedGenerationPerType data (14.1.A) is expected to report aggregate capacity of all units greater than 1 MW
    • ProductionAndGenerationUnits data represents current capacity, whereas AggregatedGenerationPerType represents the capacity on Jan 1 of the respective year
  • So this rule is intended to capture only substantial deviations that are due to data errors (see below).
  • The rule checks whether AggregatedGenerationPerType is within 100% to 130% of the sum of generation capacity per unit.

Example of a data mistake: Installed Capacity per Production Type for France on 21-Jan-2022 showed this:

Production Type | 2021 MW | 2022 MW
Other           | 1120    | 7900729

This means that 7.9 TW (7.9 million MW!) of "Other" capacity was newly installed in France. Have the French tamed some Dark Energy source that would solve all our energy problems?

Checking Installed Capacity Per Production Unit shows only 1 "Other" asset:

Production Type | Code | Name | Installed Capacity at the beginning of the year | Current Installed Capacity | Location | Voltage Connection Level | Commissioning Date
Other | 17W100P100P0352E | CYCOFOS TV2 | 62 | 62 | France | 225 | 01.09.2009

It was installed in 2009 and there's no change in capacity (62 MW) in the last two years. So unfortunately the 7.9 TW is not a miracle but a data error.

Implementation:

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select ?aggr ?comp ?aggrOutput ?compOutput ?compOutputHigh {
  ?aggr a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
    tr:controlArea|tr:biddingZone ?area;
    tr:assetType ?assetType;
    tr:installedOutput ?aggrOutput.
  ?comp a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;
    tr:controlArea|tr:biddingZone ?area;
    tr:assetType ?assetType;
    tr:installedOutput ?compOutput;
    tr:installedOutputHigh ?compOutputHigh.
  filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))
} limit 1000
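
The bound test in the filter can be illustrated with a minimal Python sketch. It assumes that tr:installedOutputHigh equals 130% of the computed per-unit sum, as the notes above suggest; the function name and the tolerance parameter are hypothetical.

```python
def aggregated_within_bound(aggr_output: float, comp_output: float,
                            tolerance: float = 0.30) -> bool:
    """True if comp_output <= aggr_output <= comp_output * (1 + tolerance)."""
    comp_output_high = comp_output * (1 + tolerance)  # assumed installedOutputHigh
    return comp_output <= aggr_output <= comp_output_high
```

With the French "Other" figures above, an aggregated value of 7900729 MW against a per-unit sum of 62 MW falls far outside the bound and is reported.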

Implementation with SHACL-SPARQL. We return extra info using sh:message:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select (?aggr as $this) ?s2 {
      ?aggr a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?aggrOutput.
      ?s2 a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?compOutput;
        tr:installedOutputHigh ?compOutputHigh.
      filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:message "Must be between {?compOutput} and {?compOutputHigh}";
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this (?aggrOutput as ?value) ?compOutput ?compOutputHigh {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?aggrOutput.
      ?comp a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
        tr:controlArea|tr:biddingZone ?area;
        tr:assetType ?assetType;
        tr:installedOutput ?compOutput;
        tr:installedOutputHigh ?compOutputHigh}"""].

3.6.27 ActualGenerationOutputPerGenerationUnit-controlArea-conform

  • Rule Group: Observations-Structure
  • Description: The Control Area of the observation must match the Control Area of the Generation Unit. This finds too many violations, so we return only the first 1000.
  • Data Items: generation/ActualGenerationOutputPerGenerationUnit, generation/ProductionAndGenerationUnits
  • Fields: AreaCode, GenerationUnitEIC, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
  • Severity: Violation
  • Applies to: controlArea

Out of 4.5M observations over 3 months, there are 3.3M violations:

  • 11.5k where the Generation Unit has a matching controlArea, but that's because it was submitted at the top level of Production and Generation Units, i.e. that is a discrepancy
  • 3.3M involving a Generation Unit that has no controlArea, neither itself nor through its Production Unit (parentResource)
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this ?s2 ?s3 {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:controlArea ?area; tr:generationUnit ?s2 .
      optional {?s2 tr:parentResource? ?s3}
      filter not exists {$this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
    } limit 1000"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:select """
    select $this (tr:controlArea as ?path) (?area as ?value) {
      $this tr:controlArea ?area}"""].

SPARQL check:

base <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
  ?this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>; tr:controlArea ?area.
  filter not exists {?this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
  optional {
    ?this tr:generationUnit ?gen
    optional {?gen tr:controlArea ?genArea}}
  optional {
    ?this tr:generationUnit/tr:parentResource ?prod
    optional {?prod tr:controlArea ?prodArea}}
} limit 1000
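
The path tr:generationUnit / tr:parentResource? / tr:controlArea accepts a Control Area found either on the Generation Unit itself or on its Production Unit. A minimal Python sketch of that test follows; it assumes at most one Control Area per unit, and the function name is hypothetical.

```python
from typing import Optional

def control_area_conforms(obs_area: str,
                          gen_area: Optional[str],
                          prod_area: Optional[str]) -> bool:
    """True if the observation's Control Area equals that of the Generation
    Unit or of its Production Unit (the optional tr:parentResource step)."""
    return obs_area in (gen_area, prod_area)
```

A Generation Unit with no Control Area of its own and none on its Production Unit always fails the test, which corresponds to the largest group of violations listed above.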

3.6.28 ActualGenerationOutputPerGenerationUnit-installedOutput-conform

  • Rule Group: Observations-Structure
  • Description: The InstalledGenCapacity of the observation must match the declared nominalP of the Generation Unit
  • Data Items: generation/ActualGenerationOutputPerGenerationUnit, generation/ProductionAndGenerationUnits
  • Fields: InstalledGenCapacity, GenerationUnitEIC, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP
  • Severity: Violation
  • Applies to: controlArea

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select $this (?output1 as ?value) ?genUnitOutput {
  $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
    tr:installedOutput ?output1.
  optional {$this tr:generationUnit/tr:installedOutput ?output2}
  filter (!bound(?output2) || !(?output1 = ?output2))
  bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)
} limit 200

SPARQL count:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select (count(*) as ?c) (count(?output2) as ?c2) {
    $this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:installedOutput ?output
    filter not exists {$this tr:generationUnit/tr:installedOutput ?output2 filter (?output2=?output)}
    optional{$this tr:generationUnit/tr:installedOutput ?output2}
} 

Violations:

  • generationUnit has a different installedOutput: 22.3k of 4.5M observations over 3 months; 25k over 4 months
  • generationUnit doesn't have any installedOutput: 34.7k of 4.5M observations over 3 months; 63k over 4 months

Implementation:

sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    base <https://transparency.ontotext.com/resource/>
    select $this ?s2 {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:installedOutput ?output1; tr:generationUnit ?s2.
      filter not exists {?s2 tr:installedOutput ?output2
        filter(?output1 = ?output2)}}"""];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The GenerationUnit installed capacity (nominalP) {?genUnitOutput}";
  sh:select """
    select $this (tr:installedOutput as ?path) (?output1 as ?value) ?genUnitOutput {
      $this tr:installedOutput ?output1.
      optional {$this tr:generationUnit/tr:installedOutput ?output2}
      bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)}"""].

3.6.29 ActualGenerationOutputPerGenerationUnit-LTE-installedOutput

  • Rule Group: Arithmetics
  • Description: ActualGenerationOutput should be less than or equal to the InstalledGenCapacity for each Generation Unit and date
  • Data Items: generation/ActualGenerationOutputPerGenerationUnit
  • Fields: ActualGenerationOutput, InstalledGenCapacity
  • Severity: Violation
  • Applies to: controlArea
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select $this {
      $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
        tr:actualOutput ?actual; tr:installedOutput ?installed
      filter(!(?actual <= ?installed))}"""];
  sh:sparql [a sh:SPARQLConstraint;
    sh:prefixes tr: ;
    sh:message "The actual generation output, `{?value}` of this observation is greater than the installed output, `{?installed}` for its Generation Unit." ;
    sh:select """
      select distinct $this ?installed ?value {
        $this a tr:DataObservation ;
              tr:actualOutput ?value; tr:installedOutput ?installed .     
        filter(!(?value <= ?installed))}  
    """].

SPARQL check:

base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select $this {
  $this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
    tr:actualOutput ?actual; tr:installedOutput ?installed
  filter(!(?actual <= ?installed))}

3.6.30 Outage-controlArea-conform

  • Rule Group: Outage
  • Description: The area of an Outage must match the declared area of the Production Unit
  • Data Items: outages/UnavailabilityOfProductionUnits, generation/ProductionAndGenerationUnits
  • Fields: AreaCode, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/ControlArea_Domain/mRID
  • Severity: Violation
  • Applies to: controlArea
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
    select distinct $this ?s2 {
    $this a tr:Outage ;
          tr:controlArea ?ca ;
          tr:energyResource/tr:controlArea ?eca .
    FILTER (?ca != ?eca)
    $this tr:energyResource ?s2 .
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has the control area {?ca}, but its energy resource has the control area {?value}";
  sh:select """
    select distinct $this ?ca ?value {
      $this a tr:Outage ;
            tr:controlArea ?ca ;
            tr:energyResource/tr:controlArea ?value .
      FILTER (?ca != ?value)
    }      
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:controlArea ?ca ;
          tr:energyResource/tr:controlArea ?eca .
    FILTER (?ca != ?eca)
}

3.6.31 Outage-biddingZone-conform

  • Rule Group: Outage
  • Description: The zone of an Outage must match the declared zone of the Production Unit
  • Data Items: outages/UnavailabilityOfProductionUnits, generation/ProductionAndGenerationUnits
  • Fields: AreaCode, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/biddingZone_Domain.mRID
  • Severity: Violation
  • Applies to: biddingZone
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
      $this a tr:Outage ;
            tr:biddingZone ?ca ;
            tr:energyResource/tr:biddingZone ?eca .
      FILTER (?ca != ?eca)
      $this tr:energyResource ?s2
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has the bidding zone {?ca}, but its energy resource has the bidding zone {?value}";
  sh:select """
  select distinct $this ?value ?ca {
      $this a tr:Outage ;
            tr:biddingZone ?ca ;
            tr:energyResource/tr:biddingZone ?value .
      FILTER (?ca != ?value)
      $this tr:energyResource ?s2
  }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:biddingZone ?ca ;
          tr:energyResource/tr:biddingZone ?eca .
    FILTER (?ca != ?eca)
}

3.6.32 Outage-Unit-exists

  • Rule Group: Outage
  • Description: The Production/Generation Unit reported in an Outage must be described in Production And Generation Units
  • Data Items: outages/UnavailabilityOfProductionUnits, outages/UnavailabilityOfGenerationUnits, generation/ProductionAndGenerationUnits
  • Fields: PowerResourceEIC, Configuration_MarketDocument/TimeSeries/registeredResource.mRID
  • Severity: Violation
  • Applies to: controlArea
sh:targetClass tr:Outage;
sh:property [
  sh:path (tr:energyResource tr:eic);
  sh:minCount 1].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage .
    FILTER NOT EXISTS {
        $this tr:energyResource/tr:eic ?eic
    }
}

3.6.33 Outage-installedCapacity-conform

  • Rule Group: Outage
  • Description: Installed Capacity reported in an Outage must match the Installed Capacity as described in Production And Generation Units
  • Data Items: outages/UnavailabilityOfProductionUnits, outages/UnavailabilityOfGenerationUnits, generation/ProductionAndGenerationUnits
  • Fields: InstalledCapacity, PowerResourceEIC, Configuration_MarketDocument/TimeSeries/MktPSRType/nominalIP_PowerSystemResources.nominalP, Configuration_MarketDocument/TimeSeries/MktPSRType/GeneratingUnit_PowerSystemResources/nominalP
  • Severity: Violation
  • Applies to: controlArea
sh:target [a sh:SPARQLTarget;
  sh:prefixes tr: ;
  sh:select """
  select distinct $this ?s2 {
    $this a tr:Outage ;
          tr:installedOutput ?ca ;
          tr:energyResource/tr:installedOutput ?eca .
    FILTER (?ca != ?eca)
    $this tr:energyResource ?s2
  }
  """];
sh:sparql [a sh:SPARQLConstraint;
  sh:prefixes tr: ;
  sh:message "The outage has an installed capacity {?ca}, but its energy resource has the installed capacity {?value}";
  sh:select """
  select distinct $this ?ca ?value {
    $this a tr:Outage ;
      tr:installedOutput ?ca ;
      tr:energyResource/tr:installedOutput ?value .
    FILTER (?ca != ?value)
  }
  """].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:installedOutput ?ca ;
          tr:energyResource/tr:installedOutput ?eca .
    FILTER (?ca != ?eca)
}

3.6.34 Outage-availableCapacity-LT-installedCapacity

  • Rule Group: Outage
  • Description: Available Capacity reported in an Outage must be less than the Installed Capacity
  • Data Items: outages/UnavailabilityOfGenerationUnits
  • Fields: AvailableCapacity, InstalledCapacity
  • Severity: Violation
  • Applies to: controlArea
sh:targetClass tr:Outage;
sh:property [
  sh:path tr:availableOutput;
  sh:lessThan tr:installedOutput].

SPARQL check:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
    $this a tr:Outage ;
          tr:availableOutput ?ao ;
          tr:installedOutput ?io .
    FILTER (?io <= ?ao)
}

3.7 More Validation Rules

Here are ideas for more validation rules that are not yet defined. As we define them, we move them to the section above:

  • The forecasts and actuals of Generation in an area should be less than the max capacity of Production Units in that region
  • The actuals of Generation in an area should not deviate from forecasts more than a certain threshold (15%)

The following rules will not be implemented:

  • Locations should be meaningful, eg a City/Town name. We've implemented a limited variant, see location-informative. Could be implemented through integration with OSM.

3.8 Already Checked Rules

The following rules were checked quickly and no errors were found, so we found no need to implement them:

  • Each resource should be described in EIC once (or if multiple times then with consistent data)
    • grep "<mRID>" allocated-eic-codes.xml|sort|uniq -d
  • If Production and Generation Units are described multiple times, the following fields are always consistent:
    • highVoltageLimit, assetType, controlArea, biddingZone
    • However, installedOutput is not consistent and we have a validation rule for that
  • Each EIC resource should have name and notation (short name)
    • ?x tr:eic [] filter (!exists {?x tr:notation []} || !exists {?x tr:name []})
  • All quantities should use the same unit (installedOutput, actualOutput, availableOutput: MAW, highVoltageLimit: KVT)
    • select ?unit (count(*) as ?c) {?x tr:unit ?unit} group by ?unit
    • Therefore we can simplify the representation by omitting the unit
  • The nominalP unit of a "Production Unit" and all its "Generation Units" is always specified (and by the above-checked rule, is the same). Note: we've now eliminated tr:unit so the query below will not work
select ?powUnit ?powUnitN ?powUnitUOM ?genUnit ?genUnitN ?genUnitUOM {
    ?powUnit tr:generationUnit ?genUnit.
    optional {?powUnit tr:installedOutput/tr:unit ?powUnitUOM}
    optional {?genUnit tr:installedOutput/tr:unit ?genUnitUOM}
    filter (!bound(?powUnitUOM) || !bound(?genUnitUOM) || ?powUnitUOM != ?genUnitUOM)
}

3.9 Validation Service

Validation service options are currently under investigation. There are two validators under consideration: TopQuadrant SHACL API and GraphDB's ShaclSail.

The chief questions to be investigated are:

  • How expressive is the validator?
  • What is the validator's performance?
  • Can we use a hybrid approach integrating several validators?
  • How often to run validation? How to update validation results?
  • Do we need a TEKG API to initiate validation?
  • Will we check incrementally (only changed data points) or totally?

3.9.1 TQ SHACL API

The TQ SHACL API is an open-source API developed by TopQuadrant. It is based on Apache Jena.

  • It's a very flexible validator, with full support of SHACL-SPARQL and partial support of SHACL Advanced.
  • Slow performance, especially with SPARQL constraints. The validator uses internal data structures for selecting validation targets. SPARQL constraints are executed for each focus node. This can lead to substantial slowdowns.
  • SHACL definitions and report formats are the same as ShaclSail, but the Apache Jena models are different from RDF4J models. We would need an integration layer with GraphDB.
  • The validator works in bulk only, on the complete data model.
  • There is no support for sh:annotationProperty, which would make reporting harder.

The performance issue could be mitigated by clever target definitions, i.e., using SPARQL for targeting.

Since we store data in GraphDB, we would need to fetch all data to be validated, store it in a Jena model (can be in-memory), then validate.

3.9.2 RDF4J ShaclSail

ShaclSail is implemented in RDF4J and is part of GraphDB. It is native to our database, so we would need no integration layer.

  • Partial support of core SHACL.
  • Has a targeting extension mechanism with RSX, which emulates a lot of sh:SPARQLTarget functionality more efficiently.
  • Better performance.
  • The validator can be bulk or incremental.
  • Insertions are always rejected when they contain invalid data.

Since we never want to reject data, and only want to record validation errors, we need to run with the validator toggled off, then do a bulk validation. This can be achieved in one of two ways:

  • Have SHACL always loaded in the database. Do all insertions twice: first with validation turned on, to produce a report; then with validation turned off, to store the data.
  • Do not have SHACL loaded in the database. After inserting the data, try inserting the SHACL shapes, which triggers a bulk validation. Store the violation report and, if there are no errors in the whole database, clear the SHACL shapes. If there were errors, the SHACL shapes would not have been persisted.

Of the two, the first option is notably better performance-wise, except for very large files.

3.9.3 Custom SPARQL validations

Custom SPARQL validations are very flexible and offer better performance than SHACL-SPARQL. The downside is that we would need custom logic to implement them. Custom SPARQL validation can also easily be used in conjunction with one of the two SHACL validators.
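As an illustration only, the custom logic could be as small as the following Python sketch, which posts one of the SPARQL checks from this document to a SPARQL endpoint and treats each result row as a violation. The helper names (`build_request`, `parse_bindings`, `run_check`) and any endpoint URL passed to them are assumptions for illustration, not part of the TEKG implementation.

```python
import json
import urllib.parse
import urllib.request

# One of the SPARQL checks from this document (Outage-controlArea-conform).
CHECK = """
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct * {
    ?this a tr:Outage ;
          tr:controlArea ?ca ;
          tr:energyResource/tr:controlArea ?eca .
    FILTER (?ca != ?eca)
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request that asks for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def parse_bindings(results_json: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_check(endpoint: str, query: str) -> list:
    """Run a check against the endpoint; each returned row is one violation."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

The rows returned by `run_check` could then be stored or counted per rule; how they are mapped to severities and areas is left open here.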

3.10 DQA Dashboard

The DQA (Data Quality Assessment) Dashboard displays validation results.

The functions (scope) of the DQA dashboard include:

  • Navigation of rules by applicability (country or area), group (category)
  • Display validation result counts per area/country, rule, severity (Violation, Warning)
  • (CANCELED) Display %prevalence (percent of errors compared to all records of that kind)
  • (CANCELED) Display trends in time
  • Drilldown to individual violations
    • Pagination
    • Display enough info for each violation to be able to understand it
    • Hyperlink to jump to the RDF data for the violating node, to be able to diagnose in detail

DQA Mockups are shown in textual form in preceding sections.

4 External Data Integration

This section specifies Integrations and/or Validations based on external data to be integrated into the KG. In addition to the external data sources described in subsections, we also considered the following sources:

  • Wikidata (WD) is a global crowd-sourced knowledge base with encyclopedic coverage
    • It has info about 100M items, about 5B claims, which translates to about 16B RDF triples
    • It has about 10k descriptive properties, of which 6.5k are links to external databases. WD is therefore a coreferencing hub for integrating different data sources
    • WD has about 16k power plants or generators world-wide, of which 8220 are in Europe (see query https://w.wiki/4dqA)
    • 7222 of European power plants have geo-coordinates (see query https://w.wiki/4fKq)
    • 920 of European power plants have EIC (see query https://w.wiki/4dq8)
    • One WD power plant may have several EIC, e.g. the Bellevue NPP (France) has 4 EIC: 2 Production Units and 2 Generation Units. Thus, WD models power plants at a coarser granularity than ENTSOE
    • We decided not to use WD because OSM (see below) has deeper geographic info and comparable other info
    • In a future project it's certainly worth exploring WD integration because it has excellent additional info, e.g. the administrative areas where the power plant is located, and their populations. WD can be used together with OSM and TEKG by using SPARQL Federation
  • National data sources

4.1 External VAT Validation

Over 10000 VAT numbers are present in the data. We will validate them using the VIES-on-the-Web system. It is a free web service provided by the EC, running on top of the national VAT databases of the EU Member States and Northern Ireland.

The service is a simple SOAP API where two parameters are sent as XML elements: countryCode and vatNumber. The response is a boolean value whether the VAT number is valid, and if valid then some basic information about the entity it corresponds to.

Example response for VAT IT13433711002:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
            <countryCode>IT</countryCode>
            <vatNumber>13433711002</vatNumber>
            <requestDate>2022-01-12+01:00</requestDate>
            <valid>true</valid>
            <name>ARCADIA ITALIA S.R.L.</name>
            <address>VIA PERUGINO 4 00196 ROMA RM </address>
        </checkVatResponse>
    </soap:Body>
</soap:Envelope>
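The following Python sketch shows how such a response could be parsed with the standard library, and how a request body carrying the two parameters could be built. The namespace is taken from the response above; the request element name `checkVat` and the function names are assumptions for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
VIES_NS = "urn:ec.europa.eu:taxud:vies:services:checkVat:types"


def build_check_vat_request(country_code: str, vat_number: str) -> str:
    """Build a SOAP envelope with the two parameters countryCode and vatNumber.

    Assumption: the request element is named checkVat in the VIES namespace.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    check = ET.SubElement(body, f"{{{VIES_NS}}}checkVat")
    ET.SubElement(check, f"{{{VIES_NS}}}countryCode").text = country_code
    ET.SubElement(check, f"{{{VIES_NS}}}vatNumber").text = vat_number
    return ET.tostring(envelope, encoding="unicode")


def parse_check_vat_response(xml_text: str) -> dict:
    """Extract the fields of a checkVatResponse into a plain dict."""
    root = ET.fromstring(xml_text)
    resp = root.find(f".//{{{VIES_NS}}}checkVatResponse")
    if resp is None:
        raise ValueError("no checkVatResponse element found")

    def text(tag):
        # Missing or empty elements are reported as None.
        el = resp.find(f"{{{VIES_NS}}}{tag}")
        return el.text.strip() if el is not None and el.text else None

    return {
        "countryCode": text("countryCode"),
        "vatNumber": text("vatNumber"),
        "requestDate": text("requestDate"),
        "valid": text("valid") == "true",
        "name": text("name"),
        "address": text("address"),
    }
```

Applied to the example response, `parse_check_vat_response` yields `valid` as a Python boolean, and `name` and `address` as strings with surrounding whitespace removed.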

4.1.1 VATs in ENTSOE Data and VIES Coverage

An important limitation of VIES is that not all countries relevant for ENTSOE are present. A future project should evaluate the possibility of using an additional free service such as VATApp, or of directly using open data dumps provided by the respective countries (UK and NO in particular).

  • Find all countries in the ENTSOE dataset:
select (count(*) as ?c) ?co {
    ?x tr:vatNumber ?vat
    optional {?x tr:countryCode ?co}
} group by ?co order by desc(?c)
  • The query returns 57 results. One is blank (" ") which is not a country. So we are left with 56 countries.
  • VIES does not support the following: GB, CH, UA, MK, RS, GR, AL, NO, BA, MD, XK, ME, US, TR, LI, SG, KY, AE, GE, IS, AD, AR, AU, MA, MY, NC, PR, RU, SM, UK (Total 30)
  • Therefore VIES supports 26 out of 56 countries, which is 46%
  • 17 of 56 countries have fewer than 9 VAT numbers: TR LI SG KY AE GE IS AD AR AU MA MY NC PR RU SM UK. They will be ignored for VAT format analysis (see below)
  • Others have 11 or more

VAT Number Statistics: Out of 9,919 VAT numbers

  • 50 start with 1 letter, e.g. K42101801N, country AL
  • 5,472 start with 2 letters
  • 3,733 start with 3 letters
  • There's 1 that starts with 4 letters: GREL099790528, country GR.
  • 659 start with numbers, i.e. lack the country prefix
    • Some of these are invalid, e.g. have extra or missing digits
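A breakdown like the one above can be reproduced by counting the leading letters of each VAT number. The sketch below is one possible way to do it in Python; the function names are illustrative and not taken from the project scripts.

```python
import re
from collections import Counter


def leading_letters(vat: str) -> int:
    """Return the number of ASCII letters at the start of a VAT number."""
    return len(re.match(r"[A-Za-z]*", vat).group())


def prefix_histogram(vats) -> Counter:
    """Count VAT numbers by the length of their leading letter run."""
    return Counter(leading_letters(v) for v in vats)
```

For example, K42101801N falls in the 1-letter bucket, GREL099790528 in the 4-letter bucket, and a number such as 0711797282 in the bucket for VATs that lack the country prefix.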

4.1.2 VIES Validation Statistics

countryCode  total  valid  invalid  names
AT             128    110       18    110
BE             133    107       26    107
BG             187    169       18    169
CY              22     10       12     10
CZ             326    199      127    199
DE            1027    969       58      0
DK              92     85        7     85
EE              57     47       10     47
EL              93     90        3     90
ES            3455   1499     1956      0
FI             229    224        5    224
FR             124    107       17    107
HR             152    106       46    106
HU             109     75       34     75
IE              57     49        8     49
IT             611    349      262    349
LT              88     69       19     69
LU              25     21        4     21
LV              70     52       18     52
MT              11     11        0     11
NL             202    165       37    165
PL             341    235      106    235
PT             102     95        7     95
RO             228    186       42    186
SE              34     31        3     31
SI             117     79       38     79
SK             274    195       79    195
XI               2      1        1      1
TOTAL         8296   5335     2961   2867

  • VIES covers 28 country codes (the rows of the table above), and 8.3k of the 10k VAT numbers present
  • The number of invalid VATs is surprisingly high: 2.9k of 8.3k or 35.7%
  • DE and ES never report names (and addresses), even for valid VATs

4.1.3 Per Country VAT Format

VAT format was researched on:

  • Wikipedia for most countries (referred to as WP)
  • Format and structure of tax identification numbers (TINs) in the EU: another reference that was used for VAT numbers (referred to as EU-TID)

  • AL (Albania): 10 characters, first char following the prefix is [JKL], and the last character is a letter. E.g. K99999999L, L99999999G
    • 44 out of 50 are valid according to the above format
    • Invalid VATs: ALL11731504A, ALJ61820031J, ALL32130008F, M12221008I, ALK11624001V
  • AT (Austria): WP: 'ATU'+8 digits. E.g. ATU99999999. EU-TID 9 digits.
    • 127/130: valid
    • Invalid: U50568407, U49637200 and ATU6729404 (7 digits)
  • BA (Bosnia and Herzegovina)
  • BE (Belgium): WP: 'BE' + 8 digits + 2 check digits. E.g. BE09999999XX. EU-TID: 10 digits
    • 125/138 comply
    • 8/138: 9 digits. Invalid examples: GB768506886, 0711797282, 0754605263
  • BG (Bulgaria): WP: 9-10 digits. E.g. BG999999999. EU-TID : 10 digits
    • 187/188 have 9 digits
  • BY (Belarus): Not present in the dataset
  • CH (Switzerland): 'CHE' + 9 digits with optional punctuation. E.g. CHE-123.456.788. The last digit is a MOD11 checksum
    • 173/292 start with CHE followed by 9 digits
    • 92/292 start with CH followed by 6 digits
    • 11/292 start with CH followed by 9 digits
    • 2/292 start with CH followed by 11 digits
    • 7/292 start with CHE followed by 8 digits
    • 1/292 start with CHE followed by 7 digits
    • As we can see, many CH VATs in the dataset don't follow the format definition
  • CY (Cyprus): WP: 9 characters. E.g. CY99999999L. EU-TID: the same for individuals but 8 digits for legal entities.
    • 21/23 comply with the official format
    • 2 Invalid: 10375510G and 10390426G, miss CY prefix
  • CZ (Czech Republic): WP: 'CZ'+ 8 to 10 digits. EU-TID 8 digits.
    • 321/332: 'CZ'+8 digits
    • 8/332: 'CZ'+9 digits
    • 2/332 are wrong: DE289523572 and DE814987657
  • DE (Germany): WP: 9 digits. E.g. DE999999999. EU-TID : 11 digits.
    • 1029/1044 comply
    • 2/1044: 8 digits, e.g. DE29149497 and DE29535215
    • 2/1044: 10 digits, e.g. DE4370403223 and DE3503951816
    • 9/1044 don't start with DE: 6 of them have 11 digits, 3 of them have 10 digits
  • DK (Denmark): WP: 8 digits, last digit is a checksum. E.g. DK99999999. EU-TID: 8 digits
    • 95/100 comply
    • 3/100 miss DK prefix
    • 2 are completely wrong: GB684966762 and CZ07292015
  • EE (Estonia): WP: 9 digits. EU-TID : 8 digits for legal entities and 11 digits for individuals.
    • 57/58 comply
    • 1 is 14912868, which lacks the prefix and has 8 digits instead of 9
  • ES (Spain)
    • Format for companies: either 'ES'+letter+8 digits or 'ES'+letter+7 digits+letter. EU-TID: the same. The first letter defines the type of company, and the next 2 digits define the province where the company was registered. The last character is a control digit.
    • Format for individual people/freelancers: either 'ES'+8 digits+letter (for Spaniards) or 'ES'+letter+7 digits+letter (for foreigners). E.g. ESX9999999R
    • 3363/3464 comply with the first format: 'ES'+letter+8 digits
    • 45/3464 comply with the second format: 'ES'+letter+7 digits+letter
    • 3/3464 miss 1 digit, e.g. ESA0879906, ESB9159561, ESA5840219
    • 1/3464 is ESB588111980 (9 digits instead of 8)
    • 3 don't start with ES: B95713541, PT980633745 and PT508193117
  • FI (Finland): WP: FI + 7 digits + check digit. E.g. FI99999999. EU-TID:same
    • 230/234 comply
    • 4/234 miss FI prefix
  • FR (France): WP: 'FR'+ 2 digits (as validation key) + 9 digits (as SIREN), the first and/or the second value can also be a character, e.g. FRXX999999999. EU-TID: 9 digits for legal entities and a completely different format for individuals.
    • 118/125 comply
    • 2/125 are wrong: DE813871435 and 0000000000000
    • 2/125 miss one digit: FR5950773519 and FR2783328587
    • 2/125 miss two digits: FR572221034 and FR440117620
    • 1/125 misses 3 digits: FR69448572
  • GB (Great Britain): 9 digits, sometimes written with spaces eg 123 4567 89
    • 361/395 comply
    • 11 miss GB prefix
    • 7 miss one digit
    • 1 misses 2 digits
    • 3 have 10 digits instead of 9
    • Several others miss prefix + some digits
  • GR/EL (Greece). EU-TID: 9 digits.
    • 93/99 with format EL + 9 digits
    • 5 miss prefix EL
    • 1 is GREL099790528
  • HR (Croatia): WP: 'HR'+ 11 digits. EU-TID: 11 digits.
    • 152/156 comply
    • 3 miss HR prefix
    • 1 misses a digit: HR1642377552
  • HU (Hungary): WP: 8 digits (the first 8 digits of the national tax number), e.g. HU12345678. EU-TID: 10 digits.
    • 105/109 comply
    • 1 misses prefix
    • 2 miss 1 digit
    • 1 is 10728068244 (too many digits)
  • IE (Ireland)
    • Format: WP: Two standards: 'IE'+7 digits+2 letters, e.g. IE1234567FA; or 'IE'+7 digits+1 letter, optionally followed by 'W' for married women, e.g. IE1234567T or IE1234567TW. EU-TID: the same both for legal entities and individuals.
    • 26/72 end with two letters (first format)
    • 29/72 end with 1 letter (second format)
    • 6 miss prefix
    • 6 start with GB
    • strange occurrence IE9Y66I020
  • IT (Italy): WP: 11 digits (the first 7 digits are a sequential number, the following 3 indicate the province of residence, the last digit is a checksum). EU-TID: the same.
    • 597/742 comply
    • 116/742 miss IT prefix
    • 22/742 miss 1 digit, e.g. IT1374910113 (10 digits instead of 11)
    • 5 miss 1 digit as well as IT prefix, e.g. 2822840605
    • 1 wrong: HU24189514
  • LT (Lithuania): WP: 9 or 12 digits. EU-TID: the same.
    • 47/88 with format LT860632610 (9 digits)
    • 39/88 with format LT1106284811 (10 digits)
    • 1 is with 11 digits: LT10000580981
  • LU (Luxembourg): WP: 8 digits. EU-TID: 11 digits.
    • 25/26 comply
    • 1 misses LU prefix
  • LV (Latvia): WP: 11 digits. EU-TID: the same.
    • 69/81 comply
    • 12/81 miss LV prefix
  • MD (Moldova): 7 digits
    • 16/17 with format 0203943 (no MD prefix)
    • 1/17 is MD05754540655
  • ME (Montenegro): 8 or 12 digits
    • 10/11 have 8 digits, e.g. 02751372 (without ME prefix)
    • 1/11 has 11 digits: 40310007516
  • MK (North Macedonia): 'MK'+13 digits. E.g. MK4032013544513
    • 130/132 with format 4080009501086 (without MK prefix)
    • 1/132 is MK403000452960 (12 digits after prefix)
    • 1/132 is 40430008038555 (14 digits)
  • NL (The Netherlands): WP: 'NL'+9 digits+B+2 digits. E.g. NL999999999B01. EU-TID: 9 digits.
    • 201/207 comply
    • 1/207 doesn't have the prefix
    • 4 completely wrong: 32117527, 801424250RT000, GB115163840 and IT01831490766
  • NO (Norway): 9 digits, optionally followed by 'MVA' to indicate VAT registration
    • 6/43 comply, e.g. NO989795848MVA
    • 8/43 don't have prefix NO
    • 6/43 have just 9 digits , e.g. 981355210
    • 1 has 7 digits
    • 1 has 12 digits
    • 1 wrong: GB894770371
  • PL (Poland): WP: 'PL'+10 digits. EU-TID: the same.
    • 339/349 comply
    • 8 miss prefix
  • PT (Portugal): WP: 'PT'+9 digits (last digit is a checksum). EU-TID: the same.
    • 99/102 comply
    • 2/102 miss prefix
  • RO (Romania): WP: 'RO' (optional) + 10 digits. EU-TID: the same.
    • 188/231 with format RO13328043 (8 digits)
    • 33/231 have 7 digits, e.g. RO1092690
    • 6/231 have 6 digits, e.g. RO943038
    • 3/231 have 7 digits and miss prefix
    • 1/231 has 9 digits: RO291111546
  • RS (Serbia): 9 digits
    • 92/113 comply
    • 9/113 start with RS, e.g. RS107350223
    • Some have SR and SK prefix: SR109027050, SK2022490800, SR105523323, SR107634440, SR104217641, SR104613706
  • SE (Sweden): WP: 12 digits. EU-TID: 10 digits.
    • 33/37 comply
    • 3/37 have 10 digits, e.g. 5561085688
  • SI (Slovenia): WP:'SI'+8 digits. EU-TID: the same.
    • 117/117 comply, e.g. SI20874731
  • SK (Slovakia): WP: 'SK'+10 digits. EU-TID the same.
    • 268/279 comply (with the prefix)
    • 5/279 miss the prefix
    • 1 has 8 digits: 36699624
  • UA (Ukraine): 12 digits
    • 122/236 comply
    • 92/236 miss the prefix
    • 11/236 have just 8 digits, e.g. 40298595
    • 2 have 9 digits
  • XK (Control Area Kosovo): 9 digits
    • 16/16 comply

4.1.4 VAT Format Summary

Most VATs comply with their official definitions. The majority of numbers start with their corresponding country code.

However, some VATs are otherwise valid but lack their country prefix. Further inconsistencies are of several types:

  • countryCode different from vatNumber prefix, e.g. DE289523572 appears in CZ VATs; GB appears in VATs of countries like NO, NL, IE, DK, BE
  • Some VATs miss digits, others have additional digits
  • Strange inconsistencies like IE9Y66I020 where the format doesn't allow for letters between the country code and digits

4.1.5 VAT Validation Python Script

For easier verification of VAT Numbers (both format and existence in VIES), a Python script was developed. It:

  • Accepts a tabular data file and the column name where the VAT numbers are present. They should start with the country prefix, e.g. DE289523572.
  • Performs bulk validation
  • The output is a CSV file with all VAT numbers that have valid format.

It can also accept a single VAT number, validate it, and retrieve all the info from the VIES service.
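The bulk mode described above could look roughly like the following Python sketch. It is not the project script: the function names are hypothetical, and the default format rule (a 2-letter prefix followed by 2 to 12 alphanumeric characters) is only a placeholder for real per-country rules.

```python
import csv
import re


def default_format_check(vat: str) -> bool:
    """Placeholder rule: 2-letter country prefix plus 2-12 alphanumerics."""
    return re.fullmatch(r"[A-Z]{2}[0-9A-Z]{2,12}", vat) is not None


def filter_valid_vats(infile, column, outfile, is_valid=default_format_check):
    """Copy VAT numbers with a valid format from one CSV stream to another.

    infile and outfile are open text streams; column names the CSV column
    that holds the VAT numbers. Returns the number of VAT numbers kept.
    """
    reader = csv.DictReader(infile)
    writer = csv.writer(outfile)
    writer.writerow([column])
    kept = 0
    for row in reader:
        vat = (row.get(column) or "").strip()
        if vat and is_valid(vat):
            writer.writerow([vat])
            kept += 1
    return kept
```

Passing a different `is_valid` callable allows stricter per-country checks to be plugged in without changing the file handling.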

4.2 RDFize VIES Checks

The above script also queries EU VIES for VAT codes in EU+XI and records the results as CSV. The query etl_scripts/VAT-from-VIES.ru RDFizes this data and attaches it to EIC nodes:

  • tr:viesCheckDate (request date): when the check was made
  • tr:vatInVies (VAT validity): whether the VAT is found and valid (not expired)
  • tr:nameInVies (company name): Legal company name as reported by VIES
  • tr:addressInVies (address): Company address as reported by VIES

4.3 Open Street Map

Open Street Map (OSM) is a global crowd-sourced database of geographic information, including power plants and generators. E.g. the screenshot below shows a coal power station and some of the OSM data fields that describe it.

OSM has three element types:

  • node - represents a specific point on the earth's surface defined by its latitude and longitude. Each node comprises at least an id number and a pair of coordinates.
  • way - ordered list of between 2 and 2,000 nodes that define a polyline. Ways are used to represent linear features such as rivers and roads.
  • relation - multi-purpose data structure that documents a relationship between two or more data elements (nodes, ways, and/or other relations)

The following screenshots show Varna Power Plant with its three generators. Note that the generators are of type node and they are part of the relation corresponding to the power plant.

4.3.1 Planned use of OSM

We'll use it to complement ENTSOE Production Unit data with detailed geo-information.

OSM includes detailed data such as:

  • Coordinates of selected power plants and generators
  • Detailed outline maps of power plants and generators
  • Descriptive data such as power output, fuel, technology
  • EIC identifiers using tag Key:ref:EU:ENTSOE_EIC
  • WD identifiers and Wikipedia links
    • Optionally, extra info such as images can be obtained through these links

We've tried several different services to provide OSM data:

  • Geo mapping and visualization services
  • The Overpass query service, and Overpass Turbo as a wizard for constructing queries.
  • The Sophox SPARQL endpoint that in addition to OSM querying allows federated use together with WD and TEKG ENTSOE data.
    • But it turns out that Sophox has significantly less data: 35k Plants and 600k Generators, versus 48k Plants and 1.5M Generators in Overpass. The reason is that the Sophox semantic repository is not updated often enough from OSM data.

Another reason why we chose Overpass over Sophox is that the Sophox SPARQL endpoint did not always work properly. E.g. 20k Plants have the property osmt:name, but when we tried to download all the Plants along with other properties, only the first 2k records had the osmt:name field.

Although the world-wide coverage of power plants in OSM is very good, relatively few of them carry EIC ids. Therefore:

  • We'll use additional databases (see next) to correlate EIC ids to coordinates
  • Then match these to OSM plants and generators
  • Then enrich OSM with EIC ids. This enriched dataset will be published openly (as part of OSM), allowing others to also use our work
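
The matching step in the plan above can be pictured as a nearest-neighbour search by great-circle distance. The sketch below is only an illustration and not the project's matching procedure: the record layout, the 1 km threshold `max_km`, the EIC-like codes and the sample coordinates are all assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_eic_to_osm(eic_records, osm_objects, max_km=1.0):
    """For each (eic, lat, lon) record, find the closest OSM object within max_km.

    Returns a dict eic -> (osm_type, osm_id, distance_km); unmatched EICs are omitted.
    """
    matches = {}
    for eic, lat, lon in eic_records:
        best = None
        for osm_type, osm_id, olat, olon in osm_objects:
            d = haversine_km(lat, lon, olat, olon)
            if d <= max_km and (best is None or d < best[2]):
                best = (osm_type, osm_id, d)
        if best is not None:
            matches[eic] = best
    return matches

# Hypothetical sample data: two external records and two OSM objects
eic_records = [("00W-EXAMPLE-0001", 43.2000, 27.9000), ("00W-EXAMPLE-0002", 50.0, 10.0)]
osm_objects = [("relation", 1, 43.2005, 27.9005), ("way", 2, 43.5, 27.5)]
matches = match_eic_to_osm(eic_records, osm_objects)
```

A distance threshold alone can produce false matches in dense industrial areas, so in practice one would probably also compare names, fuel type or capacity before writing an EIC id back to OSM.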

4.3.2 Contributing to OSM

In order to contribute to OSM:

  • First, create an account.
  • Complete the tutorial, which explains very well how to edit an existing tagged location or create a new one.
  • Bulk edits of many locations will be done via the API endpoint.

Also there are third party editors which we can use as alternatives. These are the most popular:

4.3.3 Tag Info

OSM Tag Info is a series of dashboards for exploring the distribution of different tags. We used it to explore the distribution of objects with an EIC id (ref:EU:ENTSOE_EIC) and objects tagged as power=plant. The timelines show how objects of these types have been gradually contributed to the OSM database.

Geography and Chronology of tag power=plant (61.5k); plus tag power=generator (1.84M)

Figures: map of objects with tag power=plant and power=generator; timeline of objects with tag power=plant and power=generator.

Geography and Chronology of key ref:EU:ENTSOE_EIC (3667). Our recent contributions are also visible on this timeline.

Figure: map of objects with EIC id in Europe.

4.3.4 Overpass API

Data about the Plants was downloaded from Overpass in JSON format using the query below:

/*
This has been generated by the overpass-turbo wizard.
*/
[out:json][timeout:3000];
(
  // query part for: “power=plant”
  node["power"="plant"];
  way["power"="plant"];
  relation["power"="plant"];
);
// print results
out body;
>;
out skel qt;

Generators were downloaded with a wget request to http://overpass-api.de/api/interpreter because the Overpass workbench was crashing due to the large size of the data.

First, create a file generator.osm containing the following query:

/*
This has been generated by the overpass-turbo wizard.
*/
[out:json];
(
  // query part for: “power=generator”
  way["power"="generator"];
);
// print results
out body;
>;
out skel qt;

Then run the command below:

wget -O generator.json --post-file=generator.osm "http://overpass-api.de/api/interpreter"

Repeat the above steps for node and relation, save the output in separate JSON files, and then merge them into one. This is necessary because of the large number of generators. Another option is to download the generators country by country, because OSM does not let you filter by continent.
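
The merge step can be sketched in a few lines of Python. Overpass JSON responses carry their results in a top-level `elements` list; the sketch below assumes that layout and de-duplicates elements by type and id, since the same node can appear both as a tagged generator and as a skeleton member of a way. The file names in the usage note are hypothetical.

```python
import json

def merge_overpass(docs):
    """Merge several Overpass JSON documents into one list of elements.

    Elements are de-duplicated on (type, id). If an element occurs both in
    skeleton form (from `out skel`) and with tags (from `out body`), the
    tagged version is kept.
    """
    merged = {}
    for doc in docs:
        for el in doc.get("elements", []):
            key = (el["type"], el["id"])
            if key not in merged or ("tags" in el and "tags" not in merged[key]):
                merged[key] = el
    return {"elements": list(merged.values())}

def merge_files(paths, out_path):
    """Read the given Overpass JSON files and write the merged result to out_path."""
    docs = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            docs.append(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merge_overpass(docs), f)

# Tiny in-memory demonstration with made-up elements
ways = {"elements": [
    {"type": "way", "id": 10, "nodes": [1, 2], "tags": {"power": "generator"}},
    {"type": "node", "id": 1, "lat": 0.0, "lon": 0.0},
]}
nodes = {"elements": [
    {"type": "node", "id": 1, "lat": 0.0, "lon": 0.0, "tags": {"power": "generator"}},
]}
merged = merge_overpass([ways, nodes])
```

For the real downloads, a call such as `merge_files(["generator-node.json", "generator.json", "generator-relation.json"], "generators-all.json")` would produce the combined file.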

Note: some Plants and Generators have their electricity output given as yes or no instead of a number.

We investigated how accurate the coordinates of Plants and Generators are for cascades, where the dam/weir and pipeline can be far removed from the plant. Having gone through several examples, we can say that the pinpoints are good.

For example, below is a comparison of the outline and the pinpoint for Whakamaru:

Whakamaru
WhakamaruPoint

4.3.5 Comparison of Detailed Coordinates Against Centroid

We also found an exception: a hydro plant that covers a large area. Even there, the point is close to the facility:

Some other useful Overpass queries:

Search by EIC

[out:json][timeout:300];
(
  way["ref:EU:ENTSOE_EIC"~"32W001100100089X"] ;
);
out body;
>;
out skel qt;

Search for centroid

[out:csv(::type,::id,name,::lat,::lon)][timeout:20];
(rel(2865507);) -> .object;
.object out center;

4.3.6 OSM Validation

The following screenshots show some excellent OSM issue/validation reports

A trend with the number of power plant related issues

4.4 External Power Plant Databases

We also investigate a number of other external databases. We analyse them and evaluate the possibility of importing the missing generation and production units into OpenStreetMap.

4.4.1 FRESNA (PowerPlantMatcher)

github

Data fusion of multiple power plant databases: 7 databases, including ENTSO-E Transparency, of which 6 are free (Platts WEPP is paid).

Source

Summary by country

csvtk summary -f id -g Country matched_data_red.csv |csvtk sort -k 2:rn
Country,id:count
Germany,1193
Norway,1009
France,993
Spain,761
Italy,575
Switzerland,528
United Kingdom,464
Portugal,288
Finland,212
Austria,201
Sweden,166
Romania,142
Poland,120
Czech Republic,55
Netherlands,55
Greece,50
Bulgaria,49
Slovenia,46
Belgium,45
Ireland,39
Slovakia,36
Denmark,32
Hungary,30
Croatia,27
"Macedonia, Republic of",12
Estonia,11
Lithuania,5
Latvia,4
Luxembourg,2

Summary by project ID

csvtk cut -f projectID matched_data_red.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
   5159 CARMA
   3455 JRC
   2728 OPSD
   1370 GPD
   1324 ENTSOE
   1197 GEO

4.4.2 Global Power Plant Database

WRI GPPD (World Resources Institute, Global Power Plant Database) is a comprehensive, global, open source database of power plants. The database covers approximately 35,000 power plants from 167 countries.

website

Available fields: country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,other_fuel3,commissioning_year,owner,source,url,geolocation_source,wepp_id,year_of_capacity_data,generation_gwh_2013,generation_gwh_2014,generation_gwh_2015,generation_gwh_2016,generation_gwh_2017,generation_gwh_2018,generation_gwh_2019,generation_data_source,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017

The latest version is from June 2021. Approximately 10765 power plants are in ENTSOE countries.

Summary by country

csvtk summary -f gppd_idnr:count -g country global_power_plant_database.csv|csvtk sort -k 2:nr|head -21
country,gppd_idnr:count
USA,9833
CHN,4235
GBR,2751
BRA,2360
FRA,2155
IND,1589
DEU,1309
CAN,1159
ESP,829
RUS,545
JPN,522
AUS,486
PRT,469
CZE,462
ITA,396
CHL,315
NOR,306
MEX,277
VNM,236
ARG,236
THA,196
POL,189

Summary by ENTSOE country. Countries marked with "(*)" are those whose relevance for ENTSOE we are not sure of.

csvtk join -f "iso3;country" data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv |csvtk summary -f gppd_idnr -g iso3
iso3,gppd_idnr:count
ALB,8
AUT,103
BEL,69
BGR,43
BIH,20
BLR,24  (*)
CHE,168
CYP,3
CZE,462
DEU,1309
DNK,47
ESP,829
EST,17
FIN,185
FRA,2155
GBR,2751
GRC,90
HRV,24
HUN,18
IRL,59
ISL,20
ITA,396
LTU,6
LUX,2
LVA,5
MDA,6   (*)
MKD,12
MNE,3
NLD,71
NOR,306
POL,189
PRT,469
ROU,68
RUS,545 (*)
SRB,12
SVK,30
SVN,8
SWE,168
UKR,64

Summary of capacity (MW)

csvtk summary -f capacity_mw:min,capacity_mw:q1,capacity_mw:q2,capacity_mw:median,capacity_mw:q3,capacity_mw:mean,capacity_mw:max,capacity_mw:stdev,capacity_mw:variance global_power_plant_database.csv
min, q1,  q2,   median,q3,   mean,  max,     stdev, variance
1.00,4.90,16.74,16.74, 75.34,163.36,22500.00,489.64,239743.48

Range of year of capacity data

csvtk summary -f year_of_capacity_data:min,year_of_capacity_data:max -i global_power_plant_database.csv
min,    max
2000.00,2019.00

Breakdown by all fuels

csvtk cut -f primary_fuel,other_fuel1,other_fuel2,other_fuel3 global_power_plant_database.csv|perl -pe "s{,}{\n}g"|sort|uniq -c|sort -rn
  10718 Solar
   7191 Hydro
   5358 Wind
   4512 Gas
   3568 Oil
   2420 Coal
   1506 Biomass
   1182 Waste
    195 Nuclear
    189 Geothermal
    186 Storage
    130 Other
     48 Cogeneration
     35 Petcoke
     10 Wave and Tidal

Breakdown by primary fuel in ENTSOE countries:

csvtk join -f "iso3;country" data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv | csvtk cut -f primary_fuel | sort|uniq -c|sort -rn
   3921 Solar
   2329 Wind
   2056 Hydro
    779 Gas
    503 Biomass
    443 Waste
    420 Coal
    125 Oil
     74 Nuclear
     46 Geothermal
     31 Storage
     22 Other
      8 Wave and Tidal
      7 Cogeneration

4.4.3 PyPSA-Eur

PyPSA-Eur, the first open model dataset of the European power system at the transmission network level to cover the full ENTSO-E area, is presented.

  • Complete European data-set for generation and transmission expansion planning studies from freely available data.
  • Publication of the composition pipeline from downloaded data to an electricity system model ready for load-flow analyses.
  • An automatically updatable free power plant data-set covering all European countries using a modern record-matching algorithm.
  • New methodology to compare geo-referenced network datasets against one another.

A power plant database is presented using a sophisticated algorithm that matches records from a wide range of available sources and includes geo-data

5151 records

Fields: id,Name,Fueltype,Technology,Set,Country,Capacity,Duration,YearCommissioned,Retrofit,lat,lon,File,projectID,bus

Example row: 705,Ec łódź,Hard Coal,Steam Turbine,PP,Poland,403.0,0.0,,, 51.74050670000001,19.440413600000007,, "{'CARMA': ['CARMA25606', 'CARMA25608', 'CARMA25607'], 'ENTSOE': ['19W000000000107C', '19W000000000106E'], 'GEO': ['GEO42495']}",4403

Summary by fuel type

csvtk summary -f id -g Fueltype PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
Hydro,3594
OCGT,406
CCGT,257
Hard Coal,197
Bioenergy,188
Oil,132
Waste,129
Other,79
Lignite,72
Nuclear,62
Geothermal,29
"CCGT, Thermal",2
Storage Technologies,1
Pv,1
Caes,1

Summary by country

csvtk summary -f id -g Country PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
France,830
Spain,734
Norway,581
Switzerland,555
Germany,552
Italy,507
United Kingdom,305
Finland,202
Austria,163
Sweden,145
Portugal,126
Poland,56
Netherlands,48
Slovenia,46
Greece,38
Romania,35
Slovakia,32
Belgium,31
Bulgaria,30
Czech Republic,28
Croatia,24
Ireland,23
Denmark,23
Hungary,20
Lithuania,5
Estonia,5
Latvia,4
Luxembourg,2

Summary by source file

csvtk cut -f File PyPSA-Eur-powerplants.csv|perl -pe 's{\, }{\n}g; s{"}{}g'|sort|uniq -c|sort -rn|head -20
   2232
    727 SEDE
    417 BFE
    400 ENTSOE
    230 IWPDCY.csv
    220 GOV
    198 EnergyAuthority
    147 energy_storage_exchange
    144 Department for Business Energy & Industrial Strategy
    130 https://www.verbund.com/de-at/ueber-verbund/kraftwerke/unsere-kraftwerke
     98 Energias Endogenas de Portugal
     96 RTE
     70 Nordpool
     53 Red Eléctrica de España
     43 Terna
     30 SEAS
     24 Vattenfall
     22 GPI
     15 Tennet_Q4
     15 Energinet DK

Summary by source dataset

csvtk cut -f projectID PyPSA-Eur-powerplants.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
   4072 CARMA
   2734 OPSD
   1730 ENTSOE
    883 GEO
    816 GPD
    230 IWPDCY
    147 ESE
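
The projectID cell shown in the example row is a Python-style dict literal, so the same per-source count can also be obtained without a regular expression. The following is a minimal sketch, assuming every non-empty cell has this form; the second sample cell uses a made-up identifier.

```python
import ast
from collections import Counter

def count_sources(project_ids):
    """Count how many records reference each source dataset.

    Each value is the text of a projectID cell: a dict literal mapping a
    source name to the list of identifiers of the record in that source.
    """
    counts = Counter()
    for cell in project_ids:
        if not cell:
            continue  # skip empty cells
        counts.update(ast.literal_eval(cell).keys())
    return counts

def eic_ids(cell):
    """Return the identifiers listed under the 'ENTSOE' key of one cell."""
    return ast.literal_eval(cell).get("ENTSOE", []) if cell else []

# First cell is taken from the example row; the second one is hypothetical
cell = ("{'CARMA': ['CARMA25606', 'CARMA25608', 'CARMA25607'], "
        "'ENTSOE': ['19W000000000107C', '19W000000000106E'], 'GEO': ['GEO42495']}")
counts = count_sources([cell, "{'ENTSOE': ['00W-EXAMPLE-0001']}", ""])
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval` on data that comes from an external file.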

4.4.4 JRC-PPDB-OPEN

github

In 2017 the Joint Research Centre developed a Power Plant Database for energy systems modelling (JRC-PPDB) in order to support the unit activities in energy systems modelling and knowledge management.

Size: Production and Generation units: 7118, of which 3961 unique Production Unit EIC

A mapping between identifiers is provided in JRC_OPEN_LINKAGES.csv.

Unique ID counts

csvtk summary -f eic_p:countunique,eic_g:countunique,eprtr_facilityID:countunique,WRI_id:countunique,GEO_id:countunique,fresna_id:countunique JRC_OPEN_LINKAGES.csv
eic_p, eic_g, eprtr,WRI, GEO, fresna
1967,  3359,  592,  983, 597, 1306

Breakdown of WRI identifiers

csvtk cut -f WRI_id JRC_OPEN_LINKAGES.csv |tr 0-9 d|sort|uniq -c
      4 BRAddddddd
      2 CANddddddd
    213 GBRddddddd
     55 GEODBddddddd
      2 USAddddddd
   2171 WRIddddddd

4.4.5 Summary and EIC overlap

The table summarises the contents of the datasets above: the number of records with EIC identifiers and the number of coordinate pairs in each dataset.

It also counts the EIC codes of each dataset that are also found in OpenStreetMap and in the other external datasets.

SPARQL query for entities with ref:EU:ENTSOE_EIC on OSM.

Data Source Items with EIC Distinct EIC ids Coords Total OSM Match
OSM TagInfo 3364 - 3364 -
Sophox 3540 3533 3540 -
PyPsa 5061 5049 1975 3541
Open Power System 4277 3944 997 3639
JRC Open Plants 3961 3961 4865 993
JRC Open Generators 6809 6809 4722 59
Wikidata 1267 1267 1120 791
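
Counts of this kind reduce to simple set operations on EIC codes. The sketch below shows one way to derive them; it is an illustration rather than the procedure actually used, and the normalisation step (trimming and upper-casing) and the result keys are assumptions.

```python
def eic_overlap(dataset_eics, osm_eics):
    """Summarise how the EIC ids of one dataset overlap with those found in OSM."""
    def norm(code):
        # Assumed normalisation: trim whitespace and compare in upper case
        return code.strip().upper()

    items = [norm(c) for c in dataset_eics if c and c.strip()]
    distinct = set(items)
    osm = {norm(c) for c in osm_eics if c and c.strip()}
    return {
        "items_with_eic": len(items),       # records that carry an EIC id
        "distinct_eic": len(distinct),      # unique EIC ids among them
        "osm_match": len(distinct & osm),   # unique EIC ids also present in OSM
    }

# Small sample: one duplicate and one empty value in the dataset
dataset = ["19W000000000107C", "19W000000000106E", "19W000000000107C", ""]
osm = ["19W000000000107C", "62W875768058757F"]
summary = eic_overlap(dataset, osm)
```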

5 Analytics

The following analytics will be provided, using items from data domains EIC, Generation, Load, and Outages.

5.1 Faceted Search for Production and Generation Units

A faceted search will allow searching for production and generation units based on their location and fuel type. The following facets will be included:

5.1.1 Search Parameters

  • Bidding Zone
  • Control Area
  • Country (hierarchical)
    • ADM1 administrative subdivision
  • Fuel Type (hierarchical)
    • fossil
      • coal
      • gas...
    • renewable
      • solar
      • wind
      • hydro...
    • nuclear

5.1.2 Display of Aggregated Values

Aggregated values for number of units and cumulative capacity will be displayed on each element of the search.

The results of the search will be displayed as a list. It is however possible to also combine the search with other modalities and display the result on a map or on a chart
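
To make the aggregated values concrete, the following sketch computes the number of units and the cumulative capacity for every value of a hierarchical fuel type facet. In the planned architecture such aggregates would come from the search index, so this plain Python version is only an illustration; the two-level hierarchy, the field names `fuel` and `capacity_mw`, and the sample units are assumptions.

```python
from collections import defaultdict

# Assumed two-level fuel hierarchy, following the facet outline in 5.1.1
FUEL_PARENT = {
    "coal": "fossil", "gas": "fossil",
    "solar": "renewable", "wind": "renewable", "hydro": "renewable",
    "nuclear": None,
}

def facet_aggregates(units):
    """Return {facet_value: (unit_count, total_capacity_mw)}.

    Every unit is counted under its own fuel type and, if it has one,
    under the parent fuel type as well.
    """
    agg = defaultdict(lambda: [0, 0.0])
    for unit in units:
        fuel = unit["fuel"]
        for value in (fuel, FUEL_PARENT.get(fuel)):
            if value is not None:
                agg[value][0] += 1
                agg[value][1] += unit["capacity_mw"]
    return {value: (count, cap) for value, (count, cap) in agg.items()}

# Hypothetical units
units = [
    {"fuel": "coal", "capacity_mw": 200.0},
    {"fuel": "gas", "capacity_mw": 100.0},
    {"fuel": "wind", "capacity_mw": 50.0},
    {"fuel": "nuclear", "capacity_mw": 1000.0},
]
facets = facet_aggregates(units)
```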

5.2 (Canceled) Actual and Forecasted Load Timeline

A timeline showing all the data from the load domain (actual and projected, 5 individual tables) for a given Control Area, Bidding Zone, Country

Below is a mockup of this chart realized using Google Charts.

  • It displays data for the month of December for BZA BG
  • Actual and Day Ahead data are shown as a line chart.
  • Week/Month/Year -ahead forecasts, possibly as superimposed upper and lower bound on the timeline.

An interactive version of the chart is available here. N.B. it is not available for mobile browsers.

The mockup is limited by Google Charts' features but shows how the data looks when superimposed. Of particular interest are the occasions when the forecast and actual load are mismatched. These are easily visible on the chart, and we will emphasise them in the final version using the available functionality of the Vega charting library (e.g. this example).

5.3 Wind and Solar Actual vs Forecasted Generation

A timeline showing the day-ahead forecast and the actual generation for wind and solar.

The timeline will be analogous to the previous example.

  • The forecasted data is provided aggregated by the TSOs.
  • Actual generation will be calculated based on the Actual Generation data and the fuel type.

5.4 Production Units on a Map

Zoomable and navigable map with the production and generation units.

Example of a map showing power plants by capacity and fuel type:

img/plants-map.png

5.4.1 Data Visible on the Map Markers

  • current generation
  • installed capacity
  • fuel type
  • existence of a future planned outage

5.4.2 Drill-down

Drill-down data is available when interacting with a marker. This can be:

  • A pop-up or tooltip will display detailed information about the unit, gathered from ENTSOE data and augmented with OpenStreetMap data
  • Outages: current or future, planned or forced, active or canceled
  • Links to external data sources (such as Wikidata, Wikipedia, OSM)
  • Detailed power plant outline on a map (whenever available from OSM)

5.5 Outages on a Map

Outages displayed on a map: current or future, planned or forced, active or canceled.

  • Shown per Bidding Zone or Control Area
  • Filterable by time range

5.6 Balancing Energy Timeline

A timeline showing Prices Of Activated Balancing Energy and ActivatedBalancingEnergy for any given area. The diagram consists of 2 vertically symmetrical zones, one for "up" regulation and one for "down" regulation. Each zone superimposes:

  • 4 line charts for the price of each resource type
  • A stacked histogram for the volume of each activated resource

The following transformations need to be applied:

  • Temporal harmonisation
    • All values are converted to hourly or daily
  • Values aggregation
    • Volumes are summed
    • Prices are averaged
  • Currency transformations
    • Non EUR currencies are converted to EUR using the daily rate
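
The three transformations can be sketched together as follows. This is a minimal illustration under stated assumptions: the record keys (`start`, `volume`, `price`, `currency`), the shape of the rate table and the sample exchange rate are hypothetical, and prices are averaged as a plain (unweighted) mean.

```python
from collections import defaultdict
from datetime import datetime

def harmonise_hourly(records, eur_rates):
    """Aggregate balancing records into hourly buckets.

    records: iterable of dicts with keys start (ISO 8601 string), volume,
             price and currency
    eur_rates: {(currency, 'YYYY-MM-DD'): EUR per unit of currency}, daily rates
    Returns {hour_start_iso: {"volume": summed volume, "price_eur": mean EUR price}}.
    """
    buckets = defaultdict(lambda: {"volume": 0.0, "prices": []})
    for rec in records:
        ts = datetime.fromisoformat(rec["start"])
        # Temporal harmonisation: truncate the timestamp to the hour
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        # Currency transformation: convert non-EUR prices with the daily rate
        if rec["currency"] == "EUR":
            rate = 1.0
        else:
            rate = eur_rates[(rec["currency"], ts.date().isoformat())]
        buckets[hour]["volume"] += rec["volume"]              # volumes are summed
        buckets[hour]["prices"].append(rec["price"] * rate)   # prices are averaged below
    return {
        hour: {"volume": b["volume"], "price_eur": sum(b["prices"]) / len(b["prices"])}
        for hour, b in buckets.items()
    }

# Hypothetical records and exchange rate
records = [
    {"start": "2022-01-01T10:00:00", "volume": 10.0, "price": 50.0, "currency": "EUR"},
    {"start": "2022-01-01T10:15:00", "volume": 5.0, "price": 70.0, "currency": "EUR"},
    {"start": "2022-01-01T11:00:00", "volume": 4.0, "price": 200.0, "currency": "BGN"},
]
rates = {("BGN", "2022-01-01"): 0.5}
hourly = harmonise_hourly(records, rates)
```

A volume-weighted average price would be an alternative to the plain mean; which of the two is more appropriate depends on how the chart is meant to be read.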

An example of a similar diagram can be seen in this vega example

5.7 Future accepted offers bubble plot timeline

A timeline chart with circular markers showing future accepted offers from the AcceptedAggregatedOffers_17.1.D data item. The chart will display the following variables:

  • Temporal dimension (x-axis)
  • Area concerned by the bid (y-axis): this will create a swimlane effect
  • Volume: size of the marker
  • Direction: shape of the marker (a circular marker with a protrusion directed up or down)
  • Type of asset: color of the marker
  • A summary of the above variables displayed in the popup

An example of a similar diagram can be seen in this vega-lite example
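
The encoding of the variables listed above could look roughly like the Vega-Lite specification built below. It is a sketch only: the field names (`time`, `area`, `volume`, `direction`, `assetType`) are hypothetical, and the built-in `triangle-up` and `triangle-down` shapes stand in for the circular marker with a protrusion, which would need a custom shape.

```python
import json

def accepted_offers_spec(values):
    """Build a Vega-Lite spec (as a dict) for the accepted-offers bubble timeline."""
    fields = [("time", "temporal"), ("area", "nominal"), ("volume", "quantitative"),
              ("direction", "nominal"), ("assetType", "nominal")]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": {"type": "point", "filled": True},
        "encoding": {
            "x": {"field": "time", "type": "temporal"},      # temporal dimension
            "y": {"field": "area", "type": "nominal"},       # one swimlane per area
            "size": {"field": "volume", "type": "quantitative"},
            "shape": {
                "field": "direction", "type": "nominal",
                "scale": {"domain": ["up", "down"],
                          "range": ["triangle-up", "triangle-down"]},
            },
            "color": {"field": "assetType", "type": "nominal"},
            # Summary of all variables in the popup
            "tooltip": [{"field": f, "type": t} for f, t in fields],
        },
    }

# Hypothetical data point
spec = accepted_offers_spec([
    {"time": "2022-01-01T10:00:00", "area": "BG", "volume": 12.5,
     "direction": "up", "assetType": "generation"},
])
spec_json = json.dumps(spec, indent=2)
```

The resulting JSON text can be pasted into a Vega-Lite editor to check the layout before the data source is wired in.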

5.8 Area price/volume bubble plot

A timeline chart combining ActivatedBalancingEnergy_17.1.E and PricesOfActivatedBalancingEnergy_17.1.F

Similar to the chart above, the price/volume bubble chart will show price instead of time.

  • price of the product (x-axis)
  • area concerned by the bid (y-axis): this will create a swimlane effect
  • Volume: size of the marker
  • Direction: shape of the marker (a circular marker with a protrusion directed up or down)
  • Type of asset: color of the marker
  • a summary of the above variables displayed in the popup

5.9 Analytics Design

Technologies to use for Analytics:

  • The web application will be built via React or Angular.
  • The visualisations will be created via Kibana and embedded in the web application.
  • Kibana dashboards offer great tooling for visualisations. In addition to the built-in tools, Kibana offers rich custom visualisation options via Vega and Vega-Lite.
  • Data will be stored in ElasticSearch for quick access and aggregations and will be accessed directly from Kibana for the visualisations and via the Ontotext Platform for the facets.

5.10 Update Process

The data is updated automatically from the ENTSOE SFTP and REST services on a daily basis

6 Semantic Models

The semantic models are given as Turtle examples and diagrams covering all semantic data areas. They are shown in the previous sections.

6.1 Basic Semantic Data

"Manual" RDFization

  • Eg1: doc SFTP Appendix B: Area Naming Convention has the zone codes used on ENTSO portal.
    • Eg EIC 10Y1001A1001A869 is BZN|UA-DobTPP (bidding zone Ukraine-Dobrotvirska TPP)
    • BZN is a prefix that is displayed for the particular time series, not an attribute of that EIC
    • But the EIC file has notation UA-DOB_TPP (different spelling) and functions "Control Area, Market Balance Area, Scheduling Area" but not "Bidding Zone"
  • Eg2: the "knowledge base" (kb.ttl) describes the Data Items, more details are needed. See section above

6.2 TEKG Ontology

The TEKG ontology is available in tr.ttl and covers the full scope of the semantic models.

The ontology is also available in the Annex of this document.

6.3 TEKG SOML (GraphQL) Schema

7 System Architecture

We have revised and elaborated the conceptual architecture compared to the proposal. It presents the technologies and services that TEKG will use and implement to achieve its objectives:

  • ETL Application: responsible for the entire process of downloading, transforming and importing the transparency data.
    • Data will be fetched from the ENTSOE transparency platform on a scheduled basis. We'll use XML for master data (EIC, code lists and Installed Capacity), and CSV for transactional (time series) data.
    • The RDFization process will be done using GraphDB OntoRefine tool and its Mapping UI to transform the loaded data to RDF.
  • Validation: provides data validations using a combination of standard SHACL and advanced SHACL-SPARQL rules and integrating external data validations
  • Semantic Storage: RDF is loaded or updated to a semantic repository in GraphDB. Modest inference is implemented (GraphDB rules and/or SPARQL Updates)
  • ElasticSearch: RDF data is automatically indexed to ElasticSearch for full-text search, faceting, and analytics.
  • Elastic Index Monitoring: Kibana is used on top of Elastic to provide easy index management and monitoring.
  • TEKG Application: provides UI for visualizations and validation reporting on the transparency data that has been ingested and analysed in the different components.
  • Monitoring: InfluxDB and Grafana are used to monitor the overall infrastructure and performance of the system.

All components will be packaged and deployed in an enterprise-ready fashion using Docker, Kubernetes, and Helm charts.

The programming languages and frameworks used for development of the different components, services and tests are:

  • Java: used for development of the data processing components (data fetchers, ETL processing, RDF data validation? and import)
    • Spring Boot: allows quick building of services. It provides a lot of flexibility and functionalities out of the box
  • JavaScript: used for the web application and the acceptance tests of the components/services
    • Angular: development platform, which includes various tools, libraries and frameworks for building and scaling web applications
    • Cypress: framework for end-to-end testing
    • Cucumber.js: test framework for behavior-driven development
  • Python: used for scripting several small functionalities, for example VAT number validation
    • AIOHTTP: Asynchronous HTTP client/server framework
    • pandas: data analysis and manipulation tool

7.1 Data Fetching

Source data is obtained from ENTSOE transparency platform on a scheduled basis (frequency to be discussed) via:

  • REST API: master data in XML
  • SFTP server: transactional (time series) data in tab delimited flat files saved with the *.csv file extension
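
Because the SFTP files are tab delimited despite their *.csv extension, a reader has to set the delimiter explicitly. The following is a minimal sketch; the two column names in the sample are hypothetical (real files have data-item-specific columns), and the text encoding of the real files should be checked before opening them.

```python
import csv
import io

def read_tsv(fileobj):
    """Yield one dict per data row of a tab-delimited file with a header line."""
    yield from csv.DictReader(fileobj, delimiter="\t")

# Hypothetical two-column sample standing in for a downloaded file
sample = io.StringIO("AreaCode\tTotalLoadValue\n10YUA-WEPS-----0\t4521.5\n")
rows = list(read_tsv(sample))
```

For a file on disk, the same function can be applied to a file object opened with `newline=""`, as the `csv` module documentation recommends.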

7.2 Semantic Conversion Service

The service will convert the ingested XMLs and CSVs and produce RDF data. The initial assumption was that we were going to work only with the XMLs from the REST API, and the main tool that we proposed was XSPARQL. After careful exploration of the data and its sources, we discovered additional data in CSV format that we need.

To achieve a flexible and generic service that can handle the required data, we've considered using additional tools like OntoRefine and TARQL. In order to measure the performance of the different tools and to pick the right one for the service, we've done some experiments. The results are presented in the Conversion Performance Comparison section.

7.2.1 XSPARQL

XSPARQL is a language for transforming data between XML and RDF.

  • It is built by combining the strengths of two query languages: XQuery for XML, and SPARQL for RDF.

XSPARQL Github contains the implementation of the tools that we are using.

7.2.1.1 XSPARQL Example

Data

<?xml version="1.0" encoding="UTF-8"?>
<Configuration_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0">
    <mRID>8be8471a92f345ce8129102d965c19d7</mRID>
    <type>A95</type>
    <process.processType>A39</process.processType>
    <sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
    <sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
    <receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
    <receiver_MarketParticipant.marketRole.type>A32</receiver_MarketParticipant.marketRole.type>
    <createdDateTime>2022-01-17T12:50:49Z</createdDateTime>
    <TimeSeries>
        <mRID>87546cb0270a4ea8</mRID>
        <businessType>B11</businessType>
        <implementation_DateAndOrTime.date>2021-10-01</implementation_DateAndOrTime.date>
        <biddingZone_Domain.mRID codingScheme="A01">10YUA-WEPS-----0</biddingZone_Domain.mRID>
        <registeredResource.mRID codingScheme="A01">62W875768058757F</registeredResource.mRID>
        <registeredResource.name>KALUSHCHPP</registeredResource.name>
        <registeredResource.location.name>Kalush</registeredResource.location.name>
        <ControlArea_Domain>
            <mRID codingScheme="A01">10YUA-WEPS-----0</mRID>
        </ControlArea_Domain>
        <Provider_MarketParticipant>
            <mRID codingScheme="A01">10X1001C--00001X</mRID>
        </Provider_MarketParticipant>
        <MktPSRType>
            <psrType>B05</psrType>
            <production_PowerSystemResources.highVoltageLimit unit="KVT">110</production_PowerSystemResources.highVoltageLimit>
            <nominalIP_PowerSystemResources.nominalP unit="MAW">200</nominalIP_PowerSystemResources.nominalP>
            <GeneratingUnit_PowerSystemResources>
                <mRID codingScheme="A01">62W2081564720502</mRID>
                <name>KALUSHCHPP-V</name>
                <nominalP unit="MAW">200</nominalP>
                <generatingUnit_PSRType.psrType>B05</generatingUnit_PSRType.psrType>
                <generatingUnit_Location.name>Kalush</generatingUnit_Location.name>
            </GeneratingUnit_PowerSystemResources>
        </MktPSRType>
    </TimeSeries>
</Configuration_MarketDocument>

Script

prefix ns:  <urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0>
prefix tr:  <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

declare variable $input as xs:string external;
declare option saxon:output "method=text";

for $data in doc($input)/ns:Configuration_MarketDocument/ns:TimeSeries
let $BASE := "https://transparency.ontotext.com/resource/"
let $TYPE := fn:concat($BASE,"type/")
let $UNIT := fn:concat($TYPE,"UnitSymbol/") # TODO or "UnitOfMeasure/" ?
let $EIC  := fn:concat($BASE,"eic/")
let $url  := fn:concat($EIC,$data/ns:registeredResource.mRID/text())

construct {
  <{$url}>
    tr:dateImplemented {$data/ns:implementation_DateAndOrTime.date/text()}^^xsd:date;
    tr:notationAlt {$data/ns:registeredResource.name/text()};
    tr:location {$data/ns:registeredResource.location.name/text()};
    tr:assetType <{fn:concat($TYPE,"Asset/",$data/ns:MktPSRType/ns:psrType/text())}>.
    {
      for $x in $data/ns:biddingZone_Domain.mRID/text()                                  # 0-1
        construct {<{$url}> tr:biddingZone <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:ControlArea_Domain/ns:mRID/text()                               # 1-many
        construct {<{$url}> tr:controlArea <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:Provider_MarketParticipant/ns:mRID/text()                       # 1-many
        construct {<{$url}> tr:providerParticipant <{fn:concat($EIC,$x)}>},
      for $x in $data/ns:MktPSRType/ns:production_PowerSystemResources.highVoltageLimit  # 0-1
        construct {
          <{$url}> tr:highVoltageLimit {$x/text()}^^xsd:float
        },
      for $x in $data/ns:MktPSRType/ns:nominalIP_PowerSystemResources.nominalP           # 0-1
        construct {
          <{$url}> tr:installedOutput {$x/text()}^^xsd:float
        },
      for $gen in $data/ns:MktPSRType/ns:GeneratingUnit_PowerSystemResources             # 0-many
        let $url1 := fn:concat($EIC,$gen/ns:mRID/text())
        construct {
          <{$url}> tr:generationUnit <{$url1}>.
          <{$url1}>
            tr:notationAlt {$gen/ns:name/text()};
            tr:assetType <{fn:concat($TYPE,"Asset/",$gen/ns:generatingUnit_PSRType.psrType/text())}>;
            tr:location {$gen/ns:generatingUnit_Location.name/text()};
            tr:installedOutput {$gen/ns:nominalP/text()}^^xsd:float
        }
    }
}

Result

@base <https://transparency.ontotext.com/resource/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tr: <https://transparency.ontotext.com/resource/tr/> .

<eic/62W875768058757F> tr:dateImplemented  "2021-10-01"^^xsd:date .
<eic/62W875768058757F> tr:notationAlt  "KALUSHCHPP" .
<eic/62W875768058757F> tr:location  "Kalush" .
<eic/62W875768058757F> tr:assetType  <type/Asset/B05> .
<eic/62W875768058757F> tr:biddingZone  <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:controlArea  <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:providerParticipant  <eic/10X1001C--00001X> .
<eic/62W875768058757F> tr:highVoltageLimit  "110"^^xsd:float .
<eic/62W875768058757F> tr:installedOutput  "200"^^xsd:float .
<eic/62W875768058757F> tr:generationUnit  <eic/62W2081564720502> .

<eic/62W2081564720502> tr:notationAlt  "KALUSHCHPP-V" .
<eic/62W2081564720502> tr:assetType  <type/Asset/B05> .
<eic/62W2081564720502> tr:location  "Kalush" .
<eic/62W2081564720502> tr:installedOutput  "200"^^xsd:float .
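
For readers unfamiliar with XSPARQL, the core of the mapping (iterate over TimeSeries elements, mint a URL from the EIC, emit one triple per XML field) can be approximated in plain Python with the standard xml.etree.ElementTree module. This is not the tool used in the project, only a sketch covering a small subset of the properties; it emits N-Triples rather than Turtle and does not escape literals.

```python
import xml.etree.ElementTree as ET

NS = {"ns": "urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0"}
BASE = "https://transparency.ontotext.com/resource/"
TR = BASE + "tr/"
EIC = BASE + "eic/"

def to_ntriples(xml_text):
    """Convert a Configuration_MarketDocument to N-Triples lines (subset of properties)."""
    root = ET.fromstring(xml_text)
    lines = []

    def lit(s, p, o):   # triple with a plain literal object
        lines.append(f'<{s}> <{TR}{p}> "{o}" .')

    def ref(s, p, o):   # triple with an IRI object
        lines.append(f"<{s}> <{TR}{p}> <{o}> .")

    for ts in root.findall("ns:TimeSeries", NS):
        url = EIC + ts.findtext("ns:registeredResource.mRID", namespaces=NS)
        lit(url, "notationAlt", ts.findtext("ns:registeredResource.name", namespaces=NS))
        for zone in ts.findall("ns:biddingZone_Domain.mRID", NS):
            ref(url, "biddingZone", EIC + zone.text)
        for gen in ts.findall("ns:MktPSRType/ns:GeneratingUnit_PowerSystemResources", NS):
            gen_url = EIC + gen.findtext("ns:mRID", namespaces=NS)
            ref(url, "generationUnit", gen_url)
            lit(gen_url, "notationAlt", gen.findtext("ns:name", namespaces=NS))
    return lines

# Cut-down version of the data shown above
SAMPLE = """<Configuration_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0">
  <TimeSeries>
    <biddingZone_Domain.mRID codingScheme="A01">10YUA-WEPS-----0</biddingZone_Domain.mRID>
    <registeredResource.mRID codingScheme="A01">62W875768058757F</registeredResource.mRID>
    <registeredResource.name>KALUSHCHPP</registeredResource.name>
    <MktPSRType>
      <GeneratingUnit_PowerSystemResources>
        <mRID codingScheme="A01">62W2081564720502</mRID>
        <name>KALUSHCHPP-V</name>
      </GeneratingUnit_PowerSystemResources>
    </MktPSRType>
  </TimeSeries>
</Configuration_MarketDocument>"""
triples = to_ntriples(SAMPLE)
```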

7.2.1.2 XSPARQL Service Implementation

Ontotext has packaged XSPARQL as a web service (WAR file). The benefit of using a web service is that it saves Java startup time, which is needed for every invocation of the command-line tool.

  • The WAR with XSPARQL service can be loaded directly to the embedded web server (Tomcat) that Spring Boot uses to boot the main application.
  • The implementation requires a factory class, which builds and provides a context for the service endpoint.
  • The factory also acts as an interceptor that triggers the WAR file provisioning when the server starts.
  • The service is invoked by providing a dataset in XML format and the transformation query.

As a further optimization, we considered precompiling the various conversions and putting them into a Registry. This would save the transpilation time (from XSPARQL to XQuery) and compilation time (from XQuery to executable transformation).

7.2.1.3 XSPARQL Issues

  • Uses log4j, which needs to be updated to the latest version due to security vulnerabilities.
  • Maintenance of the code will be hard as it was written some years ago and the community looks inactive.
    • Hard to improve the code or to extend its functionality.
    • Hard to fix issues related to the transformations.
  • Lack of batch processing/transformation
  • Works only with XML file types.

7.2.2 OntoRefine

OntoRefine is a user-friendly tool for cleaning data and converting it to RDF.

  • It's an adaptation of the popular OpenRefine tool developed by Ontotext and integrated in GraphDB Workbench.
  • It allows visual development of data conversions with a Mapping UI, therefore is suitable for non-programmers.
  • It can process various file formats, including CSV, Google sheets, XML, JSON

Because OntoRefine handles various file formats, including XML, CSV, and JSON, it is a very good candidate for the current project. It is the preferred option because it is developed and maintained by Ontotext and shows the best overall performance.

Issues:

  • There are bugs present in the Mapping UI that prevent defining blank nodes (not an issue for this project).
  • To process a file, the tool creates a project (workspace), which should be cleared afterwards. It adds time and complexity to the process.

Note: the rest of this section describes Reconciliation, which is not used in the current project.

Another big advantage is matching of tabular data to KGs via different reconciliation services that OntoRefine supports. Reconciliation services provide semantic matching functionality.

There are various free reconciliation services that can be used by OntoRefine. The Reconciliation Testbench provides a list of some of these services. We host and support three such services based on a subset of Wikidata:

7.2.2.1 OntoRefine Example

The OntoRefine Mapping UI allows visual creation of semantic transformations. Here's a transformation for the same XML data as in the XSPARQL example:

Using the same data as in the XSPARQL example, it produces a semantically equivalent result.

A conversion script can be exported from the Mapping UI (as JSON) and used as a batch process (see next section). Additionally, the script contains all operations performed over the dataset, including data cleaning and the reconciliation operations.

7.2.2.2 OntoRefine Service Implementation

We developed a conversion service using OntoRefine, based on a public library called ontorefine-client.

  • It exposes a large portion of OntoRefine functionalities through an intuitive API, which we use to build and integrate the transformation process.
  • The process is exposed through a REST endpoint.
  • The user invokes the service by providing a dataset and a previously saved transformation script (created in OntoRefine Mapping UI).
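The create/convert/clean-up lifecycle behind such a service can be sketched as follows. This is a minimal illustration in Python, not the actual implementation: the RefineClient interface and all of its method names are hypothetical and are not taken from ontorefine-client. The point of the sketch is that the temporary OntoRefine project is deleted even when the transformation fails.

```python
from typing import Protocol


class RefineClient(Protocol):
    """Hypothetical interface; method names are NOT taken from ontorefine-client."""

    def create_project(self, name: str, dataset: bytes) -> str: ...
    def apply_operations(self, project_id: str, operations_json: str) -> None: ...
    def export_rdf(self, project_id: str) -> str: ...
    def delete_project(self, project_id: str) -> None: ...


def convert_dataset(client: RefineClient, name: str, dataset: bytes, operations_json: str) -> str:
    """Create a project, apply the saved script, export RDF, and always delete the project."""
    project_id = client.create_project(name, dataset)
    try:
        client.apply_operations(project_id, operations_json)
        return client.export_rdf(project_id)
    finally:
        # Runs on success and on failure, so no stale projects accumulate.
        client.delete_project(project_id)
```

A REST endpoint would then only need to read the uploaded dataset and script and pass them to convert_dataset.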

7.2.3 TARQL

TARQL is a highly performant tool for converting very large CSV/TSV files.

  • Tarql GitHub Project contains the source code of the tool.
  • Conversions are written in the form of SPARQL CONSTRUCT queries that iterate over every table row.
  • One can do limited data cleaning; splitting cell values is also supported.
  • It is developer-oriented since it requires SPARQL knowledge.

Issues:

  • It is a third-party tool, which could be a problem if issues need to be fixed quickly.
  • Lack of batch processing.
  • No out-of-the-box web service; we would have to create one from scratch.
  • Works only with CSV or TSV files.

If the project used XSPARQL for conversion of XML files, we could use TARQL for conversion of CSV files.

7.2.3.1 TARQL Example

Data (CSV example from CrunchBase)

permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b

Mapping

PREFIX ex: <http://ex.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?URI a ex:Organization;
    ex:permalink ?permalink;
    ex:name ?company;
    ex:employees ?numEmployees;
    ex:category ?category;
    ex:city ?city;
    ex:state ?state;
    ex:fundingDate ?fundedDate;
    ex:raisedAmt ?amount;
    ex:raisedCurrency ?raisedCurrency;
    ex:round ?round;
}
WHERE {
  BIND (URI(CONCAT('http://ex.org/companies/', ?permalink)) AS ?URI)
  BIND (xsd:integer(?numEmps) AS ?numEmployees)
  BIND (xsd:decimal(?raisedAmt) AS ?amount)
}

Result

<http://ex.org/companies/lifelock>
  a ex:Organization ;
  ex:permalink "lifelock" ;
  ex:name "LifeLock" ;
  ex:category "web" ;
  ex:city "Tempe" ;
  ex:state "AZ" ;
  ex:fundingDate "1-May-07" ;
  ex:raisedAmt "6850000"^^xsd:decimal ;
  ex:raisedCurrency "USD" ;
  ex:round "b" .
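To make the row-by-row semantics of the mapping concrete, the following Python sketch emulates what the CONSTRUCT query does for each CSV row. It only illustrates the idea and is not how TARQL is implemented; the function name row_to_triples and the two lookup tables are introduced here for illustration. Empty cells are treated as unbound variables, which is why the Result above has no ex:employees triple.

```python
import csv
import io

EX = "http://ex.org/ontology#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Plain string-valued columns: CSV column -> property local name (as in the mapping).
PLAIN = {
    "permalink": "permalink",
    "company": "name",
    "category": "category",
    "city": "city",
    "state": "state",
    "fundedDate": "fundingDate",
    "raisedCurrency": "raisedCurrency",
    "round": "round",
}
# Typed columns: CSV column -> (property local name, XSD datatype local name).
TYPED = {"numEmps": ("employees", "integer"), "raisedAmt": ("raisedAmt", "decimal")}


def row_to_triples(row):
    """Emulate the CONSTRUCT template for one CSV row; empty cells count as unbound."""
    subject = f"<http://ex.org/companies/{row['permalink']}>"
    triples = [(subject, "a", f"<{EX}Organization>")]
    for column, prop in PLAIN.items():
        if row.get(column):
            triples.append((subject, f"<{EX}{prop}>", f'"{row[column]}"'))
    for column, (prop, datatype) in TYPED.items():
        if row.get(column):
            triples.append((subject, f"<{EX}{prop}>", f'"{row[column]}"^^<{XSD}{datatype}>'))
    return triples


CSV_TEXT = """permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b
"""

for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    for s, p, o in row_to_triples(row):
        print(s, p, o, ".")
```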

7.2.3.2 TARQL Service Implementation

TARQL does not have a web service implementation, so we would need to implement one.

  • The tool can be wrapped in a process that is called when the REST endpoint is invoked.
  • The process will trigger the execution of a standard TARQL command and provide the required arguments to it.
  • Same as other proposed solutions, the service would be implemented using Spring Boot so that it can be deployed and distributed easily.
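The wrapping step can be sketched as follows. The sketch is in Python for brevity, whereas the proposed service would be a Spring Boot application. It assumes that a tarql executable is available on the PATH and relies only on the basic invocation form (query file followed by the input CSV file), with the RDF written to standard output. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_tarql_command(query_file, csv_file, executable="tarql"):
    """Build the argument list for `tarql <query> <csv>` (executable name is an assumption)."""
    return [executable, str(Path(query_file)), str(Path(csv_file))]


def convert_csv(query_file, csv_file, output_file, executable="tarql"):
    """Run TARQL and write the RDF it prints on stdout to output_file."""
    command = build_tarql_command(query_file, csv_file, executable)
    with open(output_file, "wb") as out:
        # check=True raises CalledProcessError if TARQL exits with a non-zero status.
        subprocess.run(command, stdout=out, stderr=subprocess.PIPE, check=True)
    return Path(output_file)
```

A REST handler would store the uploaded CSV in a temporary file, call convert_csv, and return the content of the output file.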

7.2.4 Conversion Performance Comparison

We did some performance testing to ensure that the most suitable tool can be selected. We used prototypes of conversion services to measure their performance.

7.2.4.1 OntoRefine vs XSPARQL

For the comparison we use XML datasets in data/xml/Production_Unit (documents of type Configuration_MarketDocument).

Because the number and size of data files are not that large yet, we replicated them in order to measure at scale and approximate the load of an actual production environment.

The first two columns show the count and total size of the files (MB); the last two columns show the processing time of each of the two tools (seconds).

count   size (MB)   XSPARQL (s)   OntoRefine (s)
46      4.1         2             1.6
460     12.6        13.6          10.6
4600    353.9       181.9         156.5
7000    620.2       294.6         264.1

For comparison purposes we made the services work in an identical way and process the datasets one by one. There are a few optimizations possible for each service, but they are not worth doing at the moment.
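The figures in the table above can be turned into throughput and relative speed with a few lines of Python. The sketch below only re-computes values from the table; the variable and function names are introduced here for illustration.

```python
# (file count, total size in MB, XSPARQL seconds, OntoRefine seconds), copied from the table.
MEASUREMENTS = [
    (46, 4.1, 2.0, 1.6),
    (460, 12.6, 13.6, 10.6),
    (4600, 353.9, 181.9, 156.5),
    (7000, 620.2, 294.6, 264.1),
]


def summarize(measurements):
    """Return per-row throughput (MB/s) for both tools and the XSPARQL/OntoRefine time ratio."""
    rows = []
    for count, size_mb, xsparql_s, ontorefine_s in measurements:
        rows.append({
            "count": count,
            "xsparql_mb_per_s": size_mb / xsparql_s,
            "ontorefine_mb_per_s": size_mb / ontorefine_s,
            "time_ratio": xsparql_s / ontorefine_s,
        })
    return rows


for row in summarize(MEASUREMENTS):
    print(f"{row['count']:>5} files: XSPARQL {row['xsparql_mb_per_s']:.2f} MB/s, "
          f"OntoRefine {row['ontorefine_mb_per_s']:.2f} MB/s, "
          f"time ratio {row['time_ratio']:.2f}")
```

On these measurements XSPARQL takes between roughly 1.12 and 1.28 times as long as OntoRefine.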

7.2.4.2 OntoRefine vs TARQL

We compared the performance of TARQL and OntoRefine on a 240 MB CSV file, producing the same RDF data.

  • OntoRefine processes the file in 13-20 seconds
  • TARQL is 2-3 times slower

7.2.5 Semantic Conversion Scripts

The semantic conversion scripts are in etl_scripts/OR. They are specialized SPARQL CONSTRUCT queries that run in an OntoRefine instance and map tabular data to a predefined graph pattern.

7.3 Semantic Data Pipeline

The data pipeline is glue code that implements Fetch > Conversion > GraphDB > (Validation, Elastic indexing).

It is a standalone Spring Boot application, which has the following components:

  • FTP Resource Downloaders
  • HTTP Resource Downloaders
  • Conversion Services
  • Data Import Services

Simple layout of the application components.

Interaction flow between the services.

TODO M4: Add Import and Validation flow

FTP Resource Downloaders

This service retrieves the specified datasets from the SFTP server and serves as the data provider for the automatic Conversion Service. Retrieval is done by a process that listens for changes on the server, specifically the upload of a new dataset. When such an event is detected, the service copies the file to a configured dataset store. The trigger event can be filtered by providing a matching pattern for the file names.

In addition to the automatic mode, the service supports manual invocation. This is convenient for testing or when another application or system wants to plug into the processing pipeline.
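The detect, filter and copy behaviour can be sketched as follows. This is a simplified Python illustration, not the actual Spring Boot implementation: a local directory stands in for the remote SFTP directory, new files are detected by comparing against a set of names already seen, and the function name and file names are hypothetical.

```python
import fnmatch
import shutil
from pathlib import Path


def fetch_new_datasets(source_dir, store_dir, pattern="*.csv", seen=None):
    """Copy files from source_dir that match pattern and were not seen before into store_dir.

    source_dir stands in for the remote SFTP directory (an assumption of this sketch).
    Returns the list of copied file names and updates the `seen` set in place.
    """
    seen = set() if seen is None else seen
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        if fnmatch.fnmatch(path.name, pattern):
            shutil.copy2(path, store / path.name)
            copied.append(path.name)
        seen.add(path.name)
    return copied
```

Calling the function repeatedly with the same seen set gives the automatic mode; a single call with an empty set corresponds to a manual invocation.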

HTTP Resource Downloaders

Similar to the FTP Resource Downloader, this service provides datasets to the Conversion Service. However, unlike the FTP downloader, it is not reactive: it retrieves the required datasets by performing HTTP requests to a specific REST API at a configurable fixed rate. The datasets to retrieve are specified by request parameters, which are provided externally through configuration. This design makes modifications easy and also allows the scope of the data that the system processes to be scaled up or down.

Like the FTP downloader, this service exposes its functionality via a REST endpoint, which can be invoked manually.
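A configuration-driven polling loop of this kind can be sketched as follows. The sketch is in Python and makes several assumptions: the base URL, the parameter names and the rate are placeholders and do not describe the real REST API, and the loop pauses for a fixed delay between cycles, which only approximates a fixed rate when the downloads themselves are short.

```python
import time
from urllib.parse import urlencode

# Hypothetical configuration: base URL and parameter names are placeholders.
CONFIG = {
    "base_url": "https://example.org/api",
    "rate_seconds": 3600,
    "requests": [
        {"dataItem": "ProductionUnits", "area": "AREA-1"},
        {"dataItem": "ProductionUnits", "area": "AREA-2"},
    ],
}


def build_urls(config):
    """Turn each configured parameter set into a full request URL."""
    return [f"{config['base_url']}?{urlencode(params)}" for params in config["requests"]]


def run_fixed_rate(config, fetch, cycles, sleep=time.sleep):
    """Call fetch(url) for every configured URL once per cycle, pausing between cycles."""
    for cycle in range(cycles):
        for url in build_urls(config):
            fetch(url)
        if cycle < cycles - 1:
            sleep(config["rate_seconds"])
```

Changing the scope of the processed data then only requires editing the requests list in the configuration.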

Conversion Service

The purpose of the Conversion Service is to transform the downloaded datasets to RDF data, which can be imported in GraphDB. Like the downloaders, this service has two aspects:

  • manual: allows invocation of the transformation on demand by calling a REST endpoint and providing specific parameters along with the dataset that should be transformed.
  • automatic: the main functionality of the service. It is triggered when a new file is added to the dataset store. A transformation script, which contains the mapping to RDF, is applied to the dataset.

The automatic transformation process begins when the application is started. If there are unprocessed files in the dataset store, they are picked up and the transformations are applied. The transformations are predefined scripts in JSON format. When the conversion is successful, the resulting RDF data is stored in a file, which is later imported in GraphDB.

The transformation itself is done by the OntoRefine tool. Its functionality is invoked by the OntoRefine Service, which contains the steps required to process a single dataset.
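The automatic mode can be sketched as a scan over the dataset store. The Python sketch below makes two assumptions that are not stated in this document: a dataset counts as processed when its output file already exists, and the OntoRefine call is represented by an arbitrary convert callable. The function name and the .ttl suffix are illustrative.

```python
from pathlib import Path


def convert_pending(store_dir, output_dir, convert, suffix=".ttl"):
    """Convert every dataset in store_dir that has no RDF output yet.

    `convert` is a callable (dataset_bytes) -> rdf_text standing in for the OntoRefine call.
    A dataset counts as processed when its output file already exists (sketch assumption).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for dataset in sorted(Path(store_dir).iterdir()):
        target = out / (dataset.stem + suffix)
        if not dataset.is_file() or target.exists():
            continue
        rdf = convert(dataset.read_bytes())
        # Write under a temporary name first, so an interrupted write never
        # leaves a partial RDF file under the final name.
        tmp = target.with_name(target.name + ".part")
        tmp.write_text(rdf, encoding="utf-8")
        tmp.replace(target)
        written.append(target.name)
    return written
```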

Data Import Service

This service imports the RDF data in GraphDB and triggers the validation. Following the design of the other components, the import service will have manual and automatic aspects. Similar to the automatic conversion, the import service is triggered by the existence of an unprocessed RDF data file. If the import is successful, the file will be marked as imported and removed from the directory.
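The import step can be sketched with the RDF4J-style REST API that GraphDB exposes, where a POST to /repositories/{repository}/statements adds the statements in the request body. The Python sketch below assumes Turtle files with a .ttl suffix; the function names and the repository name used in the usage note are hypothetical. Triggering the validation is not covered.

```python
import urllib.request
from pathlib import Path


def build_import_request(graphdb_url, repository, rdf_file):
    """Build a POST request that adds the Turtle file to the repository's statements."""
    url = f"{graphdb_url.rstrip('/')}/repositories/{repository}/statements"
    return urllib.request.Request(
        url,
        data=Path(rdf_file).read_bytes(),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def import_pending(rdf_dir, graphdb_url, repository, send=urllib.request.urlopen):
    """Import every *.ttl file in rdf_dir and delete it once the import has succeeded."""
    imported = []
    for rdf_file in sorted(Path(rdf_dir).glob("*.ttl")):
        request = build_import_request(graphdb_url, repository, rdf_file)
        with send(request):  # urlopen raises on HTTP errors, so failed files stay in place
            pass
        rdf_file.unlink()
        imported.append(rdf_file.name)
    return imported
```

For example, import_pending("rdf", "http://localhost:7200", "tekg") would import all pending files into a repository named tekg on a local GraphDB instance.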

7.4 TEKG Dashboard Application

The Transparency EKG (TEKG) dashboard application is a single-page web application with an analytical user interface that provides visualizations and validation reporting on the transparency data that has been ingested, analyzed and validated in GraphDB (see DQA Dashboard).

Transparency EKG uses GraphDB's Elasticsearch connector to synchronize all relevant data to multiple Elasticsearch indices. This enables the dashboard to perform full-text and faceted searches in order to construct visualizations, and to limit data requests to a single data source.

Refer to the Elasticsearch GraphDB connector documentation for more information.

7.4.1 Design

The TEKG Dashboard application consists of two parts: the static HTML and CSS files and a server part that serves these static files and acts as an API proxy.

The server part acts as a "backend for frontend" that proxies API requests from the web part and constructs the queries that are sent to Elasticsearch. It is implemented with NodeJS and the Express framework. See the NodeJS and Express documentation for more information.

The web part is implemented with the Angular platform and TypeScript. This framework stack helps in designing and building single-page applications (SPA). The source code is organized in web components grouped in Angular modules that are type-safe and reusable throughout the application. The Angular platform comes with its own CLI tool, which helps generate web components and modules. See the Angular documentation for more information.

The web part sends all of its requests to the server part in order to avoid direct communication from the client to the Elasticsearch server. Queries are constructed in the server part in order to move that complexity away from the web part.
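The kind of query construction that the server part performs can be sketched as follows. The sketch is in Python for consistency with the other examples in this chapter, whereas the real server part is a NodeJS application. The index field names (dataItem, area, date, actualOutput) are assumptions about how the Elasticsearch connector is configured. The function turns a few filter values coming from the web part into an Elasticsearch search body with a filtered query and a daily aggregation.

```python
def build_analytics_query(data_item, area=None, date_from=None, date_to=None):
    """Build an Elasticsearch search body: filter observations, then bucket them per day."""
    filters = [{"term": {"dataItem": data_item}}]
    if area is not None:
        filters.append({"term": {"area": area}})
    if date_from is not None or date_to is not None:
        date_range = {}
        if date_from is not None:
            date_range["gte"] = date_from
        if date_to is not None:
            date_range["lte"] = date_to
        filters.append({"range": {"date": date_range}})
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "date", "calendar_interval": "day"},
                "aggs": {"total_output": {"sum": {"field": "actualOutput"}}},
            }
        },
    }
```

Keeping this logic on the server means the web part only sends simple filter values and never needs to know the Elasticsearch query syntax.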

7.4.1.1 Visualization of Analytics

For analytics visualizations, the TEKG dashboard application makes use of VEGA. This is a visualization grammar with vast options for chart types, transformations and interactions. TEKG Dashboard application will fetch data from ES for each analytic, transform it and pass it to VEGA for rendering. The design of the analytics visualizations is as follows:

  • VEGA wrapper component with default settings for rendering and responsive layout.
  • A set of settings that specifies concrete loading options, transformations and visualizations for each analytic. This allows analytics to be added step by step.
  • Analytics service that uses the set of settings to load and transform the data.
  • A web page that injects the analytics service and uses it to request data and render the different analytics with VEGA.

The web page will have options for filtering the analytics data which will result in re-fetching it from ES.

7.4.1.2 Visualization of Validation Reports

The TEKG dashboard application will allow the user to browse and analyze validation reports that have been produced by the Semantic Data Validation Service. The validation visualizations will consist of:

  • Validation table component that renders the content of validation reports. This will be a paginated component with standard options for sorting and filtering.
  • Validation service that performs API requests to the server with different options for paging, sorting, filtering etc.
  • A web page embedding the table with different filters and facets to narrow down the fetched validation reports.

7.4.1.3 Visualization of Map Data

For visualizing map data, the TEKG dashboard application will use Leaflet, a library for making interactive maps with OpenStreetMap data. It provides an easy to use API with a lot of options for configurations and extensions.

The dashboard application will have a Leaflet wrapper component that can be embedded throughout the analytics to provide more context and insight into the data.

7.4.2 Layout

An example layout for the TEKG dashboard application

7.4.3 Packaging

The TEKG Dashboard is packaged as a Docker image to achieve portability, ease of deployment and scalability. It can be deployed as a simple Docker container (with Docker Compose, for example) or as a Kubernetes deployment.

7.5 Monitoring

We use Grafana to monitor the overall infrastructure and performance of the system and its services, primarily GraphDB and Ontotext Platform (Semantic Objects service).

Monitoring data is collected with various Telegraf plugins and then stored in the InfluxDB time series database.

8 Energy Knowledge Graph

V1 of the Energy Knowledge Graph is currently available as an RDF graph and a SPARQL endpoint.

The graph consists of 116 million triples and covers the selected data items for a period of three full months, as well as the data from the current month (2022-01 - 2022-04).

The following table summarizes the number of observations (tr:DataObservation) per Data Item:

dataItem                                              n_observations
generation/ActualGenerationOutputPerGenerationUnit    3969000
generation/AggregatedGenerationPerType                2812002
balancing/AggregatedVolumes                           2003930
balancing/AggregatedVolumes_HOURLY                    829965
balancing/PricesOfActivatedBalancingEnergy            708636
balancing/PricesOfActivatedBalancingEnergy_HOURLY     347570
generation/CurrentGenerationForecastForWindAndSolar   283136
outages/UnavailabilityOfProductionOrGenerationUnits   79404
balancing/AggregatedVolumes_DAILY                     43351
balancing/PricesOfActivatedBalancingEnergy_DAILY      17417
generation/InstalledGenerationCapacityComputed        41

A number of sample queries are available on the GraphDB Workbench home page.
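As an illustration of how such counts could be obtained from the SPARQL endpoint, the Python sketch below wraps a grouping query in a SPARQL 1.1 Protocol request. The query is a plausible reconstruction based on the ontology terms tr:DataObservation and tr:dataItem and is not necessarily the query used to produce the table. It also assumes that every observation, including outages, is typed (directly or through inference) as tr:DataObservation. The endpoint URL in the usage note is a placeholder.

```python
import urllib.parse
import urllib.request

COUNT_QUERY = """
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT ?dataItem (COUNT(?obs) AS ?n_observations)
WHERE {
  ?obs a tr:DataObservation ;
       tr:dataItem ?dataItem .
}
GROUP BY ?dataItem
ORDER BY DESC(?n_observations)
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol request (query via URL-encoded POST, CSV results)."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "text/csv",
        },
        method="POST",
    )
```

The results could then be read with urllib.request.urlopen(build_sparql_request("https://example.org/repositories/tekg", COUNT_QUERY)), where the URL stands for the actual endpoint address.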

9 Annex

9.1 Full TEKG Ontology

Below is the ontology in Turtle format.

# @prefix trr:  <https://transparency.ontotext.com/resource/> .    # OMIT since this takes over all other prefixes
@prefix tr:   <https://transparency.ontotext.com/resource/tr/> .   # Ontology
@prefix eic:  <https://transparency.ontotext.com/resource/eic/> .  # EnergyResource with EIC
@prefix type: <https://transparency.ontotext.com/resource/type/> . # codelists

@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix vann:   <http://purl.org/vocab/vann/> .

tr: a owl:Ontology;
  rdfs:label "Transparency Energy ontology";
  rdfs:comment "Ontology for data from the ENTSOE Electricity Market Transparency portal";
  rdfs:seeAlso <https://transparency.entsoe.eu/>, <https://transparency.ontotext.com/>;
  dct:creator <https://ontotext.com/>, <mailto:vladimir.alexiev@ontotext.com>;
  dct:created "2021-06-02"^^xsd:date;
  dct:modified "2022-02-21"^^xsd:date;
  owl:versionInfo "1.0";
  vann:preferredNamespaceUri "https://transparency.ontotext.com/resource/tr/";
  vann:preferredNamespacePrefix "tr".

#################### classes

tr:Area a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Area";
  rdfs:comment "Area, as referenced in CSV files, described in REST API documentation and out of which resources are served by the REST API".

tr:CodeList a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Code List";
  rdfs:comment "A code list (eg Message type, UnitOfMeasure, Asset type)".

tr:CodeValue a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Code Value";
  rdfs:comment "Value in a code list".

tr:Country a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Country";
  rdfs:comment "Country (member state)".

tr:DataDomain a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Domain";
  rdfs:comment "Major area of transparency data".

tr:DataItem a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Item";
  rdfs:comment "Data item (time series) of transparency data in a particular domain".

tr:DataObservation a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Data Observation";
  rdfs:comment "Data Observation, having dataItem, date, dateUpdated and observation-specific fields".

tr:EicTypeValid a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC Type Valid";
  rdfs:comment "EIC types that are valid or invalid with the listed function".

tr:EnergyResource a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Energy Resource";
  rdfs:comment "Energy resource or participant identified with EIC and having a function".

tr:FunctionValid a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Function Valid";
  rdfs:comment "A valid function and a corresponding invalid (misspelt) function".

tr:GenerationUnit a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Generation Unit";
  rdfs:comment "Generation Unit (generator) as described at the lower level of Installed Capacity of Production and Generation Units".

tr:Outage a rdfs:Class;
  rdfs:subClassOf tr:DataObservation;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Outage";
  rdfs:comment "Outage (unavailability) of Production or Generation Unit".

tr:ProductionUnit a rdfs:Class;
  rdfs:subClassOf tr:EnergyResource;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Production Unit";
  rdfs:comment "Production Unit (power plant) as described at the higher level of Installed Capacity of Production and Generation Units".

tr:ValidationCount a rdfs:Class;
  rdfs:isDefinedBy tr: ;
  rdfs:label "Validation Count";
  rdfs:comment "Validation summary result, characterized by rule (shape), area and count".

#################### properties

tr:acerCode a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ACER code";
  rdfs:comment "Agency for Cooperation of Energy Regulators code of an energy participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:actualConsumption a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "actual consumption";
  rdfs:comment "Actual consumption of Production Unit due to technological consumption (MW)"; # or Area?
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:actualOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "actual output";
  rdfs:comment "Actual power output of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:appliesTo a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "applies to";
  rdfs:comment "Whether this validation rule applies to 'Country' or 'Area' (used for sorting them into tables)";
  rdfs:domain sh:Shape;
  rdfs:range xsd:string.


tr:assetType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "asset type";
  rdfs:comment "Asset type of a Power System Resource";
  schema:domainIncludes tr:EnergyResource, tr:DataObservation;
  rdfs:range tr:CodeValue;
  tr:xpath "MktPSRType/psrType".

tr:availableOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "available output";
  rdfs:comment "Available power output of Production or Generation Unit, reduced due to Outage (MW)";
  rdfs:domain tr:Outage;
  rdfs:range xsd:float.

tr:biddingZone a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "bidding zone";
  rdfs:comment "Bidding Zone of this Energy Resource or Outage";
  schema:domainIncludes tr:EnergyResource, tr:Outage;
  rdfs:range tr:Area;
  tr:xpath "biddingZone_Domain.mRID".

tr:codeList a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "code list";
  rdfs:comment "List this code value is part of";
  rdfs:domain tr:CodeValue;
  rdfs:range tr:CodeList.

tr:controlArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "control area";
  rdfs:comment "Control Area(s) of this Energy Resource or Outage";
  schema:domainIncludes tr:EnergyResource, tr:Outage;
  rdfs:range tr:Area;
  tr:xpath "ControlArea_Domain/mRID".

tr:count a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "count";
  rdfs:comment "Count of violations";
  rdfs:domain tr:ValidationCount;
  rdfs:range xsd:integer.

tr:countryCode  a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "country code";
  rdfs:comment "Country code of an energy resource or participant";
  schema:domainIncludes tr:EnergyResource, sh:ValidationResult, tr:ValidationCount;
  rdfs:range xsd:string;
  tr:xpath "eICCode_MarketParticipant.streetAddress/townDetail/country".

tr:currency a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "currency";
  rdfs:comment "Currency code corresponding to the 'price' field";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:string.

tr:dataDomain a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "data domain";
  rdfs:comment "Domain of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range  tr:DataDomain.

tr:dataItem a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "data item";
  rdfs:comment "Data item(s) that this observation (or validation rule) is (are) about";
  schema:domainIncludes tr:DataObservation, sh:Shape;
  rdfs:range tr:DataItem.

tr:date a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date";
  rdfs:domain tr:DataObservation;
  rdfs:comment "Date of an observation";
  rdfs:range xsd:dateTime.

tr:dateEnd a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date end";
  rdfs:domain tr:Outage;
  rdfs:comment "Ending date of an outage";
  rdfs:range xsd:dateTime.

tr:dateImplemented a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date implemented";
  rdfs:comment "Date when an Energy Resource was implemented";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:date;
  tr:xpath "implementation_DateAndOrTime.date".

tr:dateStart a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date start";
  rdfs:domain tr:Outage;
  rdfs:comment "Starting date of an outage";
  rdfs:range xsd:dateTime.

tr:dateUpdated a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "date updated";
  schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource, tr:DataObservation, tr:Outage;
  rdfs:comment "Date when a record was last updated";
  rdfs:range xsd:dateTime;
  tr:xpath "lastRequest_DateAndOrTime.date".

tr:description a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "description";
  rdfs:comment "A description of something";
  schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string.

tr:direction a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "direction";
  rdfs:comment "Direction of energy flow of this balancing volume or price (Up, Down, Up and Down)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:displayArea a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "display area";
  rdfs:comment "Area notation or country code where this validation result or count should be grouped, including the special values 'other' and 'none'";
  schema:domainIncludes sh:ValidationResult, tr:ValidationCount;
  rdfs:range xsd:string.

tr:duration a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "duration";
  rdfs:comment "Duration (time quant) of this data observation";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:duration.

tr:eic a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC";
  rdfs:comment "Energy Identification Code of an energy resource or participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:eicType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type";
  rdfs:comment "Type of Energy resource or participant derived from the third char of its EIC. It's a single-value field and is a 'supertype' of 'function'";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:CodeValue.

tr:eicTypeInvalid a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type invalid";
  rdfs:comment "EIC type that is invalid with the listed function";
  rdfs:domain tr:EicTypeValid;
  rdfs:range tr:CodeValue.

tr:eicTypeValid a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "EIC type valid";
  rdfs:comment "EIC type that is valid with the listed function";
  rdfs:domain tr:EicTypeValid;
  rdfs:range tr:CodeValue.

tr:ekgCheckDataQuality a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "TEKG checks data quality";
  rdfs:comment "Whether the TEKG project checks the quality of data of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:ekgImplementAnalytics a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "TEKG implements analytics";
  rdfs:comment "Whether the TEKG project implements analytics over this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:energyResource a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "energy resource";
  rdfs:comment "Energy resource (Production or Generation Unit) reported in this outage";
  rdfs:domain tr:Outage;
  rdfs:range tr:EnergyResource.

tr:fields a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "fields";
  rdfs:comment "Fields that this validation rule is about (listed as a single string)";
  rdfs:domain sh:Shape;
  rdfs:range xsd:string.

tr:fileName a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "file name";
  rdfs:comment "Root file name of this data item";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:fileType a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "file type";
  rdfs:comment "File type of this data item as consumed by the TEKG project (XML or CSV)";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:function a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function";
  rdfs:comment "Function(s) of an energy resource or participant, eg Generation Unit, Production Unit, Generation, Load, Connection Point, Internal Line, Tieline, Transformer, Substation, Trade Responsible Party, Balance Responsible Party, Production Responsible party, Consumption Responsible Party...";
  schema:domainIncludes tr:EnergyResource, tr:EicTypeValid, tr:FunctionValid;
  rdfs:range xsd:string.

tr:functionInvalid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function invalid";
  rdfs:comment "Function that is invalid (misspelled)";
  rdfs:domain tr:FunctionValid;
  rdfs:range xsd:string.

tr:functionValid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "function valid";
  rdfs:comment "Function that is valid, or allowed for this EIC type";
  schema:domainIncludes tr:CodeValue, tr:FunctionValid;
  rdfs:range xsd:string.

tr:generationUnit a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "generation unit";
  rdfs:comment "Generation Units of this Production Unit (semi-inverse of parentResource)";
  rdfs:domain tr:ProductionUnit;
  rdfs:range tr:GenerationUnit.

tr:hasProdUnits a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "has Production Units";
 rdfs:comment "Whether the area has Production/Generation Units returned from the REST API";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:highVoltageLimit a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "high voltage limit";
  rdfs:comment "High voltage limit of Production Unit";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:float;
  tr:xpath "production_PowerSystemResources.highVoltageLimit".

tr:inAPI a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in API";
 rdfs:comment "Whether the area is returned by the REST API";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inDoc a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in Documentation";
 rdfs:comment "Whether the area is described in the REST API documentation";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inEIC a owl:DatatypeProperty;
 rdfs:isDefinedBy tr: ;
 rdfs:label "in EIC";
 rdfs:comment "Whether the area is described in the EIC file (we've added the missing ones in eic-extra.ttl)";
 rdfs:domain tr:Area;
 rdfs:range xsd:boolean.

tr:inVies a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "in VIES";
  rdfs:comment """Whether a Country or a particular Party's VAT Number is present in the EU VAT Information Exchange System (VIES).
No value is recorded for Party if its country is not covered by VIES""";
  schema:domainIncludes tr:EnergyResource, tr:Country;
  rdfs:range xsd:boolean.

tr:installedOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "installed output";
  rdfs:comment "Installed nominal power output of Production or Generation Unit (MW)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:float;
  tr:xpath "nominalP".

tr:isFreeReuse a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "is for free reuse";
  rdfs:comment "Whether the data item can be reused freely";
  rdfs:domain tr:DataItem;
  rdfs:range xsd:boolean.

tr:isVatValid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "is VAT valid";
  rdfs:comment "Whether the Value Added Tax number is syntactically valid according to per-country patterns";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:boolean.

tr:iso2 a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ISO alpha2";
  rdfs:comment "2-letter alphabetical ISO code of this country, used for linking to external datasets";
  rdfs:domain tr:Country;
  rdfs:range xsd:string.

tr:iso3 a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "ISO alpha3";
  rdfs:comment "3-letter alphabetical ISO code of this country, used for linking to external datasets";
  rdfs:domain tr:Country;
  rdfs:range xsd:string.

tr:link a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link";
  rdfs:comment "Link to page with information or direct download page (outside of portal)";
  rdfs:domain tr:DataItem.

tr:linkDescription a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link to description";
  rdfs:comment "Link to detailed Knowledge Base description on portal"; 
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:linkPortal a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "link to portal";
  rdfs:comment "Link to data serving page on portal";
  rdfs:domain tr:DataItem.

tr:location  a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "location";
  rdfs:comment "Location of an energy resource (Production Unit)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.location.name", "generatingUnit_Location.name".

tr:marketBalanceArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "market balance area";
  rdfs:comment "Market Balance Area of this balancing volume or price";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:Area.

tr:marketProduct a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "market product";
  rdfs:comment "Type of market product of this balancing volume or price (Standard, Specific, Local)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:mrid a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "message id";
  rdfs:comment "Unique message id (mRID), used in the URL";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:name a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "name";
  rdfs:comment "The name of something";
  schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.location.name". # TODO and more

tr:nameAlt a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "name alt";
  rdfs:comment "Alternative name of a code value, as present in CSV files";
  rdfs:domain tr:CodeValue;
  rdfs:range xsd:string.

tr:netOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "net output";
  rdfs:comment "Net power output (actualOutput minus actualConsumption) of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:forecastedOutput a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "forecasted output";
  rdfs:comment "Forecasted output of a Production Unit or Area (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:notation a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "notation";
  rdfs:comment """Code of something, eg A01 (a code value), EFET (European Federation of Energy Traders), CB-RO-OP (Control Block Romania Operator).
Single value, coming from EIC or code list master data""";
  schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "long_Names.name".

tr:notationAlt a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "notation alt";
  rdfs:comment """Alternative code for an Energy Resource.
Potentially multiple values, coming from messages (Configuration_MarketDocument)""";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string;
  tr:xpath "registeredResource.name".

tr:parentResource a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "parent resource";
  rdfs:comment """Parent of this Energy Resource, eg:
- Control Block   parentResource Coordination Center Zone
- Generation Unit parentResource Production Unit
""";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource.

tr:price a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price";
  rdfs:comment "Price reported in this data observation in 'currency' per MW/h (see also 'priceInEur')";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:priceCategory a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price category";
  rdfs:comment "Price category of this balancing price (Average or Marginal)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:priceInEur a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "price in EUR";
  rdfs:comment "Price reported in this data observation in EUR per MW/h (see also 'price')";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:providerParticipant a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "provider participant";
  rdfs:comment "Provider participant(s) of this Energy Resource";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource;
  tr:xpath "Provider_MarketParticipant.mRID".

tr:reason a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reason";
  rdfs:comment "Motivation of an act (in whole Message or individual TimeSeries) in coded form";
  schema:domainIncludes tr:Message, tr:TimeSeries;
  rdfs:range tr:CodeValue.

tr:reasonText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reason text";
  rdfs:comment "Motivation of an act as free text, when `reason` is A95 Complementary information";
  schema:domainIncludes tr:Message, tr:TimeSeries;
  rdfs:range xsd:string.

tr:regArticle a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "regulation article";
  rdfs:comment "Article in Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets that describes the data item";
  rdfs:seeAlso <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0543>;
  rdfs:domain tr:DataItem;
  rdfs:range xsd:string.

tr:reserveType a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "reserve type";
  rdfs:comment "Type of reserve resource of this balancing volume or price (FCR, aFRR, mFRR, RR)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:responsibleParticipant a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "responsible participant";
  rdfs:comment "Participant that is responsible for this Energy Resource";
  rdfs:domain tr:EnergyResource;
  rdfs:range tr:EnergyResource;
  tr:xpath "eICResponsible_MarketParticipant.mRID".

tr:schedulingArea a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "scheduling area";
  rdfs:comment "Scheduling Area of this balancing volume or price";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:Area.

tr:statusText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "status text";
  rdfs:comment "Latest status of an Outage: 'Active, Withdrawn, Canceled'";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:timeZone a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "time zone";
  rdfs:comment "Time zone code of an Outage";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:typeText a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "type text";
  rdfs:comment "Type of an Outage: 'Planned, Forced'";
  rdfs:domain tr:Outage;
  rdfs:range xsd:string.

tr:vatNumber a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VAT number";
  rdfs:comment "Value Added Tax number of an energy participant";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:version a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "version";
  rdfs:comment "Version of the message. Only the latest version(s) of an mRID are retained. Used in the URL";
  rdfs:domain tr:Outage;
  rdfs:range xsd:integer.

tr:viesAddress a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES address";
  rdfs:comment "Party address as returned by EU VIES (only if present in VIES)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:viesCheckDate a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES check date";
  rdfs:comment "Datetime when EU VIES check was performed";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:dateTime.

tr:viesName a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "VIES name";
  rdfs:comment "Party name as returned by EU VIES (only if present in VIES)";
  rdfs:domain tr:EnergyResource;
  rdfs:range xsd:string.

tr:volume a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "volume";
  rdfs:comment "Volume offered, accepted, activated or unavailable (MW)";
  rdfs:domain tr:DataObservation;
  rdfs:range xsd:float.

tr:volumeCategory a owl:ObjectProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "volume category";
  rdfs:comment "Volume category of this balancing volume (offered, accepted, activated or unavailable)";
  rdfs:domain tr:DataObservation;
  rdfs:range tr:CodeValue.

tr:xpath a owl:DatatypeProperty;
  rdfs:isDefinedBy tr: ;
  rdfs:label "xpath";
  rdfs:comment "XPath of the XML element that carries the data for an RDF property. TODO: also need namespace and enclosing elements?";
  schema:domainIncludes owl:ObjectProperty, owl:DatatypeProperty; # rdfs:Class ?
  rdfs:range xsd:string.