Developed by: |     Ontotext (Sirma AI) |
Based on data from: |     ENTSO-E Transparency Platform |
Powered by: |
This project has received funding from the European Union’s Horizon 2020 research and innovation programme
under grant agreement No 824330: INTERRFACE Open Call (cascade funding)
Version | Date | Changes Made |
---|---|---|
M4 | 2022-06-10 | Final Version |
M4 | 2022-04-08 | V1 of the TEKG. Refinement of validation rules |
M3.1 | 2022-03-23 | Started tracking revision history. Review comment addressed in installedCapacity-Aggregated-vs-Per-Unit |
M3 | 2022-03-08 | M3 Deliverable corresponding to V1 of the TEKG |
The ENTSO-E
Transparency Platform provides information that is
crucial for the efficient and fair operation of the EU
energy market. It includes a large number of data items
(time series) that are strictly defined in
EUreg Transparency
and further elaborated in
MoP DDD
(see Project Glossary on where to
find these references).
Knowledge Graphs (KG) have numerous benefits for data integration across enterprises and disciplines. The Energy Identification Code (EIC) is a global identifier of energy resources (objects) and parties (domains/areas, market participants, exchanges, etc).
With this project we hope to make a step in the direction of Energy KGs by creating a Transparency Energy KG (TEKG) from ENTSOE Transparency data. We use GraphDB, the Ontotext Platform, and semantic data integration. We demonstrate the benefits of KG for:
This living document specifies the TEKG:
The demonstrator is available at https://transparency.ontotext.com/
We have created and will maintain a comprehensive project glossary. Every special term and abbreviation that we encounter is added to the glossary.
It also includes a list of Sources:
- MoP and its parts, including DDD: Detailed Data Descriptions
- doc Free Reuse: Data Available for Free Re-Use
- doc Functions: List of allowed functions for the EIC codes

The constituency of ENTSOE is broken up into a number of Domain/Area "meshes" according to different principles. See glossary#areas for a description of all kinds of Areas.
The following kinds of Areas are most important for Transparency because they are used in Data Items:
- Bidding Zone, BZN: largest geographical area in which there is a uniform spot price, in which Market Participants can exchange energy without Capacity Allocation.
- Control Area, CA=CTA: coherent part of the interconnected system, operated by a single system operator and shall include connected physical loads and/or generation units
- Member State (Country), CTY: EU member state or a neighboring state
- Market Balance Area, MBA: geographic area in which there is a uniform balancing energy price. Consists of one or more Metering Grid Areas with common market rules for which the settlement responsible party carries out a balance settlement and which has the same price for imbalance. May also be defined due to bottlenecks.
- Scheduling Area, SCA: same as Bidding Zone, except if there is more than one Responsibility Area within this Bidding Zone. In the latter case, the Scheduling Area equals Responsibility Area or a group of Responsibility Areas.

Resources (Eg Production and Generation Units) of these Areas can be requested from the Transparency portal and are used as key request parameters in the REST API. For example:
- CTY, CTA, BZN
- MBA (eg for Cross-Border Balancing), SCA (eg for Procured Capacity)

The following query finds 198 relevant Areas of the above kinds in the EIC file, and returns them with all functions:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
values ?fun {"Member State" "Control Area" "Bidding Zone" "Market Balance Area" "Scheduling Area"}
?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)
We get from EIC the 3 critical kinds
CTY, CTA, BZN
that are of interest to us (111
such Areas):
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?name ?co ?eic (group_concat(?fun; separator=", ") as ?funcs) {
values ?fun {"Member State" "Control Area" "Bidding Zone"}
?x tr:eic ?eic; tr:function ?fun; tr:notation ?name
optional {?x tr:countryCode ?co}
} group by ?eic ?name ?co order by coalesce(?co,?name)
Unfortunately there are discrepancies, see data/areas.tsv that has the following columns (with count shown):
- name: area name (121)
- co: country code (49, 29 unique)
- eic: EIC code (121)
- funcs: which of the 3 functions BZN, CTA, CTY are listed for the area (121)
- inEIC: whether it's present in the EIC file (111)
- inDoc: whether it's present in the documentation REST API Guide#Areas (89)
- inAPI: whether it's accepted by the REST API request master_data i.e. Installed Capacity Per Production Unit (87)
- inVIES: whether VAT numbers of that country can be validated in VIES, see External VAT Validation

We have the following combinations:
inEIC | inDoc | inAPI | count |
---|---|---|---|
0 | 1 | 1 | 10 |
1 | 0 | 0 | 31 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 3 |
1 | 1 | 1 | 76 |
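The combination counts can be cross-checked against the per-column counts quoted for areas.tsv (121 rows, 111 inEIC, 89 inDoc, 87 inAPI). The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only and are not part of the project's scripts.

```python
# Rows of the combination table: (inEIC, inDoc, inAPI, count)
combos = [
    (0, 1, 1, 10),
    (1, 0, 0, 31),
    (1, 0, 1, 1),
    (1, 1, 0, 3),
    (1, 1, 1, 76),
]


def column_total(index):
    """Sum the counts of all combinations where the flag at `index` is 1."""
    return sum(count for *flags, count in combos if flags[index] == 1)


total = sum(count for *_, count in combos)
in_eic, in_doc, in_api = (column_total(i) for i in range(3))
print(total, in_eic, in_doc, in_api)  # prints: 121 111 89 87
```

The totals agree with the column counts listed for areas.tsv, so the combination table covers every row of the file.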
not | eic | funcs | comment |
---|---|---|---|
DE | 10Y1001A1001A83F | Member State | Instead, BZN (CZ-DE-SK, DE-AT-LU, DE-LU) or CTA (50hertz, Amprion, Tennet GER, TransnetBW) are used |
DK | 10Y1001A1001A65H | Member State | Instead, BZN (DK-1, DK-2) are used |
UK | 10Y1001A1001A92E | Member State | Instead, BZN (GB National Grid, IE(SEM)) or CTA (National Grid, NIE) are used |
not | eic | funcs | comment |
---|---|---|---|
GB-NI | 10Y1001A1001A016 | Control Area | NIE? |
areas.tsv
and to a manually
crafted turtle/eic-extra.ttl

notation | co | eic | funcs |
---|---|---|---|
IT-BRINDISI | IT | 10Y1001A1001A699 | Bidding Zone |
IT-FOGGIA | IT | 10Y1001A1001A72K | Bidding Zone |
IT-PRIOLO | IT | 10Y1001A1001A76C | Bidding Zone |
IT-ROSSANO | IT | 10Y1001A1001A77A | Bidding Zone |
BY | BY | 10Y1001A1001A51S | Control Area, Bidding Zone, Market Balance Area |
MD | MD | 10Y1001A1001A990 | Control Area, Bidding Zone, Market Balance Area |
RU | RU | 10Y1001A1001A49F | Control Area, Bidding Zone, Market Balance Area |
KALININGRAD | RU | 10Y1001A1001A50U | Control Area, Bidding Zone, Market Balance Area |
PL-CZ | | 10YDOM-1001A082L | Control Area, Bidding Zone |
CZ+DE+SK | | 10YDOM-CZ-DE-SKK | Bidding Zone |
We find some interesting discrepancies of "Member State" areas:
tr:countryCode
): BE, CZ, DE, ES, FR, ICELAND,
IT, LU, NL, NO, SE, SK, UA, UK

tr:name
are country code except
"ICELAND" which is a full name

In order to join external power plant datasets, we need a list of ENTSOE countries with ISO2 and ISO3 codes.
The following query finds 36 countries that are members of ENTSOE. We use a Federated query to Wikidata:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?eic ?iso2 ?iso3 ?name ?wd_name where {
?x tr:function "Member State"; tr:eic ?eic; tr:notation ?n; tr:name ?name.
bind(if(?n="ICELAND","IS",?n) as ?iso2)
service <https://query.wikidata.org/sparql> {
?y wdt:P297 ?iso2; wdt:P298 ?iso3; rdfs:label ?wd_name
filter(lang(?wd_name)="en")
}
} order by ?iso2
iso2
code

iso2
eliminates the
3 extraneous LV "Member States"

The ENTSOE Transparency portal includes about 80-135 data items (depending on how you count). The items cover 7 domains:
Data items are described in various documents:
- EUreg Transparency: Commission Regulation (EU) No 543/2013
- ECreg Transparency item definitions and clause references, as well as more detailed item descriptions, sometimes with illustrations
- ECreg Transparency clause references
- PowerSystemResourceName, but the respective CSV file also has GenerationUnitEIC
- doc Free Reuse: Data Available for Free Re-Use (2019-11).
- MoP DDD, most often the TSO)

We have reconciled the various descriptions of data items and integrated them in this Google Sheet.
From it we generate a semantic description in file data/turtle/small/kb.ttl
using the query in etl_scripts/dataItems.ru
which includes the following properties (examples given for
item
<data/load/ActualTotalLoad_6.1.A>
):
- tr:name: item name, eg "Actual Total Load"
- tr:file: base file name of XML (REST API) or CSV (SFTP), eg "ACTUAL_TOTAL_LOAD" or "ActualTotalLoad"
- tr:dataDomain: parent data domain, eg <data/load>
- tr:linkDescription: link to detailed description (see "knowledge base" above), eg Total Load - Day Ahead - Actual
- tr:linkPortal: link to ENTSOE portal where the item can be viewed/downloaded, eg totalLoadR2/show
- tr:linkDownload: download link, applies only to "static" files
- tr:link: applies only to "external" sources
- tr:regArticle: article of ECreg Transparency describing the item, eg:
  - 6.1.A for Actual Total Load
  - 12.3.A.d for Explicit Allocations - Auction Revenue (daily)
  - 12.3.A.i for Explicit Allocations - Auction Revenue (intraday)
  - 16.1.B and 16.1.C for Aggregated Generation per Type
- tr:isFreeReuse: whether the item is available for free reuse
- tr:ekgCheckDataQuality: whether TEKG will implement Data Validations over the item
- tr:ekgImplementAnalytics: whether TEKG will implement Analytics over the item

This is the full list of data items that will be integrated. It includes items to be validated (ekgCheckDataQuality) and items to implement analytics for (ekgImplementAnalytics):
The following subsections provide detailed description and analysis of each item:
- ActivatedBalancingEnergy on 2022-03-01, a total of 12 CSV files need to be processed, prefixed from 2022_03 to 2021_02
- AggregatedGenerationPerType on 2022-03-01, 2 csv files need to be processed, prefixed 2022_02 and 2022_01
- DayAheadGenerationForecastForWindAndSolar is also ingested 1 month in the past

Exception: DayAheadGenerationForecastForWindAndSolar for CTA 10YAL-KESH-----5 has over 1 year of null forecasts with 0.00 values. For this reason we will limit future data for this data item to 1 month.
Temporal aggregation is required for producing analytics where the diagrams require a coarser level of aggregation than the raw data. This section specifies the temporal aspects of the time-series data.
Temporal aggregation is provided by creating synthetic
data items where the amounts are aggregated at the desired
temporal resolution. Eg the Balancing Energy
Timeline requires hourly or daily aggregates of the
Prices Of Activated Balancing Energy
and
Activated Balancing Energy
data items.
Depending on the source, these data items are reported at different temporal resolutions from 15 min to 1h (PT15M, PT30M and PT60M). These values are harmonised at:

- PricesOfActivatedBalancingEnergy_HOURLY
- PricesOfActivatedBalancingEnergy_DAILY

Similarly, Activated Balancing Energy is aggregated in ActivatedBalancingEnergy_HOURLY and ActivatedBalancingEnergy_DAILY.
Note: A similar procedure is used for spatial aggregation of individual capacities in a given area, see InstalledGenerationCapacityComputed
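As an illustration of the temporal harmonisation described in this section, the sketch below groups PT15M observations into hourly buckets and applies either a sum (as for energy volumes) or an average (as for prices). It is a hedged example only: the function name, the input shape and the sample values are assumptions, and the unweighted average is a simplification that ignores mixed resolutions within one hour.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(observations, how):
    """Group (ISO timestamp, value) pairs by hour; `how` is 'sum' or 'avg'."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour.isoformat()].append(value)
    if how == "sum":
        return {hour: sum(values) for hour, values in buckets.items()}
    if how == "avg":
        return {hour: sum(values) / len(values) for hour, values in buckets.items()}
    raise ValueError(f"unknown aggregation: {how}")


# Hypothetical PT15M sample values (not taken from the real data)
samples = [
    ("2022-01-01T09:00:00", 10.0),
    ("2022-01-01T09:15:00", 20.0),
    ("2022-01-01T09:30:00", 30.0),
    ("2022-01-01T09:45:00", 40.0),
]
hourly_energy = aggregate_hourly(samples, "sum")  # volumes are summed
hourly_price = aggregate_hourly(samples, "avg")   # prices are averaged
print(hourly_energy, hourly_price)
```

A daily aggregate follows the same pattern with the hour also reset to zero.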
Summary operations differ according to the values being aggregated:
- PricesOfActivatedBalancingEnergy: the amounts are averaged over the time period
- ActivatedBalancingEnergy: the amounts are summed over the time period

We visualize semantic models (RDF mappings) using the rdfpuml tool from https://github.com/VladimirAlexiev/rdf2rml.
These are graph models that show:
- (C)=Codelist, (E)=EIC file, (P)=Production and Generation Units)
- "..." right after the class name
We obtained XML schemas from CIM_xsd_package.zip (and a few others) and saved to folder xsd

rng
) and Relax NG Compact (rnc
)
because the latter format is much easier to understand than XSD.

The codelists describe the basic lookups used on the Transparency platform.
We obtained CodelistV80.zip and saved data/code-lists/urn-entsoe-eu-wgedi-codelists.xsd. The codelists are embedded in this XSD. We use only "Standard" TypeLists, eg:
<xsd:simpleType name="StandardAssetTypeList">
  <xsd:annotation>
    <xsd:documentation>
      <Uid>ET0031</Uid>
      <Definition>The identification of the type of asset.</Definition>
    </xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:NMTOKEN">
    <xsd:enumeration value="A01">
      <xsd:annotation>
        <xsd:documentation>
          <CodeDescription>
            <Title>Tieline</Title>
            <Definition>A high voltage line used for cross border energy interconnections.</Definition>
          </CodeDescription>
        </xsd:documentation>
      </xsd:annotation>
    </xsd:enumeration>
    <!-- further enumeration values omitted -->
  </xsd:restriction>
</xsd:simpleType>
We convert XML codelists to this simple RDF representation (alternatively, we could use SKOS):
@base <https://transparency.ontotext.com/resource/> .
@prefix tr: <https://transparency.ontotext.com/resource/tr/> .
<type/Asset> a tr:CodeList;
tr:name "Asset";
tr:notation "ET0031";
tr:description "The identification of the type of asset.".
<type/Asset/A01> a tr:CodeValue;
tr:codeList <type/Asset>;
tr:name "Tieline";
tr:notation "A01";
tr:description "A high voltage line used for cross border energy interconnections." .
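The conversion from the XSD codelist to this RDF shape could look roughly like the sketch below. It is not the project's actual ETL: the function names, the embedded sample and the rule for deriving the short name ("StandardAssetTypeList" becomes "Asset") are assumptions made for illustration, and the sample assumes that Uid, Title and Definition carry no XML namespace.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Small stand-in for urn-entsoe-eu-wgedi-codelists.xsd (illustrative only)
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:simpleType name="StandardAssetTypeList">
  <xsd:annotation><xsd:documentation>
   <Uid>ET0031</Uid>
   <Definition>The identification of the type of asset.</Definition>
  </xsd:documentation></xsd:annotation>
  <xsd:restriction base="xsd:NMTOKEN">
   <xsd:enumeration value="A01">
    <xsd:annotation><xsd:documentation><CodeDescription>
     <Title>Tieline</Title>
     <Definition>A high voltage line used for cross border energy interconnections.</Definition>
    </CodeDescription></xsd:documentation></xsd:annotation>
   </xsd:enumeration>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:schema>"""


def lit(text):
    """Render a string as a quoted Turtle literal with basic escaping."""
    return '"' + text.strip().replace("\\", "\\\\").replace('"', '\\"') + '"'


def codelist_to_turtle(xsd_text):
    """Return one Turtle statement per code list and per code value."""
    statements = []
    for simple_type in ET.fromstring(xsd_text).iter(XSD + "simpleType"):
        name = simple_type.get("name", "")
        if not (name.startswith("Standard") and name.endswith("TypeList")):
            continue  # only "Standard" TypeLists are used
        short = name[len("Standard"):-len("TypeList")]
        doc = simple_type.find(f"{XSD}annotation/{XSD}documentation")
        statements.append(
            f"<type/{short}> a tr:CodeList; tr:name {lit(short)}; "
            f"tr:notation {lit(doc.findtext('Uid'))}; "
            f"tr:description {lit(doc.findtext('Definition'))} .")
        for enum in simple_type.iter(XSD + "enumeration"):
            code = enum.get("value")
            desc = enum.find(f"{XSD}annotation/{XSD}documentation/CodeDescription")
            statements.append(
                f"<type/{short}/{code}> a tr:CodeValue; tr:codeList <type/{short}>; "
                f"tr:name {lit(desc.findtext('Title'))}; tr:notation {lit(code)}; "
                f"tr:description {lit(desc.findtext('Definition'))} .")
    return statements


turtle = codelist_to_turtle(SAMPLE)
print("\n".join(turtle))
```

The statements rely on the same @base and tr: prefix as the Turtle example above.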
A general model looks like data/model/codelist.ttl:
In order to match string values in CSV files to the
codelists, we add nameAlt
to some code values.
For example, the code value for "FCR" (a type of balancing
reserve) looks like data/turtle/small/codelists-extra.ttl:
To facilitate faceted search/display, we have added a
hierarchy to <type/Asset>
using the
tr:fuelTypeClassification
predicate. The
different varieties of Hydro powered assets under a generic
Hydro asset type are materialized in data/model/codelist-eg.ttl.
We also add some matching info in order to match fuel type
from other databases to the ENTSOE codelist.
The EIC file provides basic information about Energy Resources.
(*)
below. We
populate a field eicType, see Add eicType

The ENTSOE EIC file is available from several sources:

urn:iec62325.351:tc57wg16:451-n:eicdocument:1:0
),
2021-12-31, has grown by 3.3% in 7 months

curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/A_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/T_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/V_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/W_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/X_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Y_eiccodes.csv
curl -sO https://eepublicdownloads.entsoe.eu/eic-codes-csv/Z_eiccodes.csv
Counting the number of records:
grep -c "<EICCode_MarketDocument>" allocated-eic-codes.xml
perl -lne 'print $1 if m{<mRID>..(.).............</mRID>}' allocated-eic-codes.xml|sort|uniq -c
After transforming XML to RDF and loading to GraphDB we Add eicType
CSV total and breakdown per type (need to subtract 1 from each result to account for the header line)
wc -l *.csv
(*)
Counts for XML and CSV:
char | type | XML | CSV |
---|---|---|---|
"A" | "Substation" | 2447 | 2457 |
"T" | "Tieline/Transformer" | 9985 | 10104 |
"V" | "Location" | 516 | 522 |
"W" | "Resource Object" | 20116 | 20195 |
"X" | "Party" | 10115 | 10138 |
"Y" | "Area or Domain" | 1140 | 1143 |
"Z" | "Measurement point" | 1841 | 1842 |
TOTAL | | 46160 | 46401 |
So the CSV has 241 records more than the XML.
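The eicType value is derived from the third character of the EIC code, using the char/type pairs in the table above. A minimal sketch of that derivation follows; the function name and the choice of sample codes (taken from elsewhere in this document) are illustrative, and this is not the actual update query used in GraphDB.

```python
from collections import Counter

# Third character of the EIC code mapped to its type (from the counts table)
EIC_TYPES = {
    "A": "Substation",
    "T": "Tieline/Transformer",
    "V": "Location",
    "W": "Resource Object",
    "X": "Party",
    "Y": "Area or Domain",
    "Z": "Measurement point",
}


def eic_type(eic):
    """Return the type name for the 3rd character of an EIC code."""
    if len(eic) < 3 or eic[2] not in EIC_TYPES:
        raise ValueError(f"cannot derive eicType from {eic!r}")
    return EIC_TYPES[eic[2]]


codes = ["10YIE-1001A00010", "29WGU-YISPAOOU-5", "10YGR-HTSO-----Y"]
counts = Counter(eic_type(code) for code in codes)
print(counts)
```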
The CSV has field EicStatus
and we guessed
that maybe the extra resources have status Passive. While
trying to get statistics for this field, we found that the
CSV is malformed: it is semicolon-separated but includes
fields with embedded semicolon and no quoting. For
example:
GASINDUR; S.L.
Enson tutkimustehdas; Imatra
csvtk summary -d ';' -f EicCode:count -g EicStatus X_eiccodes.csv
[ERRO] record on line 2731: wrong number of fields
head -2731 X_eiccodes.csv |tail -1
18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;Trade Responsible Party;X
head -1051 Y_eiccodes.csv |tail -1
44Y-00000000246A;FI_EGTU00;Enson tutkimustehdas; Imatra;;44X-00000000100F;Active;;FI;;Metering Grid Area;Y
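One possible workaround for such rows is sketched below. It rests on two loudly stated assumptions: the column order is guessed from the field comparison table in this document (it is not confirmed against the file header), and any surplus semicolon is assumed to belong to EicLongName, which holds for the two examples shown but may not hold for every record.

```python
# Assumed column order of the EIC CSV files (11 columns)
HEADER = [
    "EicCode", "EicDisplayName", "EicLongName", "EicParent",
    "EicResponsibleParty", "EicStatus", "MarketParticipantPostalCode",
    "MarketParticipantIsoCountryCode", "MarketParticipantVatCode",
    "EicTypeFunctionList", "type",
]
NAME_INDEX = HEADER.index("EicLongName")


def parse_row(line):
    """Split a semicolon-separated row, folding surplus fields into EicLongName."""
    parts = line.rstrip("\n").split(";")
    extra = len(parts) - len(HEADER)
    if extra < 0:
        raise ValueError("too few fields")
    # Assumption: the unquoted semicolons occur inside the long name
    name = ";".join(parts[NAME_INDEX:NAME_INDEX + extra + 1])
    fixed = parts[:NAME_INDEX] + [name] + parts[NAME_INDEX + extra + 1:]
    return dict(zip(HEADER, fixed))


row = parse_row("18X0000000000KCL;INDUR;GASINDUR; S.L.;;;Active;47012;ES;ESB34041400;Trade Responsible Party;X")
print(row["EicLongName"], "|", row["EicStatus"])
```

With the rows repaired this way, a per-status count no longer fails on the field count.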
We guessed the opposite status is Passive
but found no resources with this word:
grep -c Passive *.csv
Judging from the count, the CSV is a superset of the XML. But we double-checked the particular EIC ids for the critical type "Area or Domain", and indeed CSV has 3 extra records (namely Cut Areas/Corridors):
cut -f 1 -d \; Y_eiccodes.csv | tail -n +2 | sort > eic-areas-csv.txt
perl -lne 'print $1 if m{<mRID>(..Y.............)</mRID>}' allocated-eic-codes.xml | sort > eic-areas-xml.txt
comm -3 eic-areas-csv.txt eic-areas-xml.txt
46Y000000000007M
46Y000000000008K
46Y000000000009I
grep -E "46Y000000000007M|46Y000000000008K|46Y000000000009I" Y_eiccodes.csv
46Y000000000007M;CUT_AREA_SE3A;Cut area SE3A;;;Active;;;;Bidding Zone;Y
46Y000000000009I;CUT_COR_SE3A-SE3;Cut corridor SE3A-SE3;;;Active;;;;Bidding Zone;Y
46Y000000000008K;CUT_AREA_SE3;Cut area SE3;;;Active;;;;Bidding Zone;Y
EIC XML has the following structure shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity:
EIC_MarketDocument =
element mRID {ID_String},
element revisionNumber {ESMPVersion_String},
element type {MessageKind_String},
element sender_MarketParticipant.mRID {PartyID_String}?,
element sender_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
element receiver_MarketParticipant.mRID {PartyID_String}?,
element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String}?,
element createdDateTime {ESMP_DateTime},
element EICCode_MarketDocument {EICCode_MarketDocument}*
EICCode_MarketDocument =
element mRID {EICCode_String}?,
element status {Action_Status}?,
element docStatus {Action_Status}?,
element attributeInstanceComponent.attribute {xsd:string}?,
element long_Names.name {Characters70_String},
element display_Names.name {Characters16_String},
element lastRequest_DateAndOrTime.date {xsd:date},
element deactivationRequested_DateAndOrTime.date {xsd:date}?,
element eICContact_MarketParticipant.name {Characters70_String}?,
element eICContact_MarketParticipant.phone1 {TelephoneNumber}?,
element eICContact_MarketParticipant.electronicAddress {ElectronicAddress}?,
element eICCode_MarketParticipant.streetAddress {StreetAddress}?,
element eICCode_MarketParticipant.aCERCode_Names.name {ACERCode_String}?,
element eICCode_MarketParticipant.vATCode_Names.name {VATCode_String}?,
element eICParent_MarketDocument.mRID {EICCode_String}?,
element eICResponsible_MarketParticipant.mRID {EICCode_String}?,
element description {Characters700_String}?,
element Function_Names {Function_Name}*
StreetAddress =
element streetDetail {StreetDetail}?,
element postalCode {Characters10_String}?,
element townDetail {TownDetail}?
StreetDetail =
element addressGeneral {Characters70_String}?,
element addressGeneral2 {Characters70_String}?,
element addressGeneral3 {Characters70_String}?
TownDetail =
element name {Characters35_String}?,
element country {Characters2_String}?
We examined actual XML instances and show below the fields that are filled and useful (not constant).
A field comparison between CSV, XML and the resulting RDF properties (which we hope are shorter and easier to understand):
CSV | XML | RDF | Note |
---|---|---|---|
EicCode | mRID | tr:eic | Also used in URL |
EicDisplayName | display_Names.name | tr:notation | |
EicLongName | long_Names.name | tr:name | |
| | description | tr:description | Often repeats the Functions |
EicParent | ns:eICParent_MarketDocument.mRID | tr:parentResource | As EIC URL |
EicResponsibleParty | eICResponsible_MarketParticipant.mRID | tr:responsibleParticipant | As EIC URL |
EicStatus | docStatus/value | | Always A05, so omitted |
MarketParticipantPostalCode | | | Not in XML |
MarketParticipantIsoCountryCode | eICCode_MarketParticipant.streetAddress/townDetail/country | tr:countryCode | |
MarketParticipantVatCode | vATCode_Names.name | tr:vatNumber | |
| | aCERCode_Names.name | tr:acerCode | |
EicTypeFunctionList | Function_Names/name | tr:function | |
type | | tr:eicType | Generated from EIC 3rd char |
| | lastRequest_DateAndOrTime.date | tr:dateUpdated | |
So each file (XML vs CSV) has some extra fields compared to the other:
- dateUpdated (XML only), which can be quite important in data update scenarios
- acerCode (XML only), which can be important for external data integration with ACER
- description (XML only), which most often repeats the Functions, with some informative exceptions
- PostalCode (CSV only), but we suspect that many are nonsensical data

For now, we use EIC XML, but later we might decide to replace or complement with EIC CSV. Unfortunately, both of these files are missing some Areas that are returned by the REST API.
The EIC file is mapped to RDF as follows (XML field names are shown in brackets).
All fields are extracted from XML, except
eicType
(see Add
eicType)
We use the Production and Generation Units REST API that returns XML data items having the following structure (shown as RelaxNG Compact (RNC), where simple fields are omitted for brevity). It consists of:
- Configuration_MarketDocument header
- TimeSeries describing Production Units
- MktPSRType describing characteristics of the Production Unit
- MktGeneratingUnit describing Generation Units

Configuration_MarketDocument =
element mRID {ID_String},
element type {MessageKind_String},
element process.processType {ProcessKind_String},
element sender_MarketParticipant.mRID {PartyID_String},
element sender_MarketParticipant.marketRole.type {MarketRoleKind_String},
element receiver_MarketParticipant.mRID {PartyID_String},
element receiver_MarketParticipant.marketRole.type {MarketRoleKind_String},
element createdDateTime {ESMP_DateTime},
element TimeSeries {TimeSeries}*
TimeSeries =
element mRID {ID_String},
element businessType {BusinessKind_String},
element implementation_DateAndOrTime.date {xsd:date},
element biddingZone_Domain.mRID {AreaID_String}?,
element registeredResource.mRID {ResourceID_String},
element registeredResource.name {xsd:string},
element registeredResource.location.name {xsd:string},
element ControlArea_Domain {ControlArea_Domain}+,
element Provider_MarketParticipant {Provider_MarketParticipant}+,
element MktPSRType {MktPSRType}
MktPSRType =
element psrType {PsrType_String},
element production_PowerSystemResources.highVoltageLimit {ESMP_Voltage}?,
element nominalIP_PowerSystemResources.nominalP {ESMP_ActivePower}?,
element GeneratingUnit_PowerSystemResources {MktGeneratingUnit}*
MktGeneratingUnit =
element mRID {ResourceID_String},
element name {xsd:string},
element nominalP {ESMP_ActivePower},
element generatingUnit_PSRType.psrType {PsrType_String},
element generatingUnit_Location.name {xsd:string}
ESMP_ActivePower-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_ActivePower = ESMP_ActivePower-base, attribute unit {UnitSymbol}
ESMP_Voltage-base = xsd:float {pattern = "([0-9]+((\.[0-9])*))"}
ESMP_Voltage = ESMP_Voltage-base, attribute unit {UnitSymbol}
We map the Production and Generation Unit data item to RDF as follows:
Notes:
- tr:ProductionUnit and tr:GenerationUnit to the higher and lower level resources, since we need them for Data Corrections later
- MAW for output (nominalP=installedOutput, actualOutput, availableOutput) and KVA for highVoltageLimit
The following diagram shows how the semantic data from the previous 3 sections comes together (EIC file, Codelist, Production and Generation Units).
It uses the example of Bulgaria's NPP Kozloduy power
plant and related entities (two generators; Bulgaria, the BG
TSO "ESO", the "NPP Kozloduy"
responsibleParticipant
, etc). We use color
coding to show which part of the data comes from which data
item.
The diagram is adapted from our proposal. In particular,
we added eicType
(see Add eicType).
There's no schema for the CSV files, but field names are pretty clear, and we can match them to MADES UML models.
We also do some field value investigations using the
csvtk
tool (see csvtk#177
for proposed enhancements); equivalent results can be
obtained easily with Python Pandas. For example:
# distribution of ResolutionCode
csvtk -t freq -f ResolutionCode -k 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
ResolutionCode frequency
PT15M 15144
PT30M 9456
PT60M 87606
# analyze correlation of ActualGenerationOutput and ActualConsumption
cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn
# see below
Investigations are based on 2022_01 files, some obtained on 2022-01-05 and others on 2022-01-19 (therefore incomplete month data).
WARNINGS:
- Despite the extension .csv, the files are tab-separated (TSV)
- The octal dump (od) command shows that the first 3 bytes of a CSV file are the BOM, followed by the first column name and a tab.

od -c -N 100 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv
0000000 357 273 277 D a t e T i m e \t
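Both warnings matter when reading the files programmatically. The sketch below, using only the Python standard library, shows one way to handle them: the "utf-8-sig" codec strips the 3-byte BOM (octal 357 273 277) and the tab is passed as delimiter. The in-memory sample and the function name are illustrative assumptions; a real file would be opened with the same encoding and delimiter.

```python
import csv
import io


def read_tsv(stream):
    """Read a tab-separated stream into a list of dicts keyed by column name."""
    return list(csv.DictReader(stream, delimiter="\t"))


# Illustrative bytes: BOM, header row, one data row
raw = b"\xef\xbb\xbfDateTime\tResolutionCode\n2022-01-01 00:00:00.000\tPT60M\n"
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
rows = read_tsv(text)
print(rows[0]["DateTime"], rows[0]["ResolutionCode"])
```

With plain "utf-8" the first column name would instead start with the invisible BOM character, so lookups by "DateTime" would fail.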
857 samples.
Field | Example | RDF | Comment |
---|---|---|---|
| | | tr:dataItem | <data/generation/InstalledGenerationCapacityAggregated> |
DateTime | 2022-01-01 00:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format (" " -> "T") |
ResolutionCode | P1Y | tr:duration | always "P1Y"^^xsd:duration |
AreaCode | 10YIE-1001A00010 | tr:biddingZone, tr:controlArea, tr:country | depending on AreaTypeCode (BZN, CTA, CTY) |
AreaTypeCode | CTA | | Values BZN, CTA, CTY used to map corresponding relations |
AreaName | IE | | |
MapCode | CTA IE | | |
ProductionType | Geothermal | tr:assetType | match to tr:name and tr:nameAlt of <type/Asset> code list |
AggregatedInstalledCapacity | 17.00 | tr:installedOutput | |
DeletedFlag | 0 | | checked csv for 2021 - always 0 |
UpdateTime | 2021-07-27 20:56:08 | | |
Examples of ProductionType values with no match in the code lists:

- Hydro Pumped Storage
- Hydro Run-of-river and poundage
- Hydro Water Reservoir

We have created tr:nameAlt values in the corresponding code lists, see codelists-extra.ttl.

See InstalledGenerationCapacityAggregated.ttl
RDF URL and fixed data (where the space in
(DateTime)
is replaced with
T
):
<dataObs/generation/InstalledGenerationCapacityAggregated/(AreaTypeCode)/(AreaCode)/(DateTime)>
a tr:DataObservation;
tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
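The URL pattern above can be sketched as a small helper. This is an illustration only: the function names are made up, the base URL is taken from the @base used in the codelist example, and keeping the fractional seconds after replacing the space with "T" is an assumption.

```python
from datetime import datetime

BASE = "https://transparency.ontotext.com/resource/"


def to_xsd_datetime(value):
    """Validate a 'YYYY-MM-DD hh:mm:ss.fff' value and replace the space with 'T'."""
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")  # raises ValueError if malformed
    return value.replace(" ", "T", 1)


def observation_url(area_type_code, area_code, date_time):
    """Build the URL of an InstalledGenerationCapacityAggregated observation."""
    return (f"{BASE}dataObs/generation/InstalledGenerationCapacityAggregated/"
            f"{area_type_code}/{area_code}/{to_xsd_datetime(date_time)}")


url = observation_url("CTA", "10YIE-1001A00010", "2022-01-01 00:00:00.000")
print(url)
```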
This is a "synthetic" data item that holds computed totals.
We compute aggregate tr:ProductionUnit
capacities (tr:installedOutput
) from
generation/ProductionAndGenerationUnits
in
order for rule installedCapacity-Aggregated-vs-Per-Unit
to compare it to
generation/InstalledGenerationCapacityAggregated
(which reports aggregated volumes per area and asset
type).
controlArea, biddingZone
).

<dataObs/generation/InstalledGenerationCapacityAggregated/(AreaTypeCode)/(AreaCode)/(DateTime)>
a tr:DataObservation;
tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;
Model: see InstalledGenerationCapacityComputed.ttl
Example | RDF | Comment |
---|---|---|
| | tr:dataItem | <data/generation/InstalledGenerationCapacityComputed> |
2022-01-01T00:00:00 | tr:date | now() as datatype xsd:dateTime |
PT1H | tr:duration | Validity duration as datatype xsd:duration |
<eic/10YIE-1001A00010> | tr:biddingZone, tr:controlArea | From the individual units |
| | tr:assetType | tr:assetType of the individual units |
100.0 | tr:installedOutput | Computed as a sum from the individual units |
130.00 | tr:installedOutputHigh | +30% of the value in tr:installedOutput |
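The spatial aggregation behind this table can be illustrated with the sketch below: unit capacities are summed per area and asset type, and the upper bound is the sum plus 30%. The actual computation is done by a SPARQL update in the project; the dictionary keys, the function name and the sample units here are hypothetical, and grouping by control area only is a simplification.

```python
from collections import defaultdict

TOLERANCE = 0.30  # tr:installedOutputHigh is +30% of tr:installedOutput


def compute_capacity(units):
    """Sum installedOutput per (controlArea, assetType) and add the upper bound."""
    totals = defaultdict(float)
    for unit in units:
        totals[(unit["controlArea"], unit["assetType"])] += unit["installedOutput"]
    return {
        key: {
            "installedOutput": total,
            "installedOutputHigh": round(total * (1 + TOLERANCE), 2),
        }
        for key, total in totals.items()
    }


# Hypothetical production units (values are not from the real data)
units = [
    {"controlArea": "10YIE-1001A00010", "assetType": "Solar", "installedOutput": 60.0},
    {"controlArea": "10YIE-1001A00010", "assetType": "Solar", "installedOutput": 40.0},
]
computed = compute_capacity(units)
print(computed)
```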
The computation is done by
InstalledGenerationCapacityAggregated.ru
112207 samples.
Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-01 11:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT60M | tr:duration | Convert to datatype xsd:duration. Values PT15M PT30M PT60M |
AreaCode | 10YGR-HTSO-----Y | tr:controlArea | Must match the controlArea of the Generation Unit: ActualGenerationOutputPerGenerationUnit-controlArea-conform |
AreaTypeCode | CTA | | Always "CTA" (control area) |
AreaName | GR CTA | | Matches notation of AreaCode, plus AreaTypeCode |
MapCode | GR | | Matches notation of AreaCode, checked 4. Some variations: this file vs EIC, eg: "DE(TransnetBW)" vs "DE-TRANSNETBW", "DE(TenneT DE)" vs "DE-TENNET_DE" |
GenerationUnitEIC | 29WGU-YISPAOOU-5 | tr:generationUnit | |
PowerSystemResourceName | P_AOOU | | Matches notationAlt of GenerationUnitEIC, checked 3. |
ProductionType | Hydro Water Reservoir | | Matches assetType of GenerationUnitEIC, checked 4. |
ActualGenerationOutput | 0.00 | tr:actualOutput | Convert to datatype xsd:float. 51% 0.00, 4.4% missing (*). Must be <= installedOutput: ActualGenerationOutputPerGenerationUnit-actualOutput-LTE-installedOutput |
ActualConsumption | | tr:actualConsumption | Convert to datatype xsd:float. 14.8% 0.00, 80% missing (that's the normal case) (*) |
| | | tr:netOutput | Compute as difference ActualGeneration-ActualConsumption, treat missing as zero, convert to xsd:float (*) |
InstalledGenCapacity | 210.00 | tr:installedOutput | Convert to datatype xsd:float. Must match the declared installedOutput of the Generation Unit: ActualGenerationOutputPerGenerationUnit-installedOutput-conform |
UpdateTime | 2022-01-02 10:30:54 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
RDF URL and fixed data (where the space in
(DateTime)
is replaced with
T
):
<dataObs/generation/ActualGenerationOutputPerGenerationUnit/(GenerationUnitEIC)/(DateTime)>
a tr:DataObservation;
tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;
(*) ActualConsumption is energy consumed by the generator for technological purposes. We analyze the correlation of ActualGenerationOutput and ActualConsumption:
cut -f10,11 2022_01_ActualGenerationOutputPerGenerationUnit_16.1.A.csv|perl -pe 's{\b0\.00}{zero}g; s{[\d.]+}{NUM}g'| sort|uniq -c|sort -rn
cnt | ActualGenerationOutput | ActualConsumption |
---|---|---|
46183 | zero | |
44082 | NUM | |
10138 | zero | zero |
5296 | NUM | zero |
3974 | | NUM |
1267 | zero | NUM |
1227 | | zero |
39 | NUM | NUM |
There is a difference between missing and zero:
- Missing actualConsumption is legitimate since there are generators that don't consume anything
- actualOutput is provided in each row
- When computing netOutput as the difference, we treat "missing" the same as "zero"

It is possible to have ActualConsumption without ActualGeneration (thus negative netOutput), eg:

- 18WMUE4B-12345-D "MUELA 4B" IBERDROLA GENERACION S.A.U. plant (Hydro Pumped Storage) was consuming 209.10 MW on 2022-01-01 at 03:00 while pumping water upward into its reservoir
- 62W373474960449Q "SEVTECCHPP-V" Severodonetsk Combined Heat and Power Plant (Fossil Gas) was consuming 2.54 MW on 2022-01-03 at 17:00 while outputting no electricity

The semantic mapping of this CSV is shown below.
Note: the ActualGenerationOutputPerGenerationUnit conversion should produce only the large node. The figure shows RDF type & EIC code in other nodes just to see the colored circles, but these should not be generated by this conversion.
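The netOutput rule for this item (difference of generation and consumption, with missing values treated as zero) can be sketched as follows. The function names are illustrative and this is not the project's conversion code; the input strings mimic raw CSV cells, where an empty string means "missing".

```python
def parse_amount(text):
    """Return a float, or None when the CSV cell is empty (missing)."""
    text = text.strip()
    return float(text) if text else None


def net_output(actual_generation, actual_consumption):
    """netOutput = generation - consumption, treating missing as zero."""
    generation = parse_amount(actual_generation) or 0.0
    consumption = parse_amount(actual_consumption) or 0.0
    return generation - consumption


print(net_output("10.94", "0.00"))  # generation only
print(net_output("", "209.10"))     # consumption only, so netOutput is negative
```

Keeping parse_amount separate preserves the distinction between missing and zero for tr:actualOutput and tr:actualConsumption, while only netOutput collapses the two.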
Field | Sample | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-01 09:15:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT15M | tr:duration | Convert to datatype xsd:duration. Values PT15M PT30M PT60M |
AreaCode | 10YNL----------L | tr:biddingZone tr:controlArea tr:country | |
AreaTypeCode | CTA | | Use this field to determine property for AreaCode |
AreaName | NL CTA | | |
MapCode | NL | | |
ProductionType | Solar | tr:assetType | match to tr:name and tr:nameAlt of <type/Asset> code list |
ActualGenerationOutput | 10.94 | tr:actualOutput | Convert to datatype xsd:float. |
ActualConsumption | 0.00 | tr:actualConsumption | Convert to datatype xsd:float. |
UpdateTime | 2022-01-29 11:18:30 | | |
Net Output | | tr:netOutput | Difference between output and consumption. Performed at conversion |
<dataObs/generation/AggregatedGenerationPerType/(AreaTypeCode)/(AreaCode)/(ProductionType)/(DateTime)>
a tr:DataObservation;
tr:dataItem <data/generation/AggregatedGenerationPerType>;
Month 2022_02, 150949 records
Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-02-05 06:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT60M | tr:duration | |
AreaCode | 10YLT-1001A0008Q | tr:biddingZone tr:controlArea tr:country | |
AreaTypeCode | BZN | | Use this field to determine property for AreaCode |
AreaName | LT BZN | | |
MapCode | LT | | |
ProductionType | Wind Onshore | tr:assetType | <type/Asset/> Match label |
AggregatedGenerationForecast | 351.99 | tr:forecastedOutput | |
UpdateTime | 2022-02-05 09:20:49 | tr:dateUpdated | |
ProductionType | Frequency |
---|---|
Wind Offshore | 10752 |
Wind Onshore | 72018 |
Solar | 68178 |
AreaTypeCode | Frequency |
---|---|
CTY | 35468 |
BZN | 53464 |
CTA | 62016 |
<dataObs/generation/CurrentGenerationForecastForWindAndSolar/(AreaTypeCode)/(AreaCode)/match(ProductionType)/(DateTime)>
a tr:DataObservation;
tr:dataItem <data/generation/CurrentGenerationForecastForWindAndSolar>;
Month 2022_01, 109263 records.
Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-02 23:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT15M | tr:duration | Convert to datatype xsd:duration. Values PT15M PT30M PT60M |
AreaCode | 10YCH-SWISSGRIDZ | tr:marketBalanceArea | In namespace <eic/> |
AreaTypeCode | MBA | Always "MBA" | |
AreaName | CH MBA | | |
MapCode | CH | | |
ReserveType | Frequency Containment Reserve (FCR) | tr:reserveType | <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR |
DeletedFlag | 0 | Always 0 | |
UpdateTime | 2022-01-02 09:45:51 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
This and the other Balancing items (next 3 items) include a number of related (denormalized) Volume/Price fields that we normalize using the following extra fields (dimensions) and their respective code values (in parentheses is the word as it appears in the field name).
- tr:direction: <type/Direction/>: A01 "UP", A02 "DOWN", A03 "UP and DOWN" (Symmetric)
- tr:volumeCategory: <type/Business/>: A31 "Offered Capacity" (Offered), B95 "Procured capacity" (Accepted), A45 "Schedule activated reserves" (Activated)
- tr:assetType: <type/Asset/>: A04 "Generation", A05 "Load", B20 "Other unspecified" (NotSpecified)

Each of the numeric fields is emitted as tr:volume with datatype xsd:float and the following dimension values:
Field | tr:direction | tr:volumeCategory | tr:assetType |
---|---|---|---|
LoadUpAcceptedVolume | A01 "UP" | B95 "Accepted" | A05 "Load" |
LoadDownAcceptedVolume | A02 "DOWN" | B95 "Accepted" | A05 "Load" |
LoadUpOfferedVolume | A01 "UP" | A31 "Offered" | A05 "Load" |
LoadDownOfferedVolume | A02 "DOWN" | A31 "Offered" | A05 "Load" |
LoadAcceptedVolumeSymmetric | A03 "UP and DOWN" | B95 "Accepted" | A05 "Load" |
LoadOfferedVolumeSymmetric | A03 "UP and DOWN" | A31 "Offered" | A05 "Load" |
GenerationUpAcceptedVolume | A01 "UP" | B95 "Accepted" | A04 "Generation" |
GenerationDownAcceptedVolume | A02 "DOWN" | B95 "Accepted" | A04 "Generation" |
GenerationUpOfferedVolume | A01 "UP" | A31 "Offered" | A04 "Generation" |
GenerationDownOfferedVolume | A02 "DOWN" | A31 "Offered" | A04 "Generation" |
GenerationAcceptedVolumeSymmetric | A03 "UP and DOWN" | B95 "Accepted" | A04 "Generation" |
GenerationOfferedVolumeSymmetric | A03 "UP and DOWN" | A31 "Offered" | A04 "Generation" |
NotSpecifiedUpAcceptedVolume | A01 "UP" | B95 "Accepted" | B20 "Other unspecified" |
NotSpecifiedDownAcceptedVolume | A02 "DOWN" | B95 "Accepted" | B20 "Other unspecified" |
NotSpecifiedUpOfferedVolume | A01 "UP" | A31 "Offered" | B20 "Other unspecified" |
NotSpecifiedDownOfferedVolume | A02 "DOWN" | A31 "Offered" | B20 "Other unspecified" |
NotSpecifiedAcceptedVolumeSymmetric | A03 "UP and DOWN" | B95 "Accepted" | B20 "Other unspecified" |
NotSpecifiedOfferedVolumeSymmetric | A03 "UP and DOWN" | A31 "Offered" | B20 "Other unspecified" |
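
The field-to-dimension table above is regular enough to be derived from the field names. The following Python sketch shows one way to do that; the regular expression, the dictionaries and the function name `dimensions` are assumptions for illustration, not the macros used by the actual conversion. The "Activated" category is included on the assumption that the Activated fields of the next data item follow the same naming pattern.

```python
import re

# Word in the field name -> code value, as listed in the dimension descriptions
DIRECTION = {"Up": "A01", "Down": "A02", "Symmetric": "A03"}
VOLUME_CATEGORY = {"Offered": "A31", "Accepted": "B95", "Activated": "A45"}
ASSET_TYPE = {"Generation": "A04", "Load": "A05", "NotSpecified": "B20"}

# Two field-name layouts occur in the table:
#   <Asset><Up|Down><Category>Volume     eg LoadUpAcceptedVolume
#   <Asset><Category>VolumeSymmetric     eg LoadAcceptedVolumeSymmetric
FIELD_RE = re.compile(
    r"^(?P<asset>Generation|Load|NotSpecified)"
    r"(?P<dir>Up|Down)?"
    r"(?P<cat>Offered|Accepted|Activated)"
    r"Volume(?P<sym>Symmetric)?$"
)


def dimensions(field_name):
    """Return (direction, volumeCategory, assetType) codes, or None."""
    m = FIELD_RE.match(field_name)
    # Exactly one of Up/Down or the Symmetric suffix must be present
    if m is None or bool(m["dir"]) == bool(m["sym"]):
        return None
    direction = m["dir"] or "Symmetric"
    return (DIRECTION[direction], VOLUME_CATEGORY[m["cat"]], ASSET_TYPE[m["asset"]])


print(dimensions("LoadUpAcceptedVolume"))
print(dimensions("NotSpecifiedOfferedVolumeSymmetric"))
```

Fields that do not follow either layout (eg `UpdateTime`) yield `None`, so the same function can be applied to every column of a row.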
The semantic mapping of this CSV is shown below.
- <data/balancing/AggregatedVolumes>.
- tr:unit "MW" to this data item

RDF URL and fixed data:
<dataObs/balancing/AggregatedVolumes/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(volumeCategory)/(assetType)>
a tr:DataObservation;
tr:dataItem <data/balancing/AggregatedVolumes>;
- DataObservation)
- ANY for the missing/sum/total)
- (DateTime) is replaced with T
- etl_scripts/tarql/match.h.rq for such matching implemented with a VALUES clause.

See data/model/AcceptedAggregatedOffers.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:
Month 2022_01, 106828 samples. This table has the same common fields, which are mapped in exactly the same way as in the previous section (AcceptedAggregatedOffers_17.1.D):
Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-01 00:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT60M | tr:duration | Convert to datatype xsd:duration. Values PT15M PT30M PT60M |
AreaCode | 10YCS-CG-TSO---S | tr:marketBalanceArea | In namespace <eic/> |
AreaTypeCode | MBA | Always "MBA" | |
AreaName | ME MBA | | |
MapCode | ME | | |
ReserveType | Automatic Frequency Restoration Reserve (aFRR) | tr:reserveType | <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR |
UpdateTime | 2021-12-30 14:31:00 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
Instead of Offered/Accepted, it has Activated amounts. They are mapped in exactly the same way:
The RDF mapping is exactly the same as in the previous section. We use the same kind of URLs, and the same data item.
See data/model/ActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:
Month 2022_01, 294943 samples.
This is very similar to the previous two sections, except:
- volumeCategory "Unavailable"
- Direction is a separate field, rather than being encoded in the Volume field names
- marketProduct but no assetType

Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-02 12:45:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT15M | tr:duration | Convert to datatype xsd:duration. Values PT15M PT30M PT60M |
AreaCode | 10Y1001A1001A71M | tr:schedulingArea | In namespace <eic/> |
AreaTypeCode | SCA | always "SCA" | |
AreaName | IT-Centre-South SCA | | |
MapCode | IT-CSOUTH | | |
ReserveType | Replacement reserve (RR) | tr:reserveType | <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR (*) |
TypeOfProduct | Standard | tr:marketProduct | <type/MarketProduct/>: match A01 Standard, A02 Specific, A04 Local |
Direction | Up | tr:direction | <type/Direction/>: A01 "UP", A02 "DOWN" |
UpdateTime | 2022-01-02 12:31:10 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
(*) WARNING: the values in this data item are spelled in lowercase (all other tables use Capital Case):
csvtk -t freq -f ReserveType 2022_01_AggregatedBalancingEnergyBids_12.3.E.csv
Replacement reserve (RR) 126822
Manual frequency restoration reserve (mFRR) 75114
Automatic frequency restoration reserve (aFRR) 93006
So we use the macro match_reserveType_lcase() for this item, and match_reserveType() for all others.
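
To illustrate the matching problem, here is a minimal Python sketch of a lookup that tolerates both spellings. It is not the implementation of the two macros named above: instead of two separate matchers, it folds case on both sides, which is a design choice made only for this example. The function name `match_reserve_type` is hypothetical; the labels and codes are the ones listed in the tables of this document.

```python
# Reserve type labels as they appear in the Capital Case tables, with their codes
RESERVE_TYPE = {
    "Frequency Containment Reserve (FCR)": "A95",
    "Automatic Frequency Restoration Reserve (aFRR)": "A96",
    "Manual Frequency Restoration Reserve (mFRR)": "A97",
    "Replacement Reserve (RR)": "A98",
}

# Index by case-folded label so that "Replacement reserve (RR)" also matches
_BY_FOLDED = {label.casefold(): code for label, code in RESERVE_TYPE.items()}


def match_reserve_type(label):
    """Return the code for a reserve type label, ignoring case; None if unknown."""
    return _BY_FOLDED.get(label.strip().casefold())


print(match_reserve_type("Replacement reserve (RR)"))
print(match_reserve_type("Automatic Frequency Restoration Reserve (aFRR)"))
```

A single case-insensitive lookup is simpler, but two strict matchers (as used in the conversion) have the advantage of flagging unexpected spellings in the other tables.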
Map the following fields to tr:volume with datatype xsd:float, and the following dimension values:
Field | tr:volumeCategory |
---|---|
OfferedBidVolume | A31 (Offered) |
ActivatedBidVolume | A45 (Activated) |
UnavailableBidVolume | Z99 (Unavailable) |
We use the same RDF model as before. Again, we use the same URLs and data item.
See data/model/AggregatedBalancingEnergyBids.ttl.
- If a Volume field is missing, do not emit any triples about it

Month 2022_01, 158455 samples.
Field | Example | RDF | Comment |
---|---|---|---|
DateTime | 2022-01-14 02:00:00.000 | tr:date | Convert to datatype xsd:dateTime and valid format |
ResolutionCode | PT30M | tr:duration | Convert to datatype xsd:duration |
AreaCode | 10YFR-RTE------C | tr:schedulingArea or tr:marketBalanceArea | Depending on AreaTypeCode |
AreaTypeCode | SCA | SCA or MBA. Use to select the specific relation | |
AreaName | FR SCA | | |
MapCode | FR | | |
RegisterItemTypeName | Automatic Frequency Restoration Reserve (aFRR) | tr:reserveType | <type/Business/>: match A95 FCR, A96 aFRR, A97 mFRR, A98 RR |
TypeOfProduct | A01 | tr:marketProduct | <type/MarketProduct/>: straight A01 Standard, A02 Specific, A04 Local |
PriceType | AVERAGE | tr:priceCategory | <type/PriceCategory/>: match A06 "Average bid price" (AVERAGE), A07 "Single marginal bid price" (MARGINAL) |
Currency | EUR | tr:currency | Values: EUR (10x more popular than all the rest), BAM, CZK, HUF, PLN, RON, UAH |
UpdateTime | 2022-01-14 03:46:00 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
Emit all these fields as tr:price with datatype xsd:float and the following dimension values:
Field | tr:direction | tr:assetType |
---|---|---|
LoadUpPrice | A01 "UP" | A05 "Load" |
LoadDownPrice | A02 "DOWN" | A05 "Load" |
GenerationUpPrice | A01 "UP" | A04 "Generation" |
GenerationDownPrice | A02 "DOWN" | A04 "Generation" |
NotSpecifiedUpPrice | A01 "UP" | B20 "Other unspecified" |
NotSpecifiedDownPrice | A02 "DOWN" | B20 "Other unspecified" |
We determine the minimal set of independent fields with experiments like this:
# UNIQUE:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
# Remove AreaTypeCode: DUPS:
csvtk cut -t -f DateTime,AreaCode,RegisterItemTypeName 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-22 18:30:00.000.*10YFR-RTE------C.*Replacement Reserve (RR)" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-22 18:30:00.000 PT15M 10YFR-RTE------C SCA FR SCA FR Replacement Reserve (RR) 247.00 247.00 A01 AVERAGE EUR 2022-01-22 18:31:13
2022-01-22 18:30:00.000 PT30M 10YFR-RTE------C MBA FR MBA FR Replacement Reserve (RR) 245.27 245.27 AVERAGE EUR 2022-01-22 20:31:11
# The same area "FR" is reported as SCA and as MBA
# Remove RegisterItemTypeName: DUPS:
csvtk cut -t -f DateTime,AreaTypeCode,AreaCode 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv|sort|uniq -d
grep "2022-01-05 00:15:00.000.*10Y1001A1001A82H.*MBA" 2022_01_PricesOfActivatedBalancingEnergy_17.1.F.csv
2022-01-05 00:15:00.000 PT15M 10Y1001A1001A82H MBA DE-LU MBA DE_LU Manual Frequency Restoration Reserve (mFRR) 0.00 0.00 AVERAGE EUR 2022-01-04 00:30:55
2022-01-05 00:15:00.000 PT15M 10Y1001A1001A82H MBA DE-LU MBA DE_LU Automatic Frequency Restoration Reserve (aFRR) 224.91 47.71 AVERAGE EUR 2022-01-05 02:00:56
The minimal set is AreaTypeCode,AreaCode,DateTime,RegisterItemTypeName to which we must add the dimensions direction,assetType
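
The uniqueness experiments above can also be expressed in Python, which may be convenient when `csvtk` is not at hand. This is a sketch under assumptions: the function name `duplicate_keys` is hypothetical, the input is assumed to be tab-separated with a header row, and the sample below contains only the key columns of the two "FR" rows shown in the experiments.

```python
import csv
import io
from collections import Counter


def duplicate_keys(tsv_text, fields):
    """Return the key tuples that occur more than once (like `sort | uniq -d`)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[f] for f in fields) for row in reader)
    return [key for key, n in counts.items() if n > 1]


# Reduced sample: the same area "FR" reported as SCA and as MBA
SAMPLE = (
    "DateTime\tAreaTypeCode\tAreaCode\tRegisterItemTypeName\n"
    "2022-01-22 18:30:00.000\tSCA\t10YFR-RTE------C\tReplacement Reserve (RR)\n"
    "2022-01-22 18:30:00.000\tMBA\t10YFR-RTE------C\tReplacement Reserve (RR)\n"
)

full_key = ["DateTime", "AreaTypeCode", "AreaCode", "RegisterItemTypeName"]
without_area_type = ["DateTime", "AreaCode", "RegisterItemTypeName"]
print(duplicate_keys(SAMPLE, full_key))           # no duplicates with the full key
print(duplicate_keys(SAMPLE, without_area_type))  # duplicates once AreaTypeCode is dropped
```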

We add a computed field tr:priceInEUR, based on the current conversion rate of Currency to EUR

RDF URL and fixed data:
<dataObs/balancing/PricesOfActivatedBalancingEnergy/(AreaTypeCode)/(AreaCode)/(DateTime)/(reserveType)/(direction)/(assetType)>
a tr:DataObservation;
tr:dataItem <data/balancing/PricesOfActivatedBalancingEnergy>;
See data/model/PricesOfActivatedBalancingEnergy.ttl. The diagram is not very elucidating since all these records are correlated by their values, not by links:
4366 samples.
- MRID.
- (tr:version) of each unavailability. We've shown several examples to illustrate these versions.
- Status is changed

Field | Example1 | Example2 | RDF | Comment |
---|---|---|---|---|
StartTS | 2022-01-28 19:00:00.000 | 2022-01-28 19:00:00.000 | Ignored (*) | |
EndTS | 2022-01-31 07:00:00.000 | 2022-01-31 07:00:00.000 | Ignored (*) | |
TimeZone | WET | WET | tr:timeZone | String: "WET, CET, EET" |
MRID | zzGVOR7oEd5SOJnhsAiapw | zzGVOR7oEd5SOJnhsAiapw | tr:ident | Also use in URL. Separate field to allow matching subsidiary table |
Type | Planned | Planned | tr:typeText | String: "Planned, Forced" |
Status | Active | Cancelled | tr:statusText | String: "Active, Withdrawn, Canceled" |
AreaCode | 10YGB----------A | 10YGB----------A | tr:controlArea or tr:biddingZone | Depending on AreaTypeCode. Must match declared zone/area of the energy resource: Outage-GenerationUnit-area-conform |
AreaTypeCode | CTA | CTA | "CTA, BZN" (**). Reflected in the selection of the previous link | |
AreaName | UK(National Grid) CTA | UK(National Grid) CTA | Matches name of AreaCode | |
MapCode | GB | GB | Matches notation of AreaCode | |
PowerResourceEIC | 48W000000DIDCB5C | 48W000000DIDCB5C | tr:energyResource | Must exist in Production and Generation Units: Outage-ProductionUnit-exists |
UnitName | DIDCB5 | DIDCB5 | Matches notation of PowerResourceEIC | |
ProductionType | Fossil Gas | Fossil Gas | Matches assetType of PowerResourceEIC | |
InstalledCapacity | 780.00 | 780.00 | tr:installedOutput | Convert to datatype xsd:float. Must match the declared installedCapacity of the resource: Outage-GenerationUnit-installedCapacity-conform |
AvailableCapacity | 370.00 | 370.00 | tr:availableOutput | Convert to datatype xsd:float. Must be less than installedCapacity: Outage-GenerationUnit-LT-installedCapacity |
Version | 1 | 2 | tr:version | Retain only the latest version. See next section |
Reason | Foreseen Maintenance | Foreseen Maintenance | Ignored (*) | |
UpdateTime | 2018-10-02 14:29:59 | 2018-10-02 17:26:11 |
4996 samples.
Field | Example1 | Example2 | RDF | Comment |
---|---|---|---|---|
StartTS | 2022-01-28 19:00:00.000 | 2022-01-28 19:00:00.000 | tr:dateStart | Convert to datatype xsd:dateTime and valid format |
EndTS | 2022-01-31 07:00:00.000 | 2022-01-31 07:00:00.000 | tr:dateEnd | Convert to datatype xsd:dateTime and valid format |
MRID | zzGVOR7oEd5SOJnhsAiapw | zzGVOR7oEd5SOJnhsAiapw | tr:mrid | Use in URL. |
version | 2 | 2 | tr:version | Convert to datatype xsd:integer. Separate field to allow picking latest version: retain only the latest version |
ReasonCode | A95 | B19 | tr:reason | URL in <type/ReasonCode/> |
Reason | Complementary Information | Foreseen Maintenance | tr:reasonText | Matches the name of codelist value "ReasonCode". Skip "Complementary Information" |
ReasonText | Outage | | tr:reasonText | Could include long, even bilingual text, not very well formatted |
UpdateTime | 2018-10-02 17:26:11 | 2018-10-02 17:26:11 | tr:dateUpdated | Convert to datatype xsd:dateTime and valid format |
We use the same "synthetic" data item UnavailabilityOfProductionOrGenerationUnits for both this and UnavailabilityOfProductionUnits (see next).
- tr:energyResource is the same in both cases, and that resource should know whether it's a Production or Generation Unit (which is a non-trivial question, given the confusion between the two)

RDF URL and fixed data:
<outage/UnavailabilityOfProductionOrGenerationUnits/(MRID)/(Version)>
a tr:Outage;
tr:dataItem <data/outages/UnavailabilityOfProductionOrGenerationUnits>
The RDF model is shown below, but please read subsequent sections regarding intricacies of the conversion process.
data/model/Unavailability.ttl:
Each unavailability is reported twice: for the controlArea ("CTA") and the biddingZone ("BZN") of the generator. An example with a generator in Bulgaria's Maritsa Iztok 2 TPP:
Field | Example1 | Example2 |
---|---|---|
StartTS | 2022-01-03 16:57:00.000 | 2022-01-03 16:57:00.000 |
EndTS | 2022-01-03 18:30:00.000 | 2022-01-03 18:30:00.000 |
TimeZone | CET | CET |
MRID | 7jf8VaSweKQI27w73v8p8w | dcadb3Ls6XlBSYhhQxvItQ |
Status | Active | Active |
Type | Forced | Forced |
AreaCode | 10YCA-BULGARIA-R | 10YCA-BULGARIA-R |
AreaTypeCode | CTA | BZN |
AreaName | BG CTA | BG BZN |
MapCode | BG | BG |
PowerResourceEIC | 32W001100100045G | 32W001100100045G |
UnitName | TPP_MI2_G5 | TPP_MI2_G5 |
ProductionType | Fossil Brown coal/Lignite | Fossil Brown coal/Lignite |
InstalledCapacity | 230.00 | 230.00 |
AvailableCapacity | 0.00 | 0.00 |
Version | 1 | 1 |
Reason | Failure | Failure |
UpdateTime | 2022-01-04 09:15:58 | 2022-01-04 09:15:58 |
As you can see, the two unavailabilities are precisely the same, except MRID, AreaCode, AreaTypeCode (and MapCode, AreaName derived from them). So each unavailability is reported twice:
- MRID but same Version, UpdateTime
- 10YCA-BULGARIA-R Bulgaria: as "CTA" and as "BZN"

Optionally, merge the records (so we'll have one record with two outgoing links: both controlArea and biddingZone):
- StartTS, EndTS, TimeZone, Status, Type, PowerResourceEIC, InstalledCapacity, AvailableCapacity, Version, Reason, UpdateTime
- AreaTypeCode="CTA") and all its data except AreaCode, AreaTypeCode
- controlArea or biddingZone link (computed from AreaCode, AreaTypeCode) against the URL of the other record

This is non-trivial but will help with displaying Outage data.
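
The optional merge can be sketched in Python as follows. This is only an illustration under assumptions: records are plain dictionaries keyed by CSV field name, the names `merge_pairs`, `SHARED` and `AREA_PROP` are hypothetical, and only the "CTA" and "BZN" area types are handled. Records are grouped by the shared fields listed above, the "CTA" record is kept, and each record in the group contributes its area link.

```python
# Fields that must be equal for two records to describe the same unavailability
SHARED = ("StartTS", "EndTS", "TimeZone", "Status", "Type", "PowerResourceEIC",
          "InstalledCapacity", "AvailableCapacity", "Version", "Reason", "UpdateTime")
AREA_PROP = {"CTA": "controlArea", "BZN": "biddingZone"}


def merge_pairs(records):
    """Merge records that agree on all SHARED fields into one record per group."""
    groups = {}
    for rec in records:
        groups.setdefault(tuple(rec[f] for f in SHARED), []).append(rec)
    merged = []
    for group in groups.values():
        # Keep the CTA record (and its MRID); fall back to the first record
        keep = next((r for r in group if r["AreaTypeCode"] == "CTA"), group[0])
        out = {k: v for k, v in keep.items() if k not in ("AreaCode", "AreaTypeCode")}
        # Every record in the group contributes its own area link
        for rec in group:
            out[AREA_PROP[rec["AreaTypeCode"]]] = rec["AreaCode"]
        merged.append(out)
    return merged


base = {
    "StartTS": "2022-01-03 16:57:00.000", "EndTS": "2022-01-03 18:30:00.000",
    "TimeZone": "CET", "Status": "Active", "Type": "Forced",
    "PowerResourceEIC": "32W001100100045G", "InstalledCapacity": "230.00",
    "AvailableCapacity": "0.00", "Version": "1", "Reason": "Failure",
    "UpdateTime": "2022-01-04 09:15:58", "AreaCode": "10YCA-BULGARIA-R",
}
cta = dict(base, MRID="7jf8VaSweKQI27w73v8p8w", AreaTypeCode="CTA")
bzn = dict(base, MRID="dcadb3Ls6XlBSYhhQxvItQ", AreaTypeCode="BZN")
merged = merge_pairs([bzn, cta])
print(len(merged), merged[0]["MRID"])
```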
This table should be "joined" to the main table by "MRID" (which can be accomplished by using consistent URLs when RDFizing). Examining data for 2022_01 (taken on 2022-01-05):
- 0pXGWG97HoHWd2NzlbSmmw (2 versions) is missing in the main table
- 5TmlidNqpxU_LYlWfJ5bMg (9 versions) is missing in the main table
- 1F67oMiU54aDdqPoUMdJGg has only 1 version in the main table, but 4 in the subsidiary table.

Field | main | subsidiary1 | subsidiary2 | subsidiary3 | subsidiary4 |
---|---|---|---|---|---|
StartTS | 2022-01-05 00:00:00.000 | 2022-01-05 00:00:00.000 | 2022-01-05 07:00:00.000 | 2022-01-05 07:00:00.000 | 2022-01-05 06:00:00.000 |
EndTS | 2022-01-06 00:00:00.000 | 2022-01-06 00:00:00.000 | 2022-01-05 09:00:00.000 | 2022-01-05 09:00:00.000 | 2022-01-05 07:00:00.000 |
TimeZone | CET | | | | |
MRID | 1F67oMiU54aDdqPoUMdJGg | 1F67oMiU54aDdqPoUMdJGg | 1F67oMiU54aDdqPoUMdJGg | 1F67oMiU54aDdqPoUMdJGg | 1F67oMiU54aDdqPoUMdJGg |
Type | Forced | | | | |
Status | Active | | | | |
AreaCode | 10YCZ-CEPS-----N | | | | |
AreaTypeCode | CTA | | | | |
AreaName | CZ CTA | | | | |
MapCode | CZ | | | | |
PowerResourceEIC | 27W-GU-EPVR-B1-L | | | | |
UnitName | EPVR.B1 | | | | |
ProductionType | Fossil Gas | | | | |
InstalledCapacity | 200.00 | | | | |
AvailableCapacity | 0.00 | | | | |
Version | 1 | 1 | 2 | 3 | 4 |
ReasonCode | | B18 | B18 | B18 | B18 |
Reason | Failure | Failure | Failure | Failure | Failure |
ReasonText | | | | | |
UpdateTime | 2022-01-05 07:00:48 | 2022-01-05 07:00:48 | 2022-01-05 08:00:57 | 2022-01-05 08:00:59 | 2022-01-05 08:00:59 |
- ReasonCode that's a value within codelist <type/ReasonCode/>
- ReasonText that can be a long free text

For each MRID of the main (UnavailabilityOfGenerationUnits_15.1.A_B) and subsidiary (UnavailabilityOfGenerationUnitsReasons_15.1.A_B) tables, we want to retain only the latest Version.
- Version (and UpdateTime) is correlated between the tables

That's non-trivial since:
- UpdateTime or by Version produces the same result)
- Version in main, version in subsidiary

This data item is mapped in exactly the same way as UnavailabilityOfGenerationUnits_15.1.A_B, and using the same synthetic data item URLs. The same special processing applies.
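
The "retain only the latest version" step that applies to both unavailability data items can be sketched in Python as below. The function name `latest_versions` and the dictionary-based rows are assumptions for this example; the `version_field` parameter reflects the different spelling of the column in the main and subsidiary tables. Versions are compared as integers because they arrive as strings in the CSV.

```python
def latest_versions(rows, version_field="Version"):
    """Return a dict MRID -> row, keeping only the row with the highest version."""
    latest = {}
    for row in rows:
        current = latest.get(row["MRID"])
        # Compare numerically: as strings, "10" would sort before "9"
        if current is None or int(row[version_field]) > int(current[version_field]):
            latest[row["MRID"]] = row
    return latest


# Reduced subsidiary rows for one MRID (column is spelled "version" there)
subsidiary = [
    {"MRID": "1F67oMiU54aDdqPoUMdJGg", "version": "1", "UpdateTime": "2022-01-05 07:00:48"},
    {"MRID": "1F67oMiU54aDdqPoUMdJGg", "version": "4", "UpdateTime": "2022-01-05 08:00:59"},
    {"MRID": "1F67oMiU54aDdqPoUMdJGg", "version": "2", "UpdateTime": "2022-01-05 08:00:57"},
    {"MRID": "1F67oMiU54aDdqPoUMdJGg", "version": "3", "UpdateTime": "2022-01-05 08:00:59"},
]
kept = latest_versions(subsidiary, version_field="version")
print(kept["1F67oMiU54aDdqPoUMdJGg"]["version"])
```

This sketch does not address the harder case described for the Generation Units tables, where the main and subsidiary tables disagree on which versions exist.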
Field | Example | RDF | Comment |
---|---|---|---|
StartTS | 2022-01-01 00:00:00.000 | Use value from the Reasons subsidiary table | |
EndTS | 2023-01-01 00:00:00.000 | Use value from the Reasons subsidiary table | |
TimeZone | CET | tr:timeZone | |
MRID | ROgezRGFNz5CJzUSUkx2-Q | tr:ident | Also use in URL |
Status | Active | tr:statusText | |
Type | Planned | tr:typeText | |
AreaCode | 10YHU-MAVIR----U | tr:controlArea or tr:biddingZone | Depending on AreaTypeCode |
AreaTypeCode | BZN | Reflected in the previous link | |
AreaName | HU BZN | Matches name of AreaCode | |
MapCode | HU | Matches notation of AreaCode | |
PowerResourceEIC | 15WVERTES----PPX | tr:energyResource | |
UnitName | Oroszlányi Eromu | Matches notation or notationAlt of PowerResourceEIC | |
ProductionType | Fossil Brown coal/Lignite | Matches assetType of PowerResourceEIC | |
Version | 1 | tr:version | Retain only the latest version |
VoltageConnectionLevel | 120.00 | Matches highVoltageLimit of PowerResourceEIC | |
InstalledCapacity | 220.00 | tr:installedOutput | |
AvailableCapacity | 0.00 | tr:availableOutput | |
Reason | Shutdown | Use value from the Reasons subsidiary table | |
UpdateTime | 2021-12-14 10:01:32 | Use value from the Reasons subsidiary table |
UnavailabilityOfProductionUnitsReasons_15.1.C_D fields:
Field | Example | RDF | Comment |
---|---|---|---|
StartTS | 2022-01-01 00:00:00.000 | tr:dateStart | Convert to xsd:dateTime and correct format |
EndTS | 2023-01-01 00:00:00.000 | tr:dateEnd | Convert to xsd:dateTime and correct format |
MRID | BGaTG2bh6VYl7K4w2RyHmw | tr:mrid | Use in URL |
version | 1 | tr:version | |
ReasonCode | B20 | tr:reason | URL in <type/ReasonCode/> |
Reason | Shutdown | tr:reasonText | Skip "Complementary Information" |
ReasonText | | tr:reasonText | |
UpdateTime | 2021-12-14 10:01:36 | tr:dateUpdated | |

Validating Transparency data is the most important objective of the project. We'll elaborate up to 40 data validation and quality criteria over various data items.
Based on them we will provide:
Improving data quality will have positive long-term effects on the energy market. Furthermore, more accurate master data will provide a foundation for a better Energy KG in the future.
We describe validation rules in a strict way, so that they can be extracted from this document and serve as the basis for implementation. Rules are expressed semantically using the SHACL ontology (W3C standard), which allows us to use a number of existing validators. Each rule is represented as sh:NodeShape and has the following fields:
- (sh:shapeGraph)
- (sh:name): derived from the rule URL by discarding dashes (eg "parentResource semiInverse generationUnit")
- (sh:order): order of rule execution
  - sh:order
- (tr:appliesTo): kind of area the rule applies to, used for grouping (see next section)
- (sh:group): used as second level of grouping (for categorization and better UI)
- (sh:description): detailed description in the form of a "should" statement
- (sh:message): a template with SPARQL variables, in case additional details should be provided in the validation results
- (tr:dataItem): data item(s) being validated (converted to several URLs as per kb.ttl)
- (tr:fields): CSV or XML field(s) being validated (using XPath notation for XML) (a single string)
- (sh:severity):
- sh:PropertyShape nodes and blank nodes
- (tr:sparqlUpdate): the next subsubsection after the rule: which Data Correction to apply

Taking the rule parentResource-semiInverse-generationUnit as an example, here's an RDF model of representing rules. This also shows the implementation (sh:property triples and blank nodes).
See data/model/ValidationRule.ttl, though this is emitted as .trig in graph <graph/shape/parentResource-semiInverse-generationUnit>

ENTSOE data is "indexed" by Area and/or Country Code (see sections Areas and Countries for details about these entities).
We'd like each validation result to point to the Area or Country related to it, in order to have a better summary of errors per Area/Country. Examples:
- controlArea and biddingZone
- controlArea and biddingZone
- countryCode for each resource. In particular, when validating Trader VAT numbers, we can only link to country code.

In order to deal with the variety of areas/countries and with missing values, validation results will have a field tr:displayArea (always populated).
Each rule specifies tr:appliesTo, whose value is one or more of tr:biddingZone, tr:controlArea, tr:country, tr:countryCode.
- tr:biddingZone, tr:controlArea

Counts:
- countries.csv has 42 countries with power resources in ENTSOE (plus "SEM" Ireland and Northern Ireland, which is not really a country)

There are many Trader countries outside of the ENTSOE jurisdiction.
We populate tr:displayArea of validation results as follows, dealing with both missing country codes and the "long tail":
- Node.countryCode where Node is sh:focusNode (the node that caused the error)
- If countryCode is missing: "none"
- If countryCode is not found in countries.csv: "other"
- Otherwise: countryCode

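The country-code fallback just described amounts to a three-way decision, sketched here in Python. The function name `display_area` is hypothetical, and `known_countries` stands for the set of codes loaded from countries.csv; the three codes used below are only an illustrative subset, not the real file contents.

```python
def display_area(country_code, known_countries):
    """Compute the displayArea string for a country-based validation result."""
    if not country_code:
        return "none"    # the focus node has no countryCode
    if country_code not in known_countries:
        return "other"   # the "long tail" of countries not in countries.csv
    return country_code


# Illustrative subset; the real set would be read from countries.csv
known = {"BG", "DE", "RS"}
for code in ("BG", "US", None):
    print(code, "->", display_area(code, known))
```
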
The Areas that data is related to are controlArea, biddingZone, country (others listed below are not yet being validated):
- tr:controlArea and one tr:biddingZone (though the data model permits multiple bidding zones). We have checked this with the following query:

PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
# Count distinct areas per unit: the two optionals form a cross product,
# so a plain count() would inflate both numbers
select ?CTA ?BZN (count(*) as ?c) {
  {select (count(distinct ?cta) as ?CTA) (count(distinct ?bzn) as ?BZN) {
    ?x a tr:ProductionUnit
    optional {?x tr:controlArea ?cta}
    optional {?x tr:biddingZone ?bzn}
  } group by ?x}
} group by ?CTA ?BZN

- (^tr:generationUnit)
- tr:controlArea and tr:biddingZone
- tr:marketBalanceArea
- tr:schedulingArea
- tr:controlArea
- tr:controlArea, tr:biddingZone and tr:country
- tr:biddingZone, tr:controlArea, tr:country.
- tr:biddingZone, tr:controlArea.
- tr:biddingZone, tr:controlArea does not produce duplicate numbers
- tr:biddingZone, tr:controlArea

Counts:
Populating tr:displayArea of tr:ValidationResult:
- tr:sourceShape/tr:appliesTo as ?areaProp. There can be multiple values
- sh:focusNode as ?node (the node that caused the error)
- If ?node is tr:GenerationUnit, get its parent Production Unit (^tr:generationUnit) because Generation Units are not directly attached to areas
- ?areaProp/tr:notation of ?node
- ?areaProp as ValidationCount.appliesTo

The Summary Results are counts of validation results that enable
Summaries are represented as tr:ValidationCount and have the following fields (another option would be to use the Data Quality Vocabulary (DQV)):
- (sh:sourceShape): validation rule (resource, from which the full rule description can be obtained, including Definition and Severity)
- (tr:displayArea): country/zone/area (string). See section Rule Applicability
- (tr:count): count of errors/warnings (integer)
- (tr:date): when the counting was done (full xsd:dateTime). Please note that we retain only one set of validation results

An RDF model of summary results is in data/model/ValidationCount.ttl and the following diagram:
Rules per Country Code | BG | DE | .. | RS | other | none | Total |
---|---|---|---|---|---|---|---|
EIC | |||||||
.. function not null (i) | 5 | 3 | 2 | 10 | |||
.. function spelling (i) | 3 | 3 | 1 | 7 | |||
.. function specific | |||||||
.. function compatible with EIC hard | |||||||
.. function compatible with EIC soft | |||||||
VAT | |||||||
.. VAT country prefix | 5 | 4 | 9 | ||||
.. VAT per country syntax | 8 | 8 | |||||
.. VAT country exists | 10 | 10 | |||||
.. VAT country conform | |||||||
.. VAT per country exists | |||||||
TOTAL | 8 | 6 | .. | 1 | 23 | 6 | 44 |
Rules per Control Area | BG | CA-DENMARK | DE-50HERTZ | DE-AMPRION-SCHED | .. | UA-IPS | none | Total |
---|---|---|---|---|---|---|---|---|
ProdUnits | ||||||||
.. ProductionUnit cannot be GenerationUnit | 4 | 4 | ||||||
.. parentResource semiInverse generatingUnit | 2 | 2 | ||||||
.. ProductionUnits and GenerationUnits in EIC | 5 | 100 | 5 | |||||
.. EIC ProductionUnits GenerationUnits single | ||||||||
.. EIC ProductionUnits GenerationUnits assetType | 5 | 3 | 8 | |||||
.. EIC ProductionUnits nominalP highVoltageLimit | ||||||||
.. EIC GenerationUnits nominalP | ||||||||
.. ProductionUnit highVoltageLimit not zero | 3 | 3 | ||||||
.. ProductionUnit nominalP not zero | ||||||||
.. only ProductionUnit or GenerationUnit | 12 | 12 | ||||||
.. no GenerationUnit at top level | ||||||||
.. ProductionUnit and GenerationUnit same responsibleParticipant | ||||||||
.. ProductionUnit and GenerationUnit same country | ||||||||
.. ProductionUnit Zone or Area same country | ||||||||
.. generatingUnit function ProductionUnit | ||||||||
.. generatingUnit function GenerationUnit | ||||||||
.. location informative | 23 | 23 | ||||||
.. ProductionUnit GenerationUnit capacity | ||||||||
Transactions | ||||||||
.. installedCapacity Aggregated vs Per Unit | ||||||||
.. actualOutput vs nominalP | 10 | |||||||
TOTAL | 18 | 6 | 8 | 121 | .. | 23 | 100 | 157 |
Notes:
- (i) indicates an icon: red for Violation, orange for Warning
- tr:displayArea sorted alphabetically, but "other" and "none" come last

Individual results (exceptions) are represented as sh:ValidationResult and include the following fields.
We'll use this example: consider the rightmost parentResource relation in this diagram, which is wrong (should be inverse of generationUnit):
- (sh:sourceShape): rule that was violated
- (sh:focusNode): node that caused the violation (eg EIC of "NPP_KOZLODUY_G10")
- (sh:value): erroneous value (eg EIC of "TPP_MI_2", the object of parentResource)
- generationUnit)
- (tr:displayArea): country/zone/area where the violation occurred (eg "BG"). Computed according to section Rule Applicability, can be "none" or "other"
- (tr:countryCode): only for rules that apply to Country, provides extra detail if displayArea is "other"
- (sh:resultSeverity): severity level of the violation: Violation or Warning (copied from the respective rule)
- (sh:resultMessage): additional details, use only if the source shape has sh:message because the standard messages generated by the SHACL engine are most often not useful.
  - <result>/sh:sourceShape/sh:sparql?/sh:message then use <result>/sh:resultMessage

An RDF model of individual results is in data/model/ValidationResult.ttl and the following diagram:
For Node, Value (and CANCELED: Expected) we print:
- tr:EnergyResource: eic, and also notation, name to ease comprehension
- 32W001100100017L/2022-01-01T11:00:00.000
- notation, name of the linked tr:EnergyResource
- https://transparency.ontotext.com/graphdb/resource?uri=<node>

Rule: EIC-VAT: VAT country conform
[back]
EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.vATCode_Names.name, EIC_MarketDocument/EICCode_MarketDocument/eICCode_MarketParticipant.streetAddress/townDetail/country
Resource | Notation | Name | Value | Area |
---|---|---|---|---|
59XREALPETROL11F | REALPETROL | REAL PETROL HOLDING KFT | HU24189514 | IT |
22X20110811----W | BE_INEOS_CV_LVM | INEOS CHLORVINYLS LIMITED | GB768506886 | BE |
<< < 1 of 5 > >>
Notes: given a tr:ValidationCount, shows all individual results with that Rule (sh:sourceShape) and tr:displayArea. Header:
- [back] to return to the summary results
- displayArea

Table:
- eic: skip no more "words" (eg https://transparency.ontotext.com/resource/eic/22W20200608A---8 -> 22W20200608A---8)
- outage: skip 1 more "word" (eg https://transparency.ontotext.com/resource/outage/UnavailabilityOfProductionOrGenerationUnits/KJUiHodFyfNlQTV9Ut5DJQ/57 -> KJUiHodFyfNlQTV9Ut5DJQ/57)
- dataObs: skip 2 more "words" (eg https://transparency.ontotext.com/resource/dataObs/generation/ActualGenerationOutputPerGenerationUnit/36W-TE-TUZLA4--0/2022-01-07T19:00:00.000 -> 36W-TE-TUZLA4--0/2022-01-07T19:00:00.000)
- sh:focusNode/tr:notation (if any)
- sh:focusNode/tr:name (if any)
- sh:value. If it's a URL, display only the "suffix" and make it a hyperlink, same as the first column
- tr:countryCode (NOT tr:displayArea, which is displayed before the table)

The columns depend on the kind of item being validated (EIC, ProductionAndGenerationUnits, Data Observations, Outages).
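
The "skip N more words" rule for shortening resource URLs in the first column can be sketched in Python as follows. The names `short_label`, `BASE` and `EXTRA_SKIP` are hypothetical, and the behaviour for URLs of any other kind (only the first word is dropped) is an assumption of this sketch rather than something specified above.

```python
BASE = "https://transparency.ontotext.com/resource/"
# Number of extra path "words" to skip after the first one, per kind of node
EXTRA_SKIP = {"eic": 0, "outage": 1, "dataObs": 2}


def short_label(url):
    """Return the display suffix of a resource URL; other URLs pass through."""
    if not url.startswith(BASE):
        return url
    words = url[len(BASE):].split("/")
    # Always drop the first word (the kind), plus the kind-specific extra words
    skip = 1 + EXTRA_SKIP.get(words[0], 0)
    return "/".join(words[skip:])


print(short_label(BASE + "eic/22W20200608A---8"))
print(short_label(BASE + "outage/UnavailabilityOfProductionOrGenerationUnits/KJUiHodFyfNlQTV9Ut5DJQ/57"))
print(short_label(BASE + "dataObs/generation/ActualGenerationOutputPerGenerationUnit/36W-TE-TUZLA4--0/2022-01-07T19:00:00.000"))
```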
Rule: Arithmetics: ActualGenerationOutputPerGenerationUnit actualOutput LTE installedOutput [back]
- (ActualGenerationOutputPerGenerationUnit_16.1.A CSV), portal, description
- ActualGenerationOutput, InstalledGenCapacity

Resource | Value | Area | Message |
---|---|---|---|
32W001100100017L/2022-01-01T11:00:00.000 | 1001 | | Should be less than 1000 |
32W001100100048A/2022-01-01T09:00:00.000 | 231 | | Should be less than 230 |
<< < 1 of 5 > >>
The same notes apply as in the previous section, except data columns:
- (sh:focusNode) corresponds to a dataObs and the hyperlink shows only the "suffix" (last 2 URL components): EIC and dateTime
- (sh:focusNode/tr:energyResource)
- (sh:value): to illustrate, we've shown Actual Output that exceeds Installed Capacity by 1 MW
- displayArea.
  - appliesTo controlArea, it comes from the controlArea linked to the node (sh:focusNode/tr:controlArea/tr:notation)
- sh:resultMessage, but used only if the shape has sh:message

We use some inference (SPARQL updates) to:
- (eicCode)

Further subsections define and implement data corrections as SPARQL Updates, and the sequence and interleaving of validation rules and corrections.
- tr:sparqlUpdate to it. We do not use SHACL Rules (part of SHACL Advanced Features) because these are limited to only invalid nodes.
- ValidationResults capture the original wrong value in sh:value, and corrections don't overwrite this captured value, so it can be reported in the DQA Dashboard.

This section describes precisely all validation rules implemented by TEKG. The semantic definition and SHACL implementation of each rule are extracted from this section.
sh:targetClass tr:EnergyResource;
sh:property [
sh:path tr:function;
sh:minCount 1;
sh:not [sh:hasValue "Valid EIC Function needed"]].
SPARQL check:
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * where {
?s a tr:EnergyResource .
{
FILTER NOT EXISTS {
?s tr:function []
}
} UNION {
?s tr:function "Valid EIC Function needed"
}
}
According to the following correction table (data/turtle/small/function-valid.ttl):
functionInvalid | functionValid |
---|---|
balance group | Balance Group |
It-System | IT-system |
LNG terminal | LNG Terminal |
Generation | Generation Unit |
Production Plant | Production Unit |
Notes about the first 3 lines (case normalization):
Notes about the last 2 lines:
doc Functions p4 lists both variants
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this ?s2 {
$this a tr:EnergyResource; tr:function ?invalid.
?s2 a tr:FunctionValid; tr:functionInvalid ?invalid}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "Will be corrected to {?valid}";
sh:select """
select $this (tr:function as ?path) (?invalid as ?value) ?valid {
$this tr:function ?invalid.
[] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid}"""].
SPARQL check:
select ?this {
?this a tr:EnergyResource; tr:function ?invalid.
[] a tr:FunctionValid; tr:functionInvalid ?invalid}
Misspellings of functions (eg "Production Plant", "Generation") are corrected to enable further checks. We use an RDF mapping table that incorporates correct and misspelled functions, with rows like this:
[] a tr:FunctionValid; tr:functionInvalid "Production Plant"; tr:functionValid "Production Unit".
[] a tr:FunctionValid; tr:functionInvalid "Generation"; tr:functionValid "Generation Unit".
The spelling correction is done by this SPARQL update:
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
delete {graph <graph/allocated-eic-codes> {?x tr:function ?invalid}}
insert {graph <graph/allocated-eic-codes> {?x tr:function ?valid}}
where {
?x a tr:EnergyResource; tr:function ?invalid.
[] a tr:FunctionValid; tr:functionInvalid ?invalid; tr:functionValid ?valid
}
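The effect of this update can be sketched outside the triple store. The Python below is a minimal illustration: the dictionary copies the correction table shown earlier, and the function name `correct_functions` is hypothetical.

```python
# Correction table copied from the functionInvalid/functionValid table above.
FUNCTION_VALID = {
    "balance group": "Balance Group",
    "It-System": "IT-system",
    "LNG terminal": "LNG Terminal",
    "Generation": "Generation Unit",
    "Production Plant": "Production Unit",
}


def correct_functions(functions):
    """Replace each misspelled function by its valid form; keep the others."""
    return [FUNCTION_VALID.get(f, f) for f in functions]


print(correct_functions(["Production Plant", "Load"]))
```

As in the SPARQL update, values that are not listed as invalid pass through unchanged.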
This query finds 11 "Resource Objects" that have a more specific function:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?x tr:function "Resource Object", ?fun
filter(?fun != "Resource Object")
}
Examples:
30W-CEE-COGEA--T: "Generation Unit", "Resource Capacity Market Unit": elide "Resource Object"
45W000000000141O: "Production Unit", "Load": elide "Resource Object"
sh:targetClass tr:EnergyResource;
sh:or (
[sh:path tr:function; sh:maxCount 1]
[sh:path tr:function; sh:not [sh:hasValue "Resource Object"]]).
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
{
SELECT ?this (COUNT(?fun) as ?cnt) {
?this a tr:EnergyResource;
tr:function ?fun .
} GROUP BY ?this
}
?this tr:function ?fun .
FILTER (?cnt > 1 && ?fun = "Resource Object")
}
Production and Generation Units data is supposed to have Production Units at the top level, and Generation Units at the bottom level. In practice, there are many "Production Units" mislabeled with function "Generation Unit" and vice versa.
This query counts all invalid situations:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select
(count(?prodNotProd) as ?prodNotProd1)
(count(?prodIsGen) as ?prodIsGen1)
(count(?genNotGen) as ?genNotGen1)
(count(?genIsProd) as ?genIsProd1)
{
{?prodNotProd a tr:ProductionUnit filter not exists{?x tr:function "Production Unit"}} union
{?prodIsGen a tr:ProductionUnit filter exists{?x tr:function "Generation Unit"}} union
{?genNotGen a tr:GenerationUnit filter not exists{?x tr:function "Generation Unit"}} union
{?genIsProd a tr:GenerationUnit filter exists{?x tr:function "Production Unit"}}
}
prodNotProd1 | prodIsGen1 | genNotGen1 | genIsProd1 |
---|---|---|---|
0 | 2499 | 0 | 3140 |
sh:targetClass tr:ProductionUnit;
sh:property [
sh:path tr:function;
sh:maxCount 1;
sh:hasValue "Production Unit"].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct ?x ?cc {
{
SELECT (COUNT(?function) as ?count) ?x {
?x a tr:ProductionUnit ;
tr:biddingZone/tr:notation ?cc .
OPTIONAL {
?x tr:function ?function .
}
} GROUP BY ?x
}
filter (not exists {
?x tr:function "Production Unit"
} || ?count > 1)
} limit 1000
sh:targetClass tr:GenerationUnit;
sh:property [
sh:path tr:function;
sh:maxCount 1;
sh:hasValue "Generation Unit"].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?x a tr:GenerationUnit
optional {?x tr:function ?fun filter (?fun != "Generation Unit")}
filter (not exists {?x tr:function "Generation Unit"}
|| bound(?fun))
} limit 100
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this ?s2 {
$this a tr:GenerationUnit ;
^tr:generationUnit ?s2 ;
tr:parentResource ?parent2 .
FILTER (?s2 != ?parent2)
?s2 a tr:ProductionUnit .
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select $this ?value {
$this a tr:GenerationUnit ;
^tr:generationUnit ?value ;
tr:parentResource ?parent2 .
FILTER (?value != ?parent2)
?value a tr:ProductionUnit .
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?x a tr:GenerationUnit ;
^tr:generationUnit ?parent ;
tr:parentResource ?parent2 .
FILTER (?parent != ?parent2)
?parent a tr:ProductionUnit .
}
There are 938 power units (Production or Generation Units) that are missing from the EIC file:
base <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
values ?type {tr:ProductionUnit tr:GenerationUnit}
?x a ?type
filter not exists {
{graph <graph/allocated-eic-codes> {?x tr:eic []}}
}
}
Eg 47W000000000318I has assetType, biddingZone, controlArea, providerParticipant, generatorUnit, highVoltageLimit, installedOutput, location, notationAlt but not EIC data.
sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property [
sh:path tr:eic;
sh:minCount 1].
Notes:
tr:ProductionUnit, tr:GenerationUnit are disjoint
SPARQL check:
base <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
values ?type {tr:ProductionUnit tr:GenerationUnit}
?x a ?type
filter not exists {
{graph <graph/allocated-eic-codes> {?x tr:eic []}} # exists in <graph/correction/prodUnit-add-basic-data-to-EIC>
}
}
For Power Units missing from the EIC file, we add the following basic EIC fields:
rdf:type tr:EnergyResource
function from the subclass ProductionUnit or GenerationUnit. Note: The Production and Generation Units conversion emits one of these subclasses of tr:EnergyResource: tr:ProductionUnit, tr:GenerationUnit
eic from the URL (and the next section calculates eicType)
notation from notationAlt
countryCode in biddingZone or controlArea)
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
clear silent graph <graph/correction/prodUnit-add-basic-data-to-EIC>;
insert {graph <graph/correction/prodUnit-add-basic-data-to-EIC> {
?x a tr:EnergyResource;
tr:function ?func;
tr:eic ?eic;
tr:notation ?notation;
}} where {
values (?type ?func) {
(tr:ProductionUnit "Production Unit")
(tr:GenerationUnit "Generation Unit")
}
?x a ?type
filter not exists {?x tr:eic []}
bind((replace(str(?x),".*/","")) as ?eic)
optional {?x tr:notationAlt ?notation}
}
For example, the following Units are reported with different fields in Bidding Zone vs Control Area:
18WEGREEN-1234-3: different installedOutput
47W000000000355C: different installedOutput
47W000000000356A: different installedOutput
18WEGREEN-1234-3: different dateImplemented
11W0-0000-0026-Y: different location
49W0000000000342: different notationAlt
Note: highVoltageLimit, assetType, providerParticipant are always consistent. We checked with a query like this:
select * {
?x tr:highVoltageLimit ?y1,?y2
filter(str(?y1)<str(?y2))
}
sh:targetClass tr:ProductionUnit, tr:GenerationUnit;
sh:property <shape/property/100>, <shape/property/101>, <shape/property/102>, <shape/property/103>.
<shape/property/100> a sh:PropertyShape; sh:path tr:installedOutput; sh:maxCount 1.
<shape/property/101> a sh:PropertyShape; sh:path tr:dateImplemented; sh:maxCount 1.
<shape/property/102> a sh:PropertyShape; sh:path tr:location; sh:maxCount 1.
<shape/property/103> a sh:PropertyShape; sh:path tr:notationAlt; sh:maxCount 1.
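SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT * WHERE {
{
# Count distinct values per field; a unit is ProductionUnit OR GenerationUnit (the classes are disjoint)
select ?x (COUNT(DISTINCT ?installed) as ?installedCount) (COUNT(DISTINCT ?date) as ?dateCount) (COUNT(DISTINCT ?loc) as ?locationCount) (COUNT(DISTINCT ?not) as ?notationCount) {
values ?type {tr:ProductionUnit tr:GenerationUnit}
?x a ?type .
# Fields are optional so that a unit missing one field is still checked for the others
optional {?x tr:installedOutput ?installed}
optional {?x tr:dateImplemented ?date}
optional {?x tr:location ?loc}
optional {?x tr:notationAlt ?not}
} GROUP BY ?x
}
FILTER(?installedCount > 1 || ?dateCount > 1 || ?locationCount > 1 || ?notationCount > 1)
}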
sh:targetClass tr:EnergyResource;
sh:or (
[sh:not [sh:path tr:function; dash:hasValueIn ("Production Unit" "Generation Unit")]]
[ sh:path rdf:type; dash:hasValueIn (tr:ProductionUnit tr:GenerationUnit)]).
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX dash: <http://datashapes.org/dash#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * {
?x a tr:EnergyResource; tr:function ?fun.
filter(?fun in ("Production Unit", "Generation Unit"))
filter not exists {
?x a ?type
filter(?type in (tr:ProductionUnit, tr:GenerationUnit))
}
}
This correction adds field eicType based on the third char of eic.
<type/Eic/> (where notation is the char, name is the type).
<type/Eic/W> is "Resource Object"
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
clear silent graph <graph/correction/eicType>;
insert {graph <graph/correction/eicType> {
?x tr:eicType ?type
}} where {
?x tr:eic ?eic
bind(substr(?eic,3,1) as ?notation)
?type tr:codeList <type/Eic>; tr:notation ?notation
}
(There is no particular reason to run this right after the previous validation rule.)
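The two derivations used by these corrections (eic from the URL, then eicType from the third character of eic) can be sketched in Python. This is a hedged illustration: the function names are hypothetical, the sample URL path is an assumption, and the lookup lists only the three EIC types named in this document (W, X, Y).

```python
import re

# Only the types mentioned in this document; the real code list is larger.
EIC_TYPES = {"W": "Resource Object", "X": "Party", "Y": "Area or Domain"}


def eic_from_url(url: str) -> str:
    """Mirror replace(str(?x), ".*/", ""): keep what follows the last '/'."""
    return re.sub(r".*/", "", url)


def eic_type(eic: str):
    """Mirror substr(?eic, 3, 1): SPARQL is 1-based, Python slicing is 0-based."""
    return EIC_TYPES.get(eic[2:3])


# Hypothetical resource URL ending in an EIC quoted earlier in this section.
eic = eic_from_url("https://transparency.ontotext.com/resource/eic/47W000000000318I")
print(eic, eic_type(eic))
```

Codes whose third character is not in the lookup yield `None`, just as the SPARQL update inserts nothing when no matching type exists.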
According to the following table (data/turtle/small/eicType-valid.ttl):
function | eicTypeInvalid | eicTypeValid |
---|---|---|
System Operator | W Resource Object | X Party |
Control Block | X Party | Y Area or Domain |
Market Area | X Party | Y Area or Domain |
For the implementation we use SPARQL-based Constraints.
sh:SPARQLTarget)
a sh:SPARQLConstraint) is run for each offending node. This "double-query" approach reduces execution time because the offenders are a small subset of all tr:EnergyResource
(tr:function as ?path) doesn't work in GDB (GDB-6713)
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:EnergyResource; tr:eicType ?type; tr:function ?func.
?s2 a tr:EicTypeValid; tr:eicTypeInvalid ?type; tr:function ?func.} """];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select $this (tr:function as ?path) (sample(?func) as ?value) {
$this a tr:EnergyResource; tr:eicType ?type; tr:function ?func.
[] a tr:EicTypeValid; tr:eicTypeInvalid ?type; tr:function ?func.
} group by $this ?path"""].
SPARQL check:
select distinct $this ?s2 {
$this a tr:EnergyResource; tr:eicType ?type; tr:function ?func.
?s2 a tr:EicTypeValid; tr:eicTypeInvalid ?type; tr:function ?func.}
According to turtle/small/eicType-function.ttl, which is RDFized from docs/eicType-function-allowed.tsv, which is extracted from "List of allowed functions for the EIC codes".
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
filter not exists {?s2 tr:functionValid ?func}} """];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select $this (tr:function as ?path) (sample(?func) as ?value) {
$this tr:eicType ?type; tr:function ?func
filter not exists {?type tr:functionValid ?func}
} group by $this ?path"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select distinct $this ?s2 {
$this a tr:EnergyResource; tr:eicType ?s2; tr:function ?func
filter not exists {?s2 tr:functionValid ?func}}
sh:targetClass tr:ProductionUnit;
sh:property <shape/property/104>, <shape/property/105>.
<shape/property/104> a sh:PropertyShape; sh:path tr:installedOutput; sh:minCount 1.
<shape/property/105> a sh:PropertyShape; sh:path tr:highVoltageLimit; sh:minCount 1.
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?x a tr:ProductionUnit
filter (not exists {?x tr:installedOutput []}
|| not exists {?x tr:highVoltageLimit []})
}
sh:targetClass tr:GenerationUnit;
sh:property [sh:path tr:installedOutput; sh:minCount 1].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?x a tr:GenerationUnit
filter not exists {?x tr:installedOutput []}
}
sh:targetClass tr:ProductionUnit;
sh:not [sh:path tr:highVoltageLimit; sh:hasValue "0"^^xsd:float].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
$this a tr:ProductionUnit ;
tr:highVoltageLimit ?hvl .
FILTER (?hvl = "0"^^xsd:float)
}
sh:targetSubjectsOf tr:installedOutput;
sh:not [sh:path tr:installedOutput; sh:hasValue "0"^^xsd:float].
Some examples:
SPARQL check:
BASE <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
GRAPH <graph/ProductionAndGenerationUnits> {
$this tr:installedOutput ?io .
FILTER (?io = "0"^^xsd:float)
}
}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:ProductionUnit ;
tr:generationUnit/tr:responsibleParticipant ?genRP ;
tr:responsibleParticipant ?RP .
FILTER (?genRP != ?RP)
$this tr:generationUnit ?s2
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select distinct $this ?value {
$this a tr:ProductionUnit ;
tr:generationUnit/tr:responsibleParticipant ?value ;
tr:responsibleParticipant ?RP .
FILTER (?value != ?RP)
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
$this a tr:ProductionUnit ;
tr:generationUnit/tr:responsibleParticipant ?genRP ;
tr:responsibleParticipant ?RP .
FILTER (?genRP != ?RP)
}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:ProductionUnit ;
tr:generationUnit ?s2 ;
tr:countryCode ?RP .
?s2 tr:countryCode ?genRP .
FILTER (?genRP != ?RP)
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select distinct $this ?value {
$this a tr:ProductionUnit ;
tr:generationUnit ?s2 ;
tr:countryCode ?RP .
?s2 tr:countryCode ?value .
FILTER (?value != ?RP)
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
$this a tr:ProductionUnit ;
tr:generationUnit/tr:countryCode ?genRP ;
tr:countryCode ?RP .
FILTER (?genRP != ?RP)
}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:ProductionUnit ;
(tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
tr:countryCode ?RP .
FILTER (?genRP != ?RP)
$this (tr:biddingZone | tr:controlArea) ?s2 .
?s2 tr:countryCode ?genRP ;
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select distinct $this ?value {
$this a tr:ProductionUnit ;
(tr:biddingZone|tr:controlArea)/tr:countryCode ?value ;
tr:countryCode ?RP .
FILTER (?value != ?RP)
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
$this a tr:ProductionUnit ;
(tr:biddingZone|tr:controlArea)/tr:countryCode ?genRP ;
tr:countryCode ?RP .
FILTER (?genRP != ?RP)
}
Discovered:
cd data/turtle/prodUnit
perl -lne 'm{:location +"(.*)"} and do {$_=$1; s{^\d{2}[A-Z][A-Z0-9-]{13}$}{EIC}; s{^\d+$}{digits}; print}' *|sort|uniq -c|sort -rn|less
sh:targetSubjectsOf tr:location ;
sh:property [
sh:path tr:location;
sh:not [sh:pattern "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"]].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this {
$this tr:location ?loc .
FILTER (REGEX(?loc, "^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"))
}
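The same pattern can be tried on individual values before loading data. The Python sketch below is an illustration only: the function name `is_placeholder_location` is an assumption, while the regular expression is copied from the shape above. A match means the value is rejected by the rule.

```python
import re

# Pattern copied from the SHACL shape: digits only, an EIC code,
# or one of the literal fillers intra_zonal, name, locName.
PLACEHOLDER = re.compile(
    r"^([0-9]+|[0-9]{2}[A-Z][A-Z0-9-]{13}|intra_zonal|name|locName)$"
)


def is_placeholder_location(location: str) -> bool:
    """True if the location value would violate the rule above."""
    return PLACEHOLDER.search(location) is not None


for value in ["12345", "47W000000000318I", "intra_zonal", "France"]:
    print(value, is_placeholder_location(value))
```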
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
$this a tr:ProductionUnit; tr:installedOutput ?value
{select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
filter(?value<?value2)
}
Implementation:
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this ?s2 {
$this a tr:ProductionUnit; tr:installedOutput ?value ; tr:generationUnit ?s2 .
{select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
filter(?value<?value2)}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:message "Should be greater than or equal to {?value2}";
sh:prefixes tr: ;
sh:select """
select $this (tr:installedOutput as ?path) ?value ?value2 {
$this a tr:ProductionUnit; tr:installedOutput ?value
{select $this (sum(?value1) as ?value2) {$this a tr:ProductionUnit; tr:generationUnit/tr:installedOutput ?value1} group by $this}
filter(?value<?value2)}"""].
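The arithmetic behind this rule can be sketched in Python: a Production Unit is reported when its installedOutput is lower than the sum over its Generation Units. The function name, the input layout and the unit identifiers below are hypothetical and serve only as illustration.

```python
def capacity_violations(production_units):
    """Map each offending unit to the sum it should reach.

    `production_units` maps a unit id to a pair:
    (installedOutput of the unit, list of installedOutput of its generation units).
    """
    violations = {}
    for unit, (installed, gen_outputs) in production_units.items():
        total = sum(gen_outputs)
        # Units without generation units produce no group in the SPARQL
        # subquery, so they are skipped here as well.
        if gen_outputs and installed < total:
            violations[unit] = total
    return violations


# Made-up sample data.
sample = {"PU-A": (100, [40, 50]), "PU-B": (100, [60, 50]), "PU-C": (10, [])}
print(capacity_violations(sample))
```

The returned sum plays the role of `?value2` in the message "Should be greater than or equal to {?value2}".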
Example: 6326035O is invalid (IE6326035O would be valid)
sh:targetSubjectsOf tr:vatNumber;
sh:property [
sh:path tr:vatNumber;
sh:pattern "^[A-Z][A-Z]"].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct $this ?vat {
$this tr:vatNumber ?vat .
FILTER (!REGEX(?vat, "^[A-Z][A-Z]"))
}
This correction normalizes VAT codes: those lacking the country prefix (for most countries, those starting with a digit) are prefixed with the country code, enabling the VAT-per-country-syntax check and the VAT-per-country-exists check (in VIES).
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
delete {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?old}}
insert {graph <graph/allocated-eic-codes> {?x tr:vatNumber ?new}}
where {
values (?co ?co1 ?regex) {
("AL" "AL" "^[JKLM][0-9]" )
("AR" "AR" "^[0-9]" )
("AT" "AT" "^U[0-9]" )
("BA" "BA" "^[0-9]" )
("BE" "BE" "^[0-9]" )
("BG" "BG" "^[0-9]" )
("CH" "CHE" "^(CH)?[0-9]" )
("CY" "CY" "^[0-9]" )
("CZ" "CZ" "^[0-9]" )
("DE" "DE" "^[0-9]" )
("DK" "DK" "^[0-9]" )
("EE" "EE" "^[0-9]" )
("ES" "ES" "^[A-Z][0-9]" )
("FI" "FI" "^[0-9]" )
("FR" "FR" "^[0-9]" )
("GB" "GB" "^[0-9]" )
("GE" "GE" "^[0-9]" )
("GR" "EL" "^(GR|GREL)?[0-9]" )
("HR" "HR" "^[0-9]" )
("HU" "HU" "^[0-9]" )
("IE" "IE" "^[0-9]" )
("IT" "IT" "^[0-9]" )
("IS" "IS" "^[0-9]" )
("KY" "KY" "^[0-9]" )
("LI" "LI" "^[0-9]" )
("LT" "LT" "^[0-9]" )
("LU" "LU" "^[0-9]" )
("LV" "LV" "^[0-9]" )
("MD" "MD" "^[0-9]" )
("ME" "ME" "^[0-9]" )
("MK" "MK" "^[0-9]" )
("MT" "MT" "^[0-9]" )
("NL" "NL" "^[0-9]" )
("NO" "NO" "^[0-9]" )
("PL" "PL" "^[0-9]" )
("PT" "PT" "^[0-9]" )
("RO" "RO" "^[0-9]" )
("RS" "RS" "^[0-9]" )
("RU" "RU" "^[0-9]" )
("SE" "SE" "^[0-9]" )
("SG" "SG" "^[0-9]" )
("SI" "SI" "^[0-9]" )
("SK" "SK" "^[0-9]" )
("TR" "TR" "^[0-9]" )
("UA" "UA" "^[0-9]" )
("US" "US" "^[0-9]" )
("XK" "XK" "^[0-9]" )
}
?x tr:countryCode ?co; tr:vatNumber ?old.
filter(regex(?old,?regex))
bind(replace(?old,"^(CH|GREL|GR)","") as ?vat1)
bind(concat(?co1,?vat1) as ?new)
}
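The normalization logic can be sketched in Python on a handful of countries. This is an illustration under assumptions: the function name `normalize_vat` is hypothetical and only five rows of the table are copied. The sketch lists GREL before GR when stripping, so that the longer prefix is removed whole.

```python
import re

# (country code -> (VAT prefix, pattern of values needing the prefix));
# a small subset of the table in the update above.
RULES = {
    "AT": ("AT", r"^U[0-9]"),
    "CH": ("CHE", r"^(CH)?[0-9]"),
    "ES": ("ES", r"^[A-Z][0-9]"),
    "GR": ("EL", r"^(GR|GREL)?[0-9]"),
    "IE": ("IE", r"^[0-9]"),
}


def normalize_vat(country: str, vat: str) -> str:
    """Prefix the VAT number with the country's VAT prefix when it is missing."""
    rule = RULES.get(country)
    if rule is None:
        return vat  # country not covered by this sketch
    prefix, pattern = rule
    if not re.search(pattern, vat):
        return vat  # already prefixed, or not in a recognizable form
    # Drop a non-standard leading prefix before adding the standard one.
    stripped = re.sub(r"^(CH|GREL|GR)", "", vat)
    return prefix + stripped


print(normalize_vat("IE", "6326035O"))
print(normalize_vat("GR", "GR123456789"))
```

Values that already carry the expected prefix do not match the per-country pattern and are returned unchanged.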
Examples:
IE8F52100V is valid syntax
ES20470001 is invalid syntax (ESA20470001 is valid)
sh:targetSubjectsOf tr:vatNumber;
sh:path tr:vatNumber ;
sh:or (
[sh:pattern "^ADU\\d{6}[A-Z]$" ]
[sh:pattern "^AL[JKLM]\\d{8}[A-Z]$" ]
[sh:pattern "^AR\\d{14}$" ]
[sh:pattern "^ATU\\d{8}$" ]
[sh:pattern "^AU\\d{11}$" ]
[sh:pattern "^BA\\d{12,13}$" ]
[sh:pattern "^BE\\d{10}$" ]
[sh:pattern "^BG\\d{9,10}$" ]
[sh:pattern "^CHE\\d{9}$" ]
[sh:pattern "^CY\\d{8}[A-Z]$" ]
[sh:pattern "^CZ\\d{8,10}$" ]
[sh:pattern "^DE\\d{9}$" ]
[sh:pattern "^DK\\d{8}$" ]
[sh:pattern "^EE\\d{9}$" ]
[sh:pattern "^EL\\d{9}$" ]
[sh:pattern "^ES[A-Z]\\d{7}[\\dA-Z]$" ]
[sh:pattern "^FI\\d{8}$" ]
[sh:pattern "^FL\\d{11}$" ]
[sh:pattern "^FR\\d{11}$" ]
[sh:pattern "^GB\\d{9}$" ]
[sh:pattern "^HR\\d{11}$" ]
[sh:pattern "^HU\\d{8}$" ]
[sh:pattern "^IE\\d[\\dA-Z]\\d{5}[A-Z]{1,2}$" ]
[sh:pattern "^IS\\d{5}$" ]
[sh:pattern "^IT\\d{10,11}$" ]
[sh:pattern "^JE\\d{10}$" ]
[sh:pattern "^KY\\d{6}$" ]
[sh:pattern "^LI\\d{5}$" ]
[sh:pattern "^LT(\\d{9}|\\d{12})$" ]
[sh:pattern "^LU\\d{8}$" ]
[sh:pattern "^LV\\d{11}$" ]
[sh:pattern "^MA\\d{7}$" ]
[sh:pattern "^MD\\d{7}$" ]
[sh:pattern "^ME(\\d{8}|\\d{12})$" ]
[sh:pattern "^MK\\d{13}$" ]
[sh:pattern "^MR\\d{8}$" ]
[sh:pattern "^MT\\d{8}$" ]
[sh:pattern "^NL\\d{9}B\\d{1,2}$" ]
[sh:pattern "^NO\\d{9}(M|MVA)?$" ]
[sh:pattern "^PL\\d{10}$" ]
[sh:pattern "^PT\\d{9}$" ]
[sh:pattern "^RO\\d{7,8}$" ]
[sh:pattern "^RS\\d{9}$" ]
[sh:pattern "^RU\\d{10}$" ]
[sh:pattern "^SE\\d{12}$" ]
[sh:pattern "^SG[A-Z]?\\d{9}[A-Z]$" ]
[sh:pattern "^SI\\d{8}$" ]
[sh:pattern "^SK\\d{10}$" ]
[sh:pattern "^SM\\d{5}$" ]
[sh:pattern "^TR\\d{10}$" ]
[sh:pattern "^UA\\d{8,12}$" ]
[sh:pattern "^US\\d{9}([A-Z]{2}\\d)?$" ]
[sh:pattern "^XK\\d{9}$" ]
).
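The `sh:or` over patterns amounts to "at least one country pattern matches". A minimal Python sketch follows. It copies only four of the patterns above, and the function name `has_valid_vat_syntax` is an assumption.

```python
import re

# A few of the per-country patterns from the shape above (single backslashes,
# since these are Python raw strings rather than Turtle strings).
PATTERNS = [
    r"^ES[A-Z]\d{7}[\dA-Z]$",
    r"^GB\d{9}$",
    r"^HU\d{8}$",
    r"^IE\d[\dA-Z]\d{5}[A-Z]{1,2}$",
]


def has_valid_vat_syntax(vat: str) -> bool:
    """True if the VAT number matches at least one country pattern."""
    return any(re.search(p, vat) for p in PATTERNS)


for vat in ["IE8F52100V", "ES20470001", "ESA20470001"]:
    print(vat, has_valid_vat_syntax(vat))
```

Because the patterns are anchored and start with the country prefix, at most one of them can match a given value.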
sh:targetSubjectsOf tr:vatNumber;
sh:property [
sh:path tr:countryCode;
sh:minCount 1].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?this tr:vatNumber [] .
FILTER NOT EXISTS {?this tr:countryCode ?cc}
} limit 10
Examples:
59XREALPETROL11F "REAL PETROL HOLDING KFT" with VAT "HU24189514": country "IT" is wrong
22X20110811----W "INEOS CHLORVINYLS LIMITED" with VAT "GB768506886": country "BE" is wrong
More examples for traders in AE (United Arab Emirates), in particular the Dubai DMCC:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select ?eic ?co ?vat ?name ?notation ?function ?descr {
?x tr:countryCode "AE"
optional {?x tr:eic ?eic}
optional {?x tr:countryCode ?co}
optional {?x tr:name ?name}
optional {?x tr:notation ?notation}
optional {?x tr:function ?function}
optional {?x tr:vatNumber ?vat}
optional {?x tr:description ?descr}
}
eic | co | vat | name | notation | function | descr |
---|---|---|---|---|---|---|
48X000000000255O | AE | | LUZIRA DMCC | BUGOLOBI | Interconnection Trade Responsible | A VAT number is not available for this company, so we are providing the Legal Entity Identifier (LEI) company registration number which is 984500O3EFBA8613AA78. |
48X0000000000432 | AE | GB383911772 | COBBLESTONE ENERGY DMCC | COBBLESTONEDMCC | Balance Responsible Party | UK VAT Code not available. Value in above field is the registered company number. |
11X0-0000-0554-Q | AE | NONE | ENERGETECH TRADING DMCC | ENERGETECH | Balance Responsible Party | |
53XPL000000ININY | AE | | Infusion International INC | INFUSION_INTL | Network User | The company registered in UAE. According to local (UAE) regulations they are treated as offshore company and they function in so called free zone. No possibility for them to get the VAT code. |
59XVORTICES--017 | AE | | Vortices Energy Ltd. | VORTICESENERGY | Balance Responsible Party | UAE Company; EU Value not inserted because non-european company. |
This indicates some trouble regarding the filling of VAT information for non-European parties. Going row by row:
vatNumber, including LEI
vatNumber has the prefix "GB" given that it's an AE company. What forced the data entry user to enter this misleading value?
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this {
$this tr:countryCode ?co; tr:vatNumber ?vat
bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
filter(!strstarts(?vat,?co1))}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "Country code is {?co}";
sh:select """
select $this (tr:vatNumber as ?path) (?vat as ?value) ?co {
$this tr:countryCode ?co; tr:vatNumber ?vat}"""].
SPARQL check:
select $this {
$this tr:countryCode ?co; tr:vatNumber ?vat
bind(if(?co="CH","CHE",if(?co="GR","EL",?co)) as ?co1)
filter(!strstarts(?vat,?co1))}
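The check reduces to a prefix comparison after mapping two country codes to their VAT prefixes (CH to CHE, GR to EL). A minimal Python sketch follows. The function names are hypothetical, and the sample values are the examples quoted above.

```python
def expected_vat_prefix(country: str) -> str:
    """Country code to VAT prefix, mirroring the nested if() in the query."""
    return {"CH": "CHE", "GR": "EL"}.get(country, country)


def vat_matches_country(country: str, vat: str) -> bool:
    """True if the VAT number starts with the prefix expected for the country."""
    return vat.startswith(expected_vat_prefix(country))


# Examples quoted above: VAT prefix and country code disagree.
print(vat_matches_country("IT", "HU24189514"))
print(vat_matches_country("BE", "GB768506886"))
```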
A Python script queries VIES in bulk; then RDFize VIES Checks records the results as RDF.
XI but most such companies in EIC data are recorded with code GB (except 2)
ES VAT numbers as non-existent, perhaps the respective companies are not registered for VAT. An example is 18XFERL-12345--K Ferloga, SL (VAT ESB24049272)
B24049272, VAT ESB24049272, Kompass ES1074724
B24049272
B24049272, 24049272, A24049272
We use SHACL-SPARQL in order to put the wrong VAT number in ?value:
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this {
$this tr:vatInVies false}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select $this (tr:vatNumber as ?path) ?value {
$this tr:vatInVies false; tr:vatNumber ?value}"""];
SPARQL check:
select $this {
$this tr:vatInVies false}
Notes:
Example of a data mistake: Installed Capacity per Production Type for France on 21-Jan-2022 showed this:
Production Type | 2021 MW | 2022 MW |
---|---|---|
Other | 1120 | 7900729 |
This means that 7.9 TW (7.9 million MW!) of "Other" capacity was newly installed in France. Have the French tamed some Dark Energy source that would solve all our energy problems?
Checking Installed Capacity Per Production Unit shows only 1 "Other" asset:
Production Type | Code | Name | Installed Capacity at the beginning of the year | Current Installed Capacity | Location | Voltage Connection Level | Commissioning Date |
---|---|---|---|---|---|---|---|
Other | 17W100P100P0352E | CYCOFOS TV2 | 62 | 62 | France | 225 | 01.09.2009 |
It was installed in 2009 and there's no change in capacity (62 MW) in the last two years. So unfortunately the 7.9 TW is not a miracle but a data error.
Implementation:
installedOutput, and 130% of that as installedOutputHigh
ProductionUnit) because the bottom level (GenerationUnit) capacities are already included in the top level (see rule ProductionUnit-capacity-GTE-GenerationUnit-capacity)
SPARQL check:
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select ?aggr ?comp ?aggrOutput ?compOutput ?compOutputHigh {
?aggr a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityAggregated>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?aggrOutput.
?comp a tr:DataObservation; tr:dataItem <data/generation/InstalledGenerationCapacityComputed>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?compOutput;
tr:installedOutputHigh ?compOutputHigh.
filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))
} limit 1000
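The filter in this check accepts an aggregated value only if it lies between the computed per-unit sum and 130% of that sum. A minimal Python sketch of that comparison follows. The function name and the `tolerance` parameter are assumptions, and the sample figures are the French "Other" values quoted above.

```python
def aggregated_is_plausible(aggregated: float, computed: float,
                            tolerance: float = 1.3) -> bool:
    """True if computed <= aggregated <= computed * tolerance (130% by default)."""
    return computed <= aggregated <= computed * tolerance


# France, production type "Other": 7900729 MW aggregated vs 62 MW per unit.
print(aggregated_is_plausible(7900729, 62))
# An aggregated value equal to the per-unit sum is accepted.
print(aggregated_is_plausible(62, 62))
```

An aggregated value below the computed sum is also reported, since the per-unit data would then exceed the aggregate.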
Implementation with SHACL-SPARQL. We return extra info using sh:message:
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
base <https://transparency.ontotext.com/resource/>
select (?aggr as $this) ?s2 {
?aggr a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?aggrOutput.
?s2 a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?compOutput;
tr:installedOutputHigh ?compOutputHigh.
filter(!(?compOutput <= ?aggrOutput && ?aggrOutput <= ?compOutputHigh))}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:message "Must be between {?compOutput} and {?compOutputHigh}";
sh:prefixes tr: ;
sh:select """
base <https://transparency.ontotext.com/resource/>
select $this (?aggrOutput as ?value) ?compOutput ?compOutputHigh {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityAggregated>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?aggrOutput.
?comp a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/InstalledGenerationCapacityComputed>;
tr:controlArea|tr:biddingZone ?area;
tr:assetType ?assetType;
tr:installedOutput ?compOutput;
tr:installedOutputHigh ?compOutputHigh}"""].
Out of 4.5M observations over 3 months, there are 3.3M violations:
controlArea, but that's because it was submitted at the top level of Production and Generation Units, i.e. that is a discrepancy
controlArea, neither itself nor through its Production Unit (parentResource)
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
base <https://transparency.ontotext.com/resource/>
select $this ?s2 ?s3 {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:controlArea ?area; tr:generationUnit ?s2 .
optional {?s2 tr:parentResource? ?s3}
filter not exists {$this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
} limit 1000"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:select """
select $this (tr:controlArea as ?path) (?area as ?value) {
$this tr:controlArea ?area}"""].
SPARQL check:
base <https://transparency.ontotext.com/resource/>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
select * {
?this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>; tr:controlArea ?area.
filter not exists {?this tr:generationUnit / tr:parentResource? / tr:controlArea ?area}
optional {
?this tr:generationUnit ?gen
optional {?gen tr:controlArea ?genArea}}
optional {
?this tr:generationUnit/tr:parentResource ?prod
optional {?prod tr:controlArea ?prodArea}}
} limit 1000
SPARQL check:
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select $this (?output1 as ?value) ?genUnitOutput {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:installedOutput ?output1.
optional {$this tr:generationUnit/tr:installedOutput ?output2}
filter (!bound(?output2) || !(?output1 = ?output2))
bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)
} limit 200
SPARQL count:
base <https://transparency.ontotext.com/resource/>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
select (count(*) as ?c) (count(?output2) as ?c2) {
$this tr:dataItem <data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:installedOutput ?output
filter not exists {$this tr:generationUnit/tr:installedOutput ?output2 filter (?output2=?output)}
optional{$this tr:generationUnit/tr:installedOutput ?output2}
}
Violations:
generationUnit has different installedOutput: 22.3k of 4.5M observations over 3 months; 25k over 4 months
generationUnit doesn't have any installedOutput: 34.7k of 4.5M observations over 3 months; 63k over 4 months
Implementation:
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
base <https://transparency.ontotext.com/resource/>
select $this ?s2 {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:installedOutput ?output1; tr:generationUnit ?s2.
filter not exists {?s2 tr:installedOutput ?output2
filter(?output1 = ?output2)}}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "The GenerationUnit installed capacity (nominalP) {?genUnitOutput}";
sh:select """
select $this (tr:installedOutput as ?path) (?output1 as ?value) ?genUnitOutput {
$this tr:installedOutput ?output1.
optional {$this tr:generationUnit/tr:installedOutput ?output2}
bind(if(bound(?output2),concat("is ",str(?output2)),"does not exist") as ?genUnitOutput)}"""].
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select $this {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:actualOutput ?actual; tr:installedOutput ?installed
filter(!(?actual <= ?installed))}"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "The actual generation output, `{?value}` of this observation is greater than the installed output, `{?installed}` for its Generation Unit." ;
sh:select """
select distinct $this ?installed ?value {
$this a tr:DataObservation ;
tr:actualOutput ?value; tr:installedOutput ?installed .
filter(!(?value <= ?installed))}
"""].
SPARQL check:
base <https://transparency.ontotext.com/resource/>
select $this {
$this a tr:DataObservation; tr:dataItem <https://transparency.ontotext.com/resource/data/generation/ActualGenerationOutputPerGenerationUnit>;
tr:actualOutput ?actual; tr:installedOutput ?installed
filter(!(?actual <= ?installed))}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:Outage ;
tr:controlArea ?ca ;
tr:energyResource/tr:controlArea ?eca .
FILTER (?ca != ?eca)
$this tr:energyResource ?s2 .
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "The outage has the control area {?ca}, but its energy resource has the control area {?value}";
sh:select """
select distinct $this ?ca ?value {
$this a tr:Outage ;
tr:controlArea ?ca ;
tr:energyResource/tr:controlArea ?value .
FILTER (?ca != ?value)
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
$this a tr:Outage ;
tr:controlArea ?ca ;
tr:energyResource/tr:controlArea ?eca .
FILTER (?ca != ?eca)
}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:Outage ;
tr:biddingZone ?ca ;
tr:energyResource/tr:biddingZone ?eca .
FILTER (?ca != ?eca)
$this tr:energyResource ?s2
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "The outage has the bidding zone {?ca}, but its energy resource has the bidding zone {?value}";
sh:select """
select distinct $this ?value ?ca {
$this a tr:Outage ;
tr:biddingZone ?ca ;
tr:energyResource/tr:biddingZone ?value .
FILTER (?ca != ?value)
$this tr:energyResource ?s2
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
$this a tr:Outage ;
tr:biddingZone ?ca ;
tr:energyResource/tr:biddingZone ?eca .
FILTER (?ca != ?eca)
}
sh:targetClass tr:Outage;
sh:property [
sh:path (tr:energyResource tr:eic);
sh:minCount 1].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
$this a tr:Outage .
FILTER NOT EXISTS {
$this tr:energyResource/tr:eic ?eic
}
}
sh:target [a sh:SPARQLTarget;
sh:prefixes tr: ;
sh:select """
select distinct $this ?s2 {
$this a tr:Outage ;
tr:installedOutput ?ca ;
tr:energyResource/tr:installedOutput ?eca .
FILTER (?ca != ?eca)
$this tr:energyResource ?s2
}
"""];
sh:sparql [a sh:SPARQLConstraint;
sh:prefixes tr: ;
sh:message "The outage has an installed capacity {?ca}, but its energy resource has the installed capacity {?value}";
sh:select """
select distinct $this ?ca ?value {
$this a tr:Outage ;
tr:installedOutput ?ca ;
tr:energyResource/tr:installedOutput ?value .
FILTER (?ca != ?value)
}
"""].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
$this a tr:Outage ;
tr:installedOutput ?ca ;
tr:energyResource/tr:installedOutput ?eca .
FILTER (?ca != ?eca)
}
sh:targetClass tr:Outage;
sh:property [
sh:path tr:availableOutput;
sh:lessThan tr:installedOutput].
SPARQL check:
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct * {
$this a tr:Outage ;
tr:availableOutput ?ao ;
tr:installedOutput ?io .
FILTER (?io <= ?ao)
}
Here are ideas for more validation rules that are not yet defined. As we define them, we move them to the section above:
The following rules will not be implemented:
The following rules were checked quickly and no errors were found, so we found no need to implement them:
grep "<mRID>" allocated-eic-codes.xml|sort|uniq -d
highVoltageLimit, assetType, controlArea, biddingZone
installedOutput is not consistent, and we have a validation rule for that
?x tr:eic [] filter (!exists {?x tr:notation []} || !exists {?x tr:name []})
MAW
, highVoltageLimit: KVT
)
select ?unit (count(*) as ?c) {?x tr:unit ?unit} group by ?unit
tr:unit so the query below will not work
select ?powUnit ?powUnitN ?powUnitUOM ?genUnit ?genUnitN ?genUnitUOM {
?powUnit tr:generationUnit ?genUnit.
optional {?powUnit tr:installedOutput/tr:unit ?powUnitUOM}
optional {?genUnit tr:installedOutput/tr:unit ?genUnitUOM}
filter (!bound(?powUnitUOM) || !bound(?genUnitUOM) || ?powUnitUOM != ?genUnitUOM)
}
Validation service options are currently under investigation. There are two validators under consideration: TopQuadrant SHACL API and GraphDB's ShaclSail.
The chief questions to be investigated are:
The TQ SHACL API is an open-source API developed by TopQuadrant. It is based on Apache Jena.
sh:annotationProperty, which would make reporting harder.
The performance issue could be mitigated by clever target definitions, i.e., using SPARQL for targeting.
Since we store data in GraphDB, we would need to fetch all data to be validated, store it in a Jena model (can be in-memory), then validate.
ShaclSail is implemented in RDF4J and is part of GraphDB. It is native to our database, so we would need no integration layer.
sh:SPARQLTarget functionality more efficiently.
Since we never want to reject data, and only want to record validation errors, we need to run with the validator toggled off, then do a bulk validation. This can be achieved in one of two ways:
Of the two, the first option is notably better performance-wise, except for very large files.
Custom SPARQL validations are very flexible and offer better performance than SHACL-SPARQL. The downside is that we would need custom logic to implement them. Custom SPARQL validation also can easily be used in conjunction with one of the two SHACL validators.
The DQA (Data Quality Assessment) Dashboard displays validation results.
The functions (scope) of the DQA dashboard include:
DQA Mockups are shown in textual form in preceding sections:
This section specifies Integrations and/or Validations based on external data to be integrated into the KG. In addition to the external data sources described in subsections, we also considered the following sources:
Over 10000 VAT numbers are present in the data. We will validate them using the VIES-on-the-Web system. It is a free web service provided by the EC, running on top of national VAT databases corresponding to EC Member States and Northern Ireland.
The service is a simple SOAP API where two parameters are sent as XML elements: countryCode and vatNumber. The response is a boolean value indicating whether the VAT number is valid and, if it is valid, some basic information about the entity it corresponds to.
Example response for VAT IT13433711002:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<countryCode>IT</countryCode>
<vatNumber>13433711002</vatNumber>
<requestDate>2022-01-12+01:00</requestDate>
<valid>true</valid>
<name>ARCADIA ITALIA S.R.L.</name>
<address>VIA PERUGINO 4 00196 ROMA RM </address>
</checkVatResponse>
</soap:Body>
</soap:Envelope>
An important limitation of VIES is that not all countries relevant for ENTSOE are present. A future project should evaluate the possibility of using an additional free service such as VATApp, or of directly using open data dumps provided by the respective countries (UK and NO in particular).
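A VIES response of the shape shown in the example can be turned into a flat record with the Python standard library. The following is a minimal sketch, not the project's script: the function name `parse_check_vat_response` and the `SAMPLE` constant are hypothetical, and the namespaces are copied from the example response.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the example checkVat response.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "vies": "urn:ec.europa.eu:taxud:vies:services:checkVat:types",
}

# Example response from this section (VAT IT13433711002).
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
<countryCode>IT</countryCode>
<vatNumber>13433711002</vatNumber>
<requestDate>2022-01-12+01:00</requestDate>
<valid>true</valid>
<name>ARCADIA ITALIA S.R.L.</name>
<address>VIA PERUGINO 4 00196 ROMA RM </address>
</checkVatResponse>
</soap:Body>
</soap:Envelope>"""


def parse_check_vat_response(xml_text):
    """Extract the fields of a checkVatResponse into a plain dict."""
    root = ET.fromstring(xml_text)
    resp = root.find("soap:Body/vies:checkVatResponse", NS)
    if resp is None:
        raise ValueError("no checkVatResponse element found")

    def text(tag):
        # Return the stripped text of a child element, or None if absent/empty.
        el = resp.find(f"vies:{tag}", NS)
        return el.text.strip() if el is not None and el.text else None

    return {
        "countryCode": text("countryCode"),
        "vatNumber": text("vatNumber"),
        "requestDate": text("requestDate"),
        "valid": text("valid") == "true",
        "name": text("name"),
        "address": text("address"),
    }


if __name__ == "__main__":
    print(parse_check_vat_response(SAMPLE))
```

Keeping the parser separate from the HTTP call means it can be exercised against saved responses without contacting VIES.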
select (count(*) as ?c) ?co {
?x tr:vatNumber ?vat
optional {?x tr:countryCode ?co}
} group by ?co order by desc(?c)
GB, CH, UA, MK, RS, GR, AL, NO, BA, MD, XK, ME, US, TR, LI, SG, KY, AE, GE, IS, AD, AR, AU, MA, MY, NC, PR, RU, SM, UK (Total 30)
TR LI SG KY AE GE IS AD AR AU MA MY NC PR RU SM UK. They will be ignored for VAT format analysis (see below)
VAT Number Statistics: Out of 9,919 VAT numbers
K42101801N, country AL
GREL099790528, country GR.
countryCode | total | valid | invalid | names |
---|---|---|---|---|
AT | 128 | 110 | 18 | 110 |
BE | 133 | 107 | 26 | 107 |
BG | 187 | 169 | 18 | 169 |
CY | 22 | 10 | 12 | 10 |
CZ | 326 | 199 | 127 | 199 |
DE | 1027 | 969 | 58 | |
DK | 92 | 85 | 7 | 85 |
EE | 57 | 47 | 10 | 47 |
EL | 93 | 90 | 3 | 90 |
ES | 3455 | 1499 | 1956 | |
FI | 229 | 224 | 5 | 224 |
FR | 124 | 107 | 17 | 107 |
HR | 152 | 106 | 46 | 106 |
HU | 109 | 75 | 34 | 75 |
IE | 57 | 49 | 8 | 49 |
IT | 611 | 349 | 262 | 349 |
LT | 88 | 69 | 19 | 69 |
LU | 25 | 21 | 4 | 21 |
LV | 70 | 52 | 18 | 52 |
MT | 11 | 11 | 0 | 11 |
NL | 202 | 165 | 37 | 165 |
PL | 341 | 235 | 106 | 235 |
PT | 102 | 95 | 7 | 95 |
RO | 228 | 186 | 42 | 186 |
SE | 34 | 31 | 3 | 31 |
SI | 117 | 79 | 38 | 79 |
SK | 274 | 195 | 79 | 195 |
XI | 2 | 1 | 1 | 1 |
TOTAL | 8296 | 5335 | 2961 | 2867 |
VAT format was researched on:
EU-TID: Format and structure of tax identification numbers (TINs) in the EU
[JKL], and the last character is a letter. E.g. K99999999L, L99999999G
ALL11731504A, ALJ61820031J, ALL32130008F, M12221008I, ALK11624001V
WP: 'ATU'+8 digits. E.g. ATU99999999. EU-TID: 9 digits.
U50568407, U49637200 and ATU6729404 (7 digits)
WP: 'BE' + 8 digits + 2 check digits. E.g. BE09999999XX. EU-TID: 10 digits
GB768506886, 0711797282, 0754605263
BG999999999. EU-TID: 10 digits
CHE followed by 9 digits
CH followed by 6 digits
CH followed by 9 digits
CH followed by 11 digits
CHE followed by 8 digits
CHE followed by 7 digits
WP: 9 characters. E.g. CY99999999L. EU-TID: the same for individuals but 8 digits for legal entities.
10375510G and 10390426G, miss CY prefix
EU-TID: 8 digits.
DE289523572 and DE814987657
DE999999999. EU-TID: 11 digits.
DE29149497 and DE29535215
DE4370403223 and DE3503951816
DE: 6 of them have 11 digits, 3 of them have 10 digits
DK99999999. EU-TID: 8 digits
DK prefix
GB684966762 and CZ07292015
EU-TID: 8 digits for legal entities and 11 digits for individuals.
14912868 which misses prefix as well as it has 8 digits instead of 9
EU-TID: same. Where the first letter defines the type of company and the following first 2 digits define the province where the company was registered. The last character is a control digit.
ESX9999999R
ESA0879906, ESB9159561, ESA5840219
ESB588111980 (9 digits instead of 8)
ES: B95713541, PT980633745 and PT508193117
WP: FI + 7 digits + check digit. E.g. FI99999999. EU-TID: same
FI prefix
WP: 'FR'+ 2 digits (as validation key) + 9 digits (as SIREN), the first and/or the second value can also be a character – e.g. FRXX999999999. EU-TID: 9 digits for legal entities and completely different thing for individuals.
DE813871435 and 0000000000000
FR5950773519 and FR2783328587
FR572221034 and FR440117620
FR69448572
123 4567 89
GB prefix
EU-TID: 9 digits.
EL + 9 digits
EL
GREL099790528
WP: 'HR'+ 11 digits. EU-TID: 11 digits.
HR prefix
HR1642377552
HU12345678. EU-TID: 10 digits.
10728068244 (too many digits)
WP: Two standards: 'IE'+7 digits+2 letters, e.g. IE1234567FA; or 'IE'+7 digits+1 letter, optionally followed by 'W' for married women, e.g. IE1234567T or IE1234567TW. EU-TID: the same both for legal entities and individuals.
GB
IE9Y66I020
WP: 11 digits (the first 7 digits is a sequential number, the following 3 indicate the province of residence, the last digit is a checksum). EU-TID: the same.
IT prefix
IT1374910113 (10 digits instead of 11)
IT prefix, e.g. 2822840605
HU24189514
WP: 9 or 12 digits. EU-TID: the same.
LT860632610 (9 digits)
LT1106284811 (10 digits)
LT10000580981
WP: 8 digits. EU-TID: 11 digits.
LU prefix
WP: 11 digits. EU-TID: the same.
LV prefix
0203943 (no MD prefix)
MD05754540655
02751372 (without ME prefix)
40310007516
MK4032013544513
4080009501086 (without MK prefix)
MK403000452960 (12 digits after prefix)
40430008038555 (14 digits)
WP: 'NL'+9 digits+B+2 digits. E.g. NL999999999B01. EU-TID: 9 digits.
32117527, 801424250RT000, GB115163840 and IT01831490766
NO989795848MVA
NO
981355210
GB894770371
WP: 'PL'+10 digits. EU-TID: the same.
WP: 'PT'+9 digits (last digit is a checksum). EU-TID: the same.
WP: 'RO' (optional) + 10 digits. EU-TID: the same.
RO13328043 (8 digits)
RO1092690
RO943038
RO291111546
RS, e.g. RS107350223
SR and SK prefix: SR109027050, SK2022490800, SR105523323, SR107634440, SR104217641, SR104613706
WP: 12 digits. EU-TID: 10 digits.
5561085688
WP: 'SI'+8 digits. EU-TID: the same.
SI20874731
WP: 'SK'+10 digits. EU-TID: the same.
36699624
40298595
Most VATs comply with their official definitions. The majority of numbers start with their corresponding country code.
However, there are VATs which are valid but miss their country prefix. The inconsistencies are of several types:
countryCode different from vatNumber prefix, e.g. DE289523572 appears in CZ VATs; GB appears in VATs of countries like NO, NL, IE, DK, BE
IE9Y66I020 where the format doesn't allow for letters between the country code and digits
For easier verification of VAT Numbers (both format and existence in VIES), a Python script was developed. It:
DE289523572.
It can also accept a single VAT number, validate it, and retrieve all the info from the VIES service.
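The format side of such a check can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project's script: the function `check_vat_format`, the issue labels, and the deliberately partial `VAT_PATTERNS` table (transcribed from the prefix-plus-digits notes above for AT, HR, PL, SI and SK) are all hypothetical. It compares the prefix against `countryCode` directly, so special cases such as Greek VATs using the `EL` prefix are not handled.

```python
import re

# Partial pattern table transcribed from the format notes above.
# Countries without an entry only get the prefix check.
VAT_PATTERNS = {
    "AT": r"ATU\d{8}",
    "HR": r"HR\d{11}",
    "PL": r"PL\d{10}",
    "SI": r"SI\d{8}",
    "SK": r"SK\d{10}",
}


def check_vat_format(country_code, vat_number):
    """Return a list of issue labels; an empty list means no issue was found."""
    vat = vat_number.replace(" ", "").upper()
    issues = []

    # A two-letter alphabetic start is treated as a country prefix.
    prefix = vat[:2]
    if not prefix.isalpha():
        issues.append("missing-prefix")
    elif prefix != country_code:
        issues.append("prefix-mismatch")

    # Only check the full pattern when the prefix itself is fine.
    pattern = VAT_PATTERNS.get(country_code)
    if pattern and not issues and not re.fullmatch(pattern, vat):
        issues.append("bad-format")
    return issues


if __name__ == "__main__":
    for cc, vat in [("SI", "SI20874731"), ("CZ", "DE289523572"),
                    ("AT", "ATU6729404"), ("SK", "36699624")]:
        print(cc, vat, check_vat_format(cc, vat))
```

Returning labels instead of a boolean keeps the inconsistency types listed above distinguishable in later reporting.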
The above script also queries EU VIES for VAT codes in EU+IE and records the results as CSV. The query etl_scripts/VAT-from-VIES.ru RDFizes this data and attaches it to EIC nodes:
- tr:viesCheckDate (request date): when the check was made
- tr:vatInVies (VAT validity): whether the VAT is found and valid (not expired)
- tr:nameInVies (company name): legal company name as reported by VIES
- tr:addressInVies (address): company address as reported by VIES
Open Street Map (OSM) is a global crowd-sourced database of geographic information, including power plants and generators. E.g. the screenshot below shows a coal power station and some of the OSM data fields that describe it.
OSM has three element types:
- node: represents a specific point on the earth's surface defined by its latitude and longitude. Each node comprises at least an id number and a pair of coordinates.
- way: ordered list of between 2 and 2,000 nodes that define a polyline. Ways are used to represent linear features such as rivers and roads.
- relation: multi-purpose data structure that documents a relationship between two or more data elements (nodes, ways, and/or other relations)
The following screenshots show Varna Power Plant with its three generators. Note that the generators are of type node and they are part of the relation corresponding to the power plant.
We'll use it to complement ENTSOE Production Unit data with detailed geo-information.
OSM includes detailed data such as:
Key:ref:EU:ENTSOE_EIC
We've tried several different services to provide OSM data:
Another reason why we chose Overpass over Sophox is that the SPARQL endpoint did not always work properly. E.g. 20k Plants have property osmt:name, but when you try to download all the Plants along with other properties, only the first 2k records had the osmt:name field.
Although the world-wide coverage of power plants in OSM is very good, its number of EIC ids is not so large. Therefore:
In order to contribute to OSM:
Also there are third party editors which we can use as alternatives. These are the most popular:
OSM Tag Info is a series of dashboards for exploring the distribution of different tags. We used it to explore the distribution of objects with an EIC id (ref:EU:ENTSOE_EIC) and objects tagged as power:plant. The Timelines display the gradual contribution of these types of objects to the OSM database.
Geography and Chronology of tag power=plant (61.5k); plus tag power=generator (1.84M)
Geography and Chronology of key ref:EU:ENTSOE_EIC (3667). Our recent contributions are also visible on this timeline.
Data about the Plants has been downloaded in JSON format from Overpass by using the below query:
/*
This has been generated by the overpass-turbo wizard.
*/
[out:json][timeout:3000];
(
// query part for: “power=plant”
node["power"="plant"];
way["power"="plant"];
relation["power"="plant"];
);
// print results
out body;
>;
out skel qt;
Generators have been downloaded with a wget request to http://overpass-api.de/api/interpreter because the Overpass workbench was crashing due to the large size of the data.
First, create the file generator.osm containing the following query:
/*
This has been generated by the overpass-turbo wizard.
*/
[out:json];
(
// query part for: “power=generator”
way["power"="generator"];
);
// print results
out body;
>;
out skel qt;
Then run the command below:
wget -O generator.json --post-file=generator.osm "http://overpass-api.de/api/interpreter"
Repeat the above steps for node and relation, save the output in separate JSON files, and then merge them into one. This is necessary because of the large size of the generator data. Another option is to download the generators country by country, because in OSM you can't filter by continent.
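The merge step can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the function names and the file names in the usage comment are hypothetical. It relies on the fact that Overpass JSON output keeps its results in a top-level `elements` array whose entries carry `type` and `id`, and it drops elements that appear in more than one file (the `>;` recursion can return the same member nodes several times).

```python
import json


def merge_overpass(documents):
    """Merge parsed Overpass JSON documents, keeping each (type, id) once."""
    seen = set()
    merged = []
    for doc in documents:
        for element in doc.get("elements", []):
            key = (element["type"], element["id"])
            if key not in seen:
                seen.add(key)
                merged.append(element)
    return {"elements": merged}


def merge_files(paths, out_path):
    """Load several Overpass JSON files and write the merged result."""
    docs = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            docs.append(json.load(handle))
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(merge_overpass(docs), handle)


# Hypothetical usage, assuming one download per element type:
# merge_files(["generator-node.json", "generator-way.json",
#              "generator-relation.json"], "generator.json")
```

Note that this loads every file fully into memory, which may itself be a problem for very large downloads; a streaming JSON parser would be needed in that case.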
Note: There are Plants and Generators whose output electricity has the value yes or no instead of a number.
We've researched how accurate the coordinates of Plants and Generators are in the case of cascades, where the dam/weir and pipeline can be far removed. We have gone through several examples and found that the pinpoints are good.
For example, below is a comparison of the outline and
We have also found an exception: a hydro plant that covers a large area, but even then the point is close to the facility:
Some other useful Overpass queries:
Search by EIC
[out:json][timeout:300];
(
way["ref:EU:ENTSOE_EIC"~"32W001100100089X"] ;
);
out body;
>;
out skel qt;
Search for centroid
[out:csv(::type,::id,name,::lat,::lon)][timeout:20];
(rel(2865507);) -> .object;
.object out center;
The following screenshots show some excellent OSM issue/validation reports
We also investigate a number of other external databases. We analyse them and evaluate the possibility to import the missing generation and production units into Open Street Map.
Data fusion of multiple power plant databases. 7 databases, including ENTSO Transparency, of which 6 are free (Platts WEPP is paid).
Summary by country
csvtk summary -f id -g Country matched_data_red.csv |csvtk sort -k 2:rn
Country,id:count
Germany,1193
Norway,1009
France,993
Spain,761
Italy,575
Switzerland,528
United Kingdom,464
Portugal,288
Finland,212
Austria,201
Sweden,166
Romania,142
Poland,120
Czech Republic,55
Netherlands,55
Greece,50
Bulgaria,49
Slovenia,46
Belgium,45
Ireland,39
Slovakia,36
Denmark,32
Hungary,30
Croatia,27
"Macedonia, Republic of",12
Estonia,11
Lithuania,5
Latvia,4
Luxembourg,2
Summary by project ID
csvtk cut -f projectID matched_data_red.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
5159 CARMA
3455 JRC
2728 OPSD
1370 GPD
1324 ENTSOE
1197 GEO
WRI GPPD (World Resources Institute, Global Power Plant Database) is a comprehensive, global, open source database of power plants. The database covers approximately 35,000 power plants from 167 countries.
Available fields:
country,country_long,name,gppd_idnr,capacity_mw,latitude,longitude,primary_fuel,other_fuel1,other_fuel2,other_fuel3,commissioning_year,owner,source,url,geolocation_source,wepp_id,year_of_capacity_data,generation_gwh_2013,generation_gwh_2014,generation_gwh_2015,generation_gwh_2016,generation_gwh_2017,generation_gwh_2018,generation_gwh_2019,generation_data_source,estimated_generation_gwh_2013,estimated_generation_gwh_2014,estimated_generation_gwh_2015,estimated_generation_gwh_2016,estimated_generation_gwh_2017,estimated_generation_note_2013,estimated_generation_note_2014,estimated_generation_note_2015,estimated_generation_note_2016,estimated_generation_note_2017
The latest version is from June 2021. Approximately 10765 power plants are in ENTSOE countries.
Summary by country
csvtk summary -f gppd_idnr:count -g country global_power_plant_database.csv|csvtk sort -k 2:nr|head -21
country,gppd_idnr:count
USA,9833
CHN,4235
GBR,2751
BRA,2360
FRA,2155
IND,1589
DEU,1309
CAN,1159
ESP,829
RUS,545
JPN,522
AUS,486
PRT,469
CZE,462
ITA,396
CHL,315
NOR,306
MEX,277
VNM,236
ARG,236
THA,196
POL,189
Summary by ENTSOE country. Countries marked with "(*)" are those whose relevance for ENTSOE we are not sure of.
csvtk join -f iso3;country data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv |csvtk summary -f gppd_idnr -g iso3
iso3,gppd_idnr:count
ALB,8
AUT,103
BEL,69
BGR,43
BIH,20
BLR,24 (*)
CHE,168
CYP,3
CZE,462
DEU,1309
DNK,47
ESP,829
EST,17
FIN,185
FRA,2155
GBR,2751
GRC,90
HRV,24
HUN,18
IRL,59
ISL,20
ITA,396
LTU,6
LUX,2
LVA,5
MDA,6 (*)
MKD,12
MNE,3
NLD,71
NOR,306
POL,189
PRT,469
ROU,68
RUS,545 (*)
SRB,12
SVK,30
SVN,8
SWE,168
UKR,64
csvtk summary -f capacity_mw:min,capacity_mw:q1,capacity_mw:q2,capacity_mw:median,capacity_mw:q3,capacity_mw:mean,capacity_mw:max,capacity_mw:stdev,capacity_mw:variance global_power_plant_database.csv
min, q1, q2, median,q3, mean, max, stdev, variance
1.00,4.90,16.74,16.74, 75.34,163.36,22500.00,489.64,239743.48
csvtk summary -f year_of_capacity_data:min,year_of_capacity_data:max -i global_power_plant_database.csv
min, max
2000.00,2019.00
Breakdown by all fuels
csvtk cut -f primary_fuel,other_fuel1,other_fuel2,other_fuel3 global_power_plant_database.csv|perl -pe "s{,}{\n}g"|sort|uniq -c|sort -rn
10718 Solar
7191 Hydro
5358 Wind
4512 Gas
3568 Oil
2420 Coal
1506 Biomass
1182 Waste
195 Nuclear
189 Geothermal
186 Storage
130 Other
48 Cogeneration
35 Petcoke
10 Wave and Tidal
Breakdown by primary fuel in ENTSOE countries:
csvtk join -f iso3;country data\countries.csv data-ext\global_power_plant_database_v_1_3\global_power_plant_database.csv | csvtk cut -f primary_fuel | sort|uniq -c|sort -rn
3921 Solar
2329 Wind
2056 Hydro
779 Gas
503 Biomass
443 Waste
420 Coal
125 Oil
74 Nuclear
46 Geothermal
31 Storage
22 Other
8 Wave and Tidal
7 Cogeneration
PyPSA-Eur, the first open model dataset of the European power system at the transmission network level to cover the full ENTSO-E area, is presented.
A power plant database is presented using a sophisticated algorithm that matches records from a wide range of available sources and includes geo-data
5151 records
Fields:
id,Name,Fueltype,Technology,Set,Country,Capacity,Duration,YearCommissioned,Retrofit,lat,lon,File,projectID,bus
Example row:
705,Ec łódź,Hard Coal,Steam Turbine,PP,Poland,403.0,0.0,,, 51.74050670000001,19.440413600000007,, "{'CARMA': ['CARMA25606', 'CARMA25608', 'CARMA25607'], 'ENTSOE': ['19W000000000107C', '19W000000000106E'], 'GEO': ['GEO42495']}",4403
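The projectID column in the example row is a Python-style dict literal that maps each source dataset to the identifiers matched for that plant. As a rough Python counterpart to the perl one-liners used in this section for the "summary by source dataset" counts, the sketch below parses that column and counts how many rows mention each dataset. It assumes every non-empty projectID value is a valid dict literal like the one shown; the function name `dataset_counts` is hypothetical.

```python
import ast
from collections import Counter


def dataset_counts(project_ids):
    """Count, per source dataset, the rows whose projectID mentions it."""
    counts = Counter()
    for raw in project_ids:
        if not raw:
            continue  # rows without a projectID contribute nothing
        mapping = ast.literal_eval(raw)  # safe parse of the dict literal
        counts.update(mapping.keys())
    return counts


if __name__ == "__main__":
    example = ("{'CARMA': ['CARMA25606', 'CARMA25608', 'CARMA25607'], "
               "'ENTSOE': ['19W000000000107C', '19W000000000106E'], "
               "'GEO': ['GEO42495']}")
    print(dataset_counts([example]))
```

Parsing the literal also gives direct access to the EIC codes under the `ENTSOE` key, which a pure pattern match on dataset names does not.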
Summary by fuel type
csvtk summary -f id -g Fueltype PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
Hydro,3594
OCGT,406
CCGT,257
Hard Coal,197
Bioenergy,188
Oil,132
Waste,129
Other,79
Lignite,72
Nuclear,62
Geothermal,29
"CCGT, Thermal",2
Storage Technologies,1
Pv,1
Caes,1
Summary by country
csvtk summary -f id -g Country PyPSA-Eur-powerplants.csv|csvtk sort -k 2:rn
France,830
Spain,734
Norway,581
Switzerland,555
Germany,552
Italy,507
United Kingdom,305
Finland,202
Austria,163
Sweden,145
Portugal,126
Poland,56
Netherlands,48
Slovenia,46
Greece,38
Romania,35
Slovakia,32
Belgium,31
Bulgaria,30
Czech Republic,28
Croatia,24
Ireland,23
Denmark,23
Hungary,20
Lithuania,5
Estonia,5
Latvia,4
Luxembourg,2
Summary by source file
csvtk cut -f File PyPSA-Eur-powerplants.csv|perl -pe 's{\, }{\n}g; s{"}{}g'|sort|uniq -c|sort -rn|head -20
2232
727 SEDE
417 BFE
400 ENTSOE
230 IWPDCY.csv
220 GOV
198 EnergyAuthority
147 energy_storage_exchange
144 Department for Business Energy & Industrial Strategy
130 https://www.verbund.com/de-at/ueber-verbund/kraftwerke/unsere-kraftwerke
98 Energias Endogenas de Portugal
96 RTE
70 Nordpool
53 Red Eléctrica de España
43 Terna
30 SEAS
24 Vattenfall
22 GPI
15 Tennet_Q4
15 Energinet DK
Summary by source dataset
csvtk cut -f projectID PyPSA-Eur-powerplants.csv|perl -lne "print \$1 while m{'([A-Z]+)'}g"|sort|uniq -c|sort -rn
4072 CARMA
2734 OPSD
1730 ENTSOE
883 GEO
816 GPD
230 IWPDCY
147 ESE
In 2017 the Joint Research Centre developed a Power Plant Database for energy systems modelling (JRC-PPDB) in order to support the unit activities in energy systems modelling and knowledge management.
Size: Production and Generation units: 7118, of which 3961 unique Production Unit EIC
A mapping between identifiers is provided in
JRC_OPEN_LINKAGES.csv
.
Unique ID counts
csvtk summary -f eic_p:countunique,eic_g:countunique,eprtr_facilityID:countunique,WRI_id:countunique,GEO_id:countunique,fresna_id:countunique JRC_OPEN_LINKAGES.csv
eic_p, eic_g, eprtr,WRI, GEO, fresna
1967, 3359, 592, 983, 597, 1306
Breakdown of WRI identifiers
csvtk cut -f WRI_id JRC_OPEN_LINKAGES.csv |tr 0-9 d|sort|uniq -c
4 BRAddddddd
2 CANddddddd
213 GBRddddddd
55 GEODBddddddd
2 USAddddddd
2171 WRIddddddd
The table summarises the contents of the datasets above: the number of records with EIC identifiers and the number of coordinate pairs in each dataset.
We also count the EIC codes in each dataset that are also found in Open Street Map and the other external datasets.
SPARQL query for entities with ref:EU:ENTSOE_EIC on OSM.
Data Source | Items with EIC | Distinct EIC ids | Coords Total | OSM Match |
---|---|---|---|---|
OSM TagInfo | 3364 | - | 3364 | - |
Sophox | 3540 | 3533 | 3540 | - |
PyPsa | 5061 | 5049 | 1975 | 3541 |
Open Power System | 4277 | 3944 | 997 | 3639 |
JRC Open Plants | 3961 | 3961 | 4865 | 993 |
JRC Open Generators | 6809 | 6809 | 4722 | 59 |
Wikidata | 1267 | 1267 | 1120 | 791 |
The following analytics will be provided, using items from data domains EIC, Generation, Load, and Outages.
A faceted search will allow searching for production and generation units based on their location and fuel type. The following facets will be included:
Aggregated values for number of units and cumulative capacity will be displayed on each element of the search.
The results of the search will be displayed as a list. It is however possible to also combine the search with other modalities and display the result on a map or on a chart
A timeline showing all the data from the load domain (actual and projected, 5 individual tables) for a given Control Area, Bidding Zone, Country
Below is a mockup of this chart realized using Google Charts.
BZA BG
An interactive version of the chart is available here. N.B. it is not available for mobile browsers.
The mockup is limited by Google Charts' features but shows how the data looks when superimposed. Of particular interest are the occasions when the forecast and actual load are mismatched. These are easily visible on the chart, and we will emphasise them in the final version using the available functionality of the Vega charting library (e.g. this example).
A Timeline showing day ahead wind and solar and actual generation wind and solar.
The timeline will be analogous to the previous example.
Zoomable and navigable map with the production and generation units.
Example of a map showing power plants by capacity and fuel type:
Drill-down data is available when interacting with a marker. This can be:
Outages displayed on a map: current or future, planned or forced, active or canceled.
A timeline showing Prices Of Activated Balancing Energy and ActivatedBalancingEnergy for any given area.
The diagram consists of 2 vertically symmetrical zones, one for "up" regulation and one for "down" regulation. Each zone superimposes:
- 4 line charts for the price of each resource type
- A stacked histogram for the volume of each activated resource
The following transformations need to be applied
An example of a similar diagram can be seen in this vega example
A timeline chart with circular markers showing future accepted offers from the AcceptedAggregatedOffers_17.1.D data item. The chart will display the following variables:
- temporal dimension (x-axis)
- area concerned by the bid (y-axis): this will create a swimlane effect
- Volume: size of the marker
- direction: shape of the marker (a circular marker with a protrusion directed up or down)
- type of asset: color of the marker
- a summary of the above variables displayed in the popup
An example of a similar diagram can be seen in this vega-lite example
A timeline chart combining ActivatedBalancingEnergy_17.1.E and PricesOfActivatedBalancingEnergy_17.1.F
Similar to the chart above, the price/volume bubble chart will show price instead of time.
Technologies to use for Analytics:
The data is updated automatically from the ENTSOE SFTP and REST services on a daily basis.
The semantic models are in the form of Turtle examples and diagrams of all semantic data areas. They are shown in previous sections:
"Manual" RDFization
doc SFTP Appendix B: Area Naming Convention has the zone codes used on ENTSO portal.
10Y1001A1001A869 is BZN|UA-DobTPP (bidding zone Ukraine-Dobrotvirska TPP)
BZN is a prefix that is displayed for the particular time series, not an attribute of that EIC
UA-DOB_TPP (different spelling) and functions "Control Area, Market Balance Area, Scheduling Area" but not "Bidding Zone"
kb.ttl) describes the Data Items, more details are needed. See section above
The TEKG ontology is available in tr.ttl and covers the full scope of the semantic models.
The ontology is also available in the Annex of this document.
We have revised and elaborated the conceptual architecture compared to the proposal. It presents the technologies and services that TEKG will use and implement to achieve its objectives:
All components will be packaged and deployed in an enterprise-ready fashion using Docker, Kubernetes, and Helm charts.
The programming languages and frameworks used for development of the different components, services and tests are:
Source data is obtained from ENTSOE transparency platform on a scheduled basis (frequency to be discussed) via:
*.csv file extension
The service will convert the ingested XMLs and CSVs and produce RDF data. The initial assumption was that we were going to work only with the XMLs from the REST API, and the main tool that we proposed was XSPARQL. After careful exploration of the data and its sources, we discovered additional data in CSV format that we need.
To achieve a flexible and generic service that can handle the required data, we've considered using additional tools like OntoRefine and TARQL. In order to measure the performance of the different tools and to pick the right one for the service, we've done some experiments. The results are presented in the Conversion Performance Comparison section.
XSPARQL is a language for transforming data between XML and RDF.
XSPARQL Github contains the implementation of the tools that we are using.
Data
<?xml version="1.0" encoding="UTF-8"?>
<Configuration_MarketDocument xmlns="urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0">
<mRID>8be8471a92f345ce8129102d965c19d7</mRID>
<type>A95</type>
<process.processType>A39</process.processType>
<sender_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</sender_MarketParticipant.mRID>
<sender_MarketParticipant.marketRole.type>A32</sender_MarketParticipant.marketRole.type>
<receiver_MarketParticipant.mRID codingScheme="A01">10X1001A1001A450</receiver_MarketParticipant.mRID>
<receiver_MarketParticipant.marketRole.type>A32</receiver_MarketParticipant.marketRole.type>
<createdDateTime>2022-01-17T12:50:49Z</createdDateTime>
<TimeSeries>
<mRID>87546cb0270a4ea8</mRID>
<businessType>B11</businessType>
<implementation_DateAndOrTime.date>2021-10-01</implementation_DateAndOrTime.date>
<biddingZone_Domain.mRID codingScheme="A01">10YUA-WEPS-----0</biddingZone_Domain.mRID>
<registeredResource.mRID codingScheme="A01">62W875768058757F</registeredResource.mRID>
<registeredResource.name>KALUSHCHPP</registeredResource.name>
<registeredResource.location.name>Kalush</registeredResource.location.name>
<ControlArea_Domain>
<mRID codingScheme="A01">10YUA-WEPS-----0</mRID>
</ControlArea_Domain>
<Provider_MarketParticipant>
<mRID codingScheme="A01">10X1001C--00001X</mRID>
</Provider_MarketParticipant>
<MktPSRType>
<psrType>B05</psrType>
<production_PowerSystemResources.highVoltageLimit unit="KVT">110</production_PowerSystemResources.highVoltageLimit>
<nominalIP_PowerSystemResources.nominalP unit="MAW">200</nominalIP_PowerSystemResources.nominalP>
<GeneratingUnit_PowerSystemResources>
<mRID codingScheme="A01">62W2081564720502</mRID>
<name>KALUSHCHPP-V</name>
<nominalP unit="MAW">200</nominalP>
<generatingUnit_PSRType.psrType>B05</generatingUnit_PSRType.psrType>
<generatingUnit_Location.name>Kalush</generatingUnit_Location.name>
</GeneratingUnit_PowerSystemResources>
</MktPSRType>
</TimeSeries>
</Configuration_MarketDocument>
Script
prefix ns: <urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0>
prefix tr: <https://transparency.ontotext.com/resource/tr/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
declare variable $input as xs:string external;
declare option saxon:output "method=text";
for $data in doc($input)/ns:Configuration_MarketDocument/ns:TimeSeries
let $BASE := "https://transparency.ontotext.com/resource/"
let $TYPE := fn:concat($BASE,"type/")
let $UNIT := fn:concat($TYPE,"UnitSymbol/") # TODO or "UnitOfMeasure/" ?
let $EIC := fn:concat($BASE,"eic/")
let $url := fn:concat($EIC,$data/ns:registeredResource.mRID/text())
construct {
<{$url}>
tr:dateImplemented {$data/ns:implementation_DateAndOrTime.date/text()}^^xsd:date;
tr:notationAlt {$data/ns:registeredResource.name/text()};
tr:location {$data/ns:registeredResource.location.name/text()};
tr:assetType <{fn:concat($TYPE,"Asset/",$data/ns:MktPSRType/ns:psrType/text())}>.
{
for $x in $data/ns:biddingZone_Domain.mRID/text() # 0-1
construct {<{$url}> tr:biddingZone <{fn:concat($EIC,$x)}>},
for $x in $data/ns:ControlArea_Domain/ns:mRID/text() # 1-many
construct {<{$url}> tr:controlArea <{fn:concat($EIC,$x)}>},
for $x in $data/ns:Provider_MarketParticipant/ns:mRID/text() # 1-many
construct {<{$url}> tr:providerParticipant <{fn:concat($EIC,$x)}>},
for $x in $data/ns:MktPSRType/ns:production_PowerSystemResources.highVoltageLimit # 0-1
construct {
<{$url}> tr:highVoltageLimit {$x/text()}^^xsd:float
},
for $x in $data/ns:MktPSRType/ns:nominalIP_PowerSystemResources.nominalP # 0-1
construct {
<{$url}> tr:installedOutput {$x/text()}^^xsd:float
},
for $gen in $data/ns:MktPSRType/ns:GeneratingUnit_PowerSystemResources # 0-many
let $url1 := fn:concat($EIC,$gen/ns:mRID/text())
construct {
<{$url}> tr:generationUnit <{$url1}>.
<{$url1}>
tr:notationAlt {$gen/ns:name/text()};
tr:assetType <{fn:concat($TYPE,"Asset/",$gen/ns:generatingUnit_PSRType.psrType/text())}>;
tr:location {$gen/ns:generatingUnit_Location.name/text()};
tr:installedOutput {$gen/ns:nominalP/text()}^^xsd:float
}
}
}
Result
@base <https://transparency.ontotext.com/resource/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tr: <https://transparency.ontotext.com/resource/tr/> .
<eic/62W875768058757F> tr:dateImplemented "2021-10-01"^^xsd:date .
<eic/62W875768058757F> tr:notationAlt "KALUSHCHPP" .
<eic/62W875768058757F> tr:location "Kalush" .
<eic/62W875768058757F> tr:assetType <type/Asset/B05> .
<eic/62W875768058757F> tr:biddingZone <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:controlArea <eic/10YUA-WEPS-----0> .
<eic/62W875768058757F> tr:providerParticipant <eic/10X1001C--00001X> .
<eic/62W875768058757F> tr:highVoltageLimit "110"^^xsd:float .
<eic/62W875768058757F> tr:installedOutput "200"^^xsd:float .
<eic/62W875768058757F> tr:generationUnit <eic/62W2081564720502> .
<eic/62W2081564720502> tr:notationAlt "KALUSHCHPP-V" .
<eic/62W2081564720502> tr:assetType <type/Asset/B05> .
<eic/62W2081564720502> tr:location "Kalush" .
<eic/62W2081564720502> tr:installedOutput "200"^^xsd:float .
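To make the mapping logic easier to follow, here is a minimal Python sketch of the same kind of transformation, using only the standard library. It is not the XSPARQL implementation used by the project: the helper names (`q`, `iri`, `literal`, `convert`) are illustrative assumptions, the sample is an abridged copy of the document above, only a subset of the properties is mapped, and literal escaping is omitted.

```python
import xml.etree.ElementTree as ET

NS = "urn:iec62325.351:tc57wg16:451-6:configurationdocument:3:0"
BASE = "https://transparency.ontotext.com/resource/"
TR = BASE + "tr/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Abridged copy of the sample document above (a subset of its fields).
SAMPLE = f"""<Configuration_MarketDocument xmlns="{NS}">
  <TimeSeries>
    <implementation_DateAndOrTime.date>2021-10-01</implementation_DateAndOrTime.date>
    <biddingZone_Domain.mRID codingScheme="A01">10YUA-WEPS-----0</biddingZone_Domain.mRID>
    <registeredResource.mRID codingScheme="A01">62W875768058757F</registeredResource.mRID>
    <registeredResource.name>KALUSHCHPP</registeredResource.name>
    <MktPSRType>
      <psrType>B05</psrType>
      <GeneratingUnit_PowerSystemResources>
        <mRID codingScheme="A01">62W2081564720502</mRID>
        <name>KALUSHCHPP-V</name>
        <nominalP unit="MAW">200</nominalP>
      </GeneratingUnit_PowerSystemResources>
    </MktPSRType>
  </TimeSeries>
</Configuration_MarketDocument>"""


def q(path):
    """Qualify every step of a slash-separated path with the namespace."""
    return "/".join(f"{{{NS}}}{step}" for step in path.split("/"))


def iri(kind, code):
    return f"<{BASE}{kind}/{code}>"


def literal(text, datatype=None):
    # Escaping of quotes and backslashes is omitted for brevity.
    return f'"{text}"^^<{XSD}{datatype}>' if datatype else f'"{text}"'


def convert(xml_text):
    """Return N-Triples lines for each TimeSeries (one Production Unit each)."""
    triples = []

    def add(subject, prop, obj):
        triples.append(f"{subject} <{TR}{prop}> {obj} .")

    for ts in ET.fromstring(xml_text).findall(q("TimeSeries")):
        unit = iri("eic", ts.findtext(q("registeredResource.mRID")))
        add(unit, "dateImplemented",
            literal(ts.findtext(q("implementation_DateAndOrTime.date")), "date"))
        add(unit, "notationAlt", literal(ts.findtext(q("registeredResource.name"))))
        add(unit, "assetType", iri("type/Asset", ts.findtext(q("MktPSRType/psrType"))))
        # Optional or repeatable elements: emit one triple per occurrence.
        for zone in ts.findall(q("biddingZone_Domain.mRID")):
            add(unit, "biddingZone", iri("eic", zone.text))
        for gen in ts.findall(q("MktPSRType/GeneratingUnit_PowerSystemResources")):
            gen_unit = iri("eic", gen.findtext(q("mRID")))
            add(unit, "generationUnit", gen_unit)
            add(gen_unit, "notationAlt", literal(gen.findtext(q("name"))))
            add(gen_unit, "installedOutput", literal(gen.findtext(q("nominalP")), "float"))
    return triples


for line in convert(SAMPLE):
    print(line)
```

As in the XSPARQL script, optional and repeatable elements are handled by looping over their occurrences, so a missing element simply produces no triple.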
Ontotext has packaged XSPARQL as a web service (WAR file). The benefit of using a web service is that it saves Java startup time, which is needed for every invocation of the command-line tool.
As a further optimization, we considered precompiling the various conversions and putting them into a Registry. This would save the transpilation time (from XSPARQL to XQuery) and the compilation time (from XQuery to an executable transformation).
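The Registry idea can be sketched in a few lines of Python. This is only an illustration of the caching pattern, not the project's code: `ConversionRegistry` and `fake_compile` are hypothetical names, and `fake_compile` merely stands in for the expensive transpilation and compilation steps.

```python
class ConversionRegistry:
    """Caches compiled conversions so each script is compiled only once."""

    def __init__(self, compile_script):
        self._compile = compile_script  # expensive step: script source -> callable
        self._compiled = {}             # script name -> compiled callable
        self.compile_count = 0

    def register(self, name, source):
        """Compile a script once and keep the result under the given name."""
        self._compiled[name] = self._compile(source)
        self.compile_count += 1

    def convert(self, name, document):
        """Run a previously compiled conversion on one document."""
        return self._compiled[name](document)


def fake_compile(source):
    """Hypothetical stand-in for XSPARQL transpilation plus XQuery compilation."""
    return lambda document: f"{source}({document})"


registry = ConversionRegistry(fake_compile)
registry.register("production-unit", "productionUnitScript")
results = [registry.convert("production-unit", doc) for doc in ("a.xml", "b.xml", "c.xml")]
```

Here three documents are converted while the compile step runs only once, which is the saving the Registry is meant to provide.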
log4j, which needs to be updated to the latest version due to security vulnerabilities.
OntoRefine is a user-friendly tool for cleaning data and converting it to RDF.
OntoRefine handles various file formats, including XML, CSV, and JSON, which makes it a strong candidate for the current project. It is the preferred option because it is developed and maintained by Ontotext and shows the best overall performance.
Issues:
Note: the rest of this section describes Reconciliation, which is not used in the current project.
Another big advantage is matching of tabular data to KGs via different reconciliation services that OntoRefine supports. Reconciliation services provide semantic matching functionality.
There are various free reconciliation services that can be used by OntoRefine. The Reconciliation Testbench provides a list of some of these services. We host and support three such services based on a subset of Wikidata:
The OntoRefine Mapping UI allows visual creation of semantic transformations. Here's a transformation for the same XML data as in the XSPARQL example:
Using the same data as in the XSPARQL example, it produces a semantically equivalent result.
A conversion script can be exported from the Mapping UI (as JSON) and used as a batch process (see next section). Additionally, the script contains all operations performed over the dataset, including data cleaning and the reconciliation operations.
We developed a conversion service using OntoRefine: a public library called ontorefine-client.
TARQL is a highly performant tool for converting very large CSV/TSV files.
Issues:
If the project used XSPARQL for conversion of XML files, we could use TARQL for conversion of CSV files.
Data (CSV example from CrunchBase)
permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round
lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b
Mapping
PREFIX ex: <http://ex.org/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?URI a ex:Organization;
ex:permalink ?permalink;
ex:name ?company;
ex:employees ?numEmployees;
ex:category ?category;
ex:city ?city;
ex:state ?state;
ex:fundingDate ?fundedDate;
ex:raisedAmt ?amount;
ex:raisedCurrency ?raisedCurrency;
ex:round ?round;
}
WHERE {
BIND (URI(CONCAT('http://ex.org/companies/', ?permalink)) AS ?URI)
BIND (xsd:integer(?numEmps) AS ?numEmployees)
BIND (xsd:decimal(?raisedAmt) AS ?amount)
}
Result
<http://ex.org/companies/lifelock>
a ex:Organization ;
ex:permalink "lifelock" ;
ex:name "LifeLock" ;
ex:category "web" ;
ex:city "Tempe" ;
ex:state "AZ" ;
ex:fundingDate "1-May-07" ;
ex:raisedAmt "6850000"^^xsd:decimal ;
ex:raisedCurrency "USD" ;
ex:round "b" .
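The row-by-row behaviour of this mapping can be illustrated with a short Python sketch based on the standard `csv` module. It is not TARQL itself: the `COLUMNS` table and the `convert` function are illustrative assumptions that mirror the mapping above, and the sketch assumes that an empty cell (here `numEmps`) yields no triple, which matches the result shown above.

```python
import csv
import io

EX = "http://ex.org/ontology#"
XSD = "http://www.w3.org/2001/XMLSchema#"

CSV_TEXT = (
    "permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round\n"
    "lifelock,LifeLock,,web,Tempe,AZ,1-May-07,6850000,USD,b\n"
)

# CSV column -> (property local name, optional XSD datatype), following the mapping above.
COLUMNS = {
    "permalink": ("permalink", None),
    "company": ("name", None),
    "numEmps": ("employees", "integer"),
    "category": ("category", None),
    "city": ("city", None),
    "state": ("state", None),
    "fundedDate": ("fundingDate", None),
    "raisedAmt": ("raisedAmt", "decimal"),
    "raisedCurrency": ("raisedCurrency", None),
    "round": ("round", None),
}


def convert(csv_text):
    """Return (subject, predicate, object) triples, skipping empty cells."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<http://ex.org/companies/{row['permalink']}>"
        triples.append((subject, "rdf:type", f"<{EX}Organization>"))
        for column, (prop, datatype) in COLUMNS.items():
            value = row[column]
            if value == "":
                continue  # an empty cell yields no triple
            obj = f'"{value}"^^<{XSD}{datatype}>' if datatype else f'"{value}"'
            triples.append((subject, f"<{EX}{prop}>", obj))
    return triples


for triple in convert(CSV_TEXT):
    print(" ".join(triple), ".")
```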
TARQL does not have a web service implementation, so we would need to implement one.
We did some performance testing to ensure that the most suitable tool can be selected. We used prototypes of conversion services to measure their performance.
For the comparison we use XML datasets in data/xml/Production_Unit (documents of type Configuration_MarketDocument).
Because the data files are not yet numerous or large, we replicated them in order to measure at scale and approximate the load of an actual production environment.
The first two columns show the number of files and their total size (MB); the last two columns show the processing time of each of the two tools (seconds).
Files | Size (MB) | XSPARQL (s) | OntoRefine (s) |
---|---|---|---|
46 | 4.1 | 2 | 1.6 |
460 | 12.6 | 13.6 | 10.6 |
4600 | 353.9 | 181.9 | 156.5 |
7000 | 620.2 | 294.6 | 264.1 |
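The relative speed of the two tools can be derived directly from the table. The following Python sketch (the function and field names are illustrative) computes files per second for each tool and the ratio of XSPARQL time to OntoRefine time; on these four measurements the ratio stays between roughly 1.1 and 1.3 in favour of OntoRefine.

```python
# Rows copied from the table above: (files, size in MB, XSPARQL seconds, OntoRefine seconds).
ROWS = [
    (46, 4.1, 2.0, 1.6),
    (460, 12.6, 13.6, 10.6),
    (4600, 353.9, 181.9, 156.5),
    (7000, 620.2, 294.6, 264.1),
]


def summarize(rows):
    """Compute throughput per tool and the XSPARQL/OntoRefine time ratio per row."""
    summary = []
    for files, _size_mb, xsparql_s, ontorefine_s in rows:
        summary.append({
            "files": files,
            "xsparql_files_per_s": files / xsparql_s,
            "ontorefine_files_per_s": files / ontorefine_s,
            "time_ratio": xsparql_s / ontorefine_s,
        })
    return summary


for entry in summarize(ROWS):
    print(entry["files"],
          round(entry["xsparql_files_per_s"], 1),
          round(entry["ontorefine_files_per_s"], 1),
          round(entry["time_ratio"], 2))
```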
For comparison purposes we made the services work in an identical way and process the datasets one by one. There are a few optimizations possible for each service, but they are not worth doing at the moment.
We compared the performance of TARQL and OntoRefine on a 240 MB CSV file, producing the same RDF data.
The semantic conversion scripts are in etl_scripts/OR. They are specialized SPARQL CONSTRUCT queries that run in an OntoRefine instance and map tabular data to a predefined graph pattern.
The data pipeline is glue code that implements the flow Fetch > Conversion > GraphDB > (Validation, Elastic indexing).
It is a standalone Spring Boot application, which has the following components:
Simple layout of the application components.
Interaction flow between the services.
TODO M4: Add Import and Validation flow
FTP Resource Downloaders
The service retrieves specified datasets from the SFTP server and serves as a data provider for the automatic Conversion Service. Retrieval is done by a process that listens for changes on the server, specifically the upload of a new dataset. When such an event is detected, the service copies the file into a configured dataset store. The trigger event can be filtered by providing a matching pattern for the file names.
In addition to the automatic mode, the service supports manual invocation. This is convenient for testing, or when another application or system wants to plug into the processing pipeline.
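The event-plus-pattern behaviour described above can be sketched as follows. This is an illustration in Python rather than the Spring Boot implementation: `UploadHandler`, the glob-style pattern, and the file names are assumptions, and a local directory stands in for the SFTP server.

```python
import fnmatch
import shutil
import tempfile
from pathlib import Path


class UploadHandler:
    """Copies newly uploaded files that match a name pattern into the dataset store."""

    def __init__(self, store, pattern="*"):
        self.store = Path(store)
        self.pattern = pattern

    def on_upload(self, uploaded):
        """Handle one upload event; return True if the file was copied."""
        uploaded = Path(uploaded)
        if not fnmatch.fnmatch(uploaded.name, self.pattern):
            return False  # file name does not match the configured pattern
        self.store.mkdir(parents=True, exist_ok=True)
        shutil.copy2(uploaded, self.store / uploaded.name)
        return True


def demo():
    """Simulate two upload events; a local directory stands in for the SFTP server."""
    with tempfile.TemporaryDirectory() as tmp:
        incoming, store = Path(tmp, "incoming"), Path(tmp, "store")
        incoming.mkdir()
        for name in ("2022_01_AggregatedGenerationPerType.csv", "readme.txt"):
            (incoming / name).write_text("data", encoding="utf-8")
        handler = UploadHandler(store, pattern="*.csv")
        copied = [handler.on_upload(path) for path in sorted(incoming.iterdir())]
        return copied, sorted(path.name for path in store.iterdir())
```

Calling `on_upload` directly corresponds to the manual invocation mode; in the automatic mode the same method would be called by whatever component watches the server.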
HTTP Resource Downloaders
Like the FTP Resource Downloader, this service provides datasets to the Conversion Service. Unlike that downloader, however, it is not reactive: it retrieves the required datasets by performing HTTP requests to a specific REST API at a configurable fixed rate. The datasets to retrieve are specified by request parameters, which are provided externally through configuration. This design makes the service flexible and easy to modify, and it allows changing the scope of the data that the system processes.
Like the FTP downloader, this service exposes its functionality via a REST endpoint, which can be invoked manually.
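A configuration-driven, fixed-rate downloader of this kind can be sketched in Python as below. The base URL, the parameter names and values, and the function names are placeholders chosen for illustration; they are not taken from the project configuration. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from urllib.parse import urlencode


def build_urls(base_url, common_params, datasets):
    """Build one request URL per configured dataset."""
    return [f"{base_url}?{urlencode({**common_params, **params})}" for params in datasets]


def run_fixed_rate(task, period_s, iterations, sleep=time.sleep, clock=time.monotonic):
    """Run task at a fixed rate: each run is scheduled period_s after the previous one."""
    next_run = clock()
    for _ in range(iterations):
        task()
        next_run += period_s
        delay = next_run - clock()
        if delay > 0:  # skip sleeping if the task overran its slot
            sleep(delay)


# Hypothetical configuration: placeholder endpoint and request parameters.
CONFIG = {
    "base_url": "https://example.org/api",
    "common_params": {"securityToken": "TOKEN"},
    "datasets": [{"documentType": "A95"}, {"documentType": "A73"}],
}

urls = build_urls(CONFIG["base_url"], CONFIG["common_params"], CONFIG["datasets"])
```

Changing the scope of the downloaded data then only requires editing the `datasets` list, with no change to the code.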
Conversion Service
The purpose of the Conversion Service is to transform the downloaded datasets to RDF data, which can be imported in GraphDB. Like the downloaders, this service has two aspects:
The automatic transformation process begins when the application starts. If there are unprocessed files in the dataset store, they are picked up and the transformations are applied. The transformations are predefined scripts in JSON format. When a conversion succeeds, the resulting RDF data is stored in a file, which is later imported into GraphDB.
The transformation itself is done with the OntoRefine tool. Its functionality is invoked through the OntoRefine Service, which contains the steps required to process a single dataset.
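The automatic aspect can be pictured as a loop over the dataset store. The Python sketch below is an illustration under stated assumptions: the `convert` callable stands in for the OntoRefine-based transformation, and "unprocessed" is approximated by the absence of an output file, which may differ from how the actual service tracks state.

```python
import tempfile
from pathlib import Path


def convert_pending(store, rdf_dir, convert, suffixes=(".xml", ".csv")):
    """Convert every dataset that has no RDF output yet; return the new file names."""
    store, rdf_dir = Path(store), Path(rdf_dir)
    rdf_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for dataset in sorted(store.iterdir()):
        if dataset.suffix.lower() not in suffixes:
            continue
        target = rdf_dir / (dataset.name + ".ttl")
        if target.exists():
            continue  # already converted on a previous run
        target.write_text(convert(dataset.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target.name)
    return written


def demo():
    """Run the loop twice on a temporary store; the second run finds nothing to do."""
    with tempfile.TemporaryDirectory() as tmp:
        store, rdf_dir = Path(tmp, "store"), Path(tmp, "rdf")
        store.mkdir()
        (store / "units.xml").write_text("<doc/>", encoding="utf-8")
        # Hypothetical stand-in for the real transformation.
        fake_convert = lambda text: f"# converted {len(text)} characters\n"
        first = convert_pending(store, rdf_dir, fake_convert)
        second = convert_pending(store, rdf_dir, fake_convert)
        return first, second
```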
Data Import Service
This service imports the RDF data into GraphDB and triggers the validation. Following the design of the other components, the import service will have manual and automatic aspects. As with the automatic conversion, the import is triggered by the existence of an unprocessed RDF data file. If the import is successful, the file will be marked as imported and removed from the directory.
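A minimal Python sketch of the import step is shown below. It assumes that the repository accepts Turtle via an HTTP POST to its RDF4J-style `statements` endpoint; the base URL and repository name passed to `graphdb_uploader` are up to the caller, and all function names are illustrative. Triggering validation is not covered here.

```python
import tempfile
import urllib.request
from pathlib import Path


def graphdb_uploader(base_url, repository):
    """Build an uploader that POSTs Turtle to a repository's statements endpoint."""
    url = f"{base_url}/repositories/{repository}/statements"

    def upload(turtle_bytes):
        request = urllib.request.Request(
            url, data=turtle_bytes, method="POST",
            headers={"Content-Type": "text/turtle"})
        with urllib.request.urlopen(request) as response:
            return 200 <= response.status < 300

    return upload


def import_pending(rdf_dir, upload):
    """Import each pending RDF file; remove it only if the import succeeded."""
    imported = []
    for rdf_file in sorted(Path(rdf_dir).glob("*.ttl")):
        try:
            ok = upload(rdf_file.read_bytes())
        except OSError:  # network and HTTP errors: keep the file for a later retry
            ok = False
        if ok:
            rdf_file.unlink()
            imported.append(rdf_file.name)
    return imported


def demo():
    """Import two files with a fake uploader that always succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.ttl", "b.ttl"):
            Path(tmp, name).write_text("<urn:s> <urn:p> <urn:o> .", encoding="utf-8")
        fake_upload = lambda data: True  # stand-in for a successful import
        imported = import_pending(tmp, fake_upload)
        remaining = sorted(path.name for path in Path(tmp).iterdir())
        return imported, remaining
```

Passing the uploader as a parameter keeps the file handling testable without a running GraphDB instance; a failed upload leaves the file in place so that it is retried on the next run.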
The Transparency EKG (TEKG) dashboard application is a single-page web application with an analytical user interface. It provides visualizations and validation reporting on the transparency data that has been ingested, analyzed, and validated in GraphDB; see DQA Dashboard.
Transparency EKG uses GraphDB's Elasticsearch connector to synchronize all relevant data into multiple Elasticsearch indices. This enables the dashboard to perform full-text and faceted searches in order to construct visualizations, and to limit data requests to a single data source.
Refer to the Elasticsearch GraphDB connector documentation for more information.
The TEKG Dashboard application consists of two parts: the static HTML and CSS files, and a server part that serves these static files and acts as an API proxy.
The server part acts as a "backend for frontend": it proxies API requests from the web part and constructs the queries that are sent to Elasticsearch. This server is implemented with NodeJS and the Express framework. Check out the NodeJS and Express documentation for more information.
The web part is implemented with the Angular platform and TypeScript. This is a modern framework stack that helps in designing and building single-page applications (SPA). The source code is organized in web components grouped in Angular modules that are type safe and reusable throughout the application. The Angular platform comes with its own CLI tool, which makes it easy to generate web components and modules. Check out the Angular documentation for more information.
The web part will proxy all of its requests down to the server part in order to avoid direct communications from the client to the Elasticsearch server. Queries will be constructed in the server part in order to shift away the complexity from the web.
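To illustrate what "constructing queries in the server part" can look like, here is a language-neutral sketch written in Python (the actual server is a NodeJS/Express application). It builds an Elasticsearch request body with exact-match filters and one terms aggregation for a facet. The function name and the field names `dataItem` and `biddingZone` are assumptions for illustration; the real index fields may differ.

```python
import json


def build_search_body(filters, facet_field, max_buckets=20):
    """Build an Elasticsearch request body: exact-match filters plus one facet."""
    return {
        "size": 0,  # return only aggregation buckets, no individual documents
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        "aggs": {
            "facet": {"terms": {"field": facet_field, "size": max_buckets}}
        },
    }


body = build_search_body(
    {"dataItem": "generation/AggregatedGenerationPerType"}, "biddingZone")
print(json.dumps(body, indent=2))
```

With this split, the web part only sends simple filter values, and the query structure stays in one place on the server.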
For analytics visualizations, the TEKG dashboard application uses Vega, a visualization grammar with extensive options for chart types, transformations, and interactions. For each analytic, the TEKG Dashboard application will fetch data from ES, transform it, and pass it to Vega for rendering. The design of the analytics visualizations is as follows:
The web page will have options for filtering the analytics data which will result in re-fetching it from ES.
The TEKG dashboard application will allow the user to browse and analyze validation reports produced by the Semantic Data Validation Service. The validation visualizations will consist of:
For visualizing map data, the TEKG dashboard application will use Leaflet, a library for making interactive maps with OpenStreetMap data. It provides an easy to use API with a lot of options for configurations and extensions.
The dashboard application will have a wrapper component of Leaflet that can be embedded throughout the analytics to provide more context and insight of the data.
The TEKG Dashboard is packaged as a Docker image to achieve portability, ease of deployment, and scalability. It can be deployed as a simple Docker container (with Docker Compose, for example) or as a Kubernetes deployment.
We use Grafana to monitor the overall infrastructure and performance of the system and its services, primarily GraphDB and Ontotext Platform (Semantic Objects service).
Monitoring data is collected with various Telegraf plugins and then stored in the InfluxDB time series database.
V1 of the Energy Knowledge Graph is currently available as an RDF graph and a SPARQL endpoint.
The graph consists of 116 million triples and covers the selected data items for a period of three full months, as well as the data from the current month (2022-01 - 2022-04).
The following table summarizes the number of observations (tr:DataObservation) per Data Item:
dataItem | n_observations |
---|---|
generation/ActualGenerationOutputPerGenerationUnit | 3969000 |
generation/AggregatedGenerationPerType | 2812002 |
balancing/AggregatedVolumes | 2003930 |
balancing/AggregatedVolumes_HOURLY | 829965 |
balancing/PricesOfActivatedBalancingEnergy | 708636 |
balancing/PricesOfActivatedBalancingEnergy_HOURLY | 347570 |
generation/CurrentGenerationForecastForWindAndSolar | 283136 |
outages/UnavailabilityOfProductionOrGenerationUnits | 79404 |
balancing/AggregatedVolumes_DAILY | 43351 |
balancing/PricesOfActivatedBalancingEnergy_DAILY | 17417 |
generation/InstalledGenerationCapacityComputed | 41 |
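Counts of this kind can be obtained with a grouped SPARQL query over `tr:DataObservation` and `tr:dataItem`. The Python sketch below prepares such a query as a SPARQL 1.1 Protocol request without sending it. The endpoint URL is a hypothetical placeholder, and the query is an assumption about how such counts could be computed, not necessarily the query used to produce the table.

```python
import urllib.request

QUERY = """
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
SELECT ?dataItem (COUNT(?obs) AS ?n_observations)
WHERE {
  ?obs a tr:DataObservation ;
       tr:dataItem ?dataItem .
}
GROUP BY ?dataItem
ORDER BY DESC(?n_observations)
"""


def build_request(endpoint, query):
    """Prepare (but do not send) a SPARQL 1.1 Protocol query request."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
    )


# Hypothetical endpoint URL; substitute the address of the actual repository.
request = build_request("https://example.org/repositories/tekg", QUERY)
# urllib.request.urlopen(request) would execute the query and return JSON results.
```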
A number of sample queries are available on the GraphDB Workbench home page.
Below is the ontology in Turtle format.
# @prefix trr: <https://transparency.ontotext.com/resource/> . # OMIT since this takes over all other prefixes
@prefix tr: <https://transparency.ontotext.com/resource/tr/> . # Ontology
@prefix eic: <https://transparency.ontotext.com/resource/eic/> . # EnergyResource with EIC
@prefix type: <https://transparency.ontotext.com/resource/type/> . # codelists
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix vann: <http://purl.org/vocab/vann/> .
tr: a owl:Ontology;
rdfs:label "Transparency Energy ontology";
rdfs:comment "Ontology for data from the ENTSOE Electricity Market Transparency portal";
rdfs:seeAlso <https://transparency.entsoe.eu/>, <https://transparency.ontotext.com/>;
dct:creator <https://ontotext.com/>, <mailto:vladimir.alexiev@ontotext.com>;
dct:created "2021-06-02"^^xsd:date;
dct:modified "2022-02-21"^^xsd:date;
owl:versionInfo "1.0";
vann:preferredNamespaceUri "https://transparency.ontotext.com/resource/tr/";
vann:preferredNamespacePrefix "tr".
#################### classes
tr:Area a rdfs:Class;
rdfs:subClassOf tr:EnergyResource;
rdfs:isDefinedBy tr: ;
rdfs:label "Area";
rdfs:comment "Area, as referenced in CSV files, described in REST API documentation and out of which resources are served by the REST API".
tr:CodeList a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Code List";
rdfs:comment "A code list (eg Message type, UnitOfMeasure, Asset type)".
tr:CodeValue a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Code Value";
rdfs:comment "Value in a code list".
tr:Country a rdfs:Class;
rdfs:subClassOf tr:EnergyResource;
rdfs:isDefinedBy tr: ;
rdfs:label "Country";
rdfs:comment "Country (member state)".
tr:DataDomain a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Data Domain";
rdfs:comment "Major area of transparency data".
tr:DataItem a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Data Item";
rdfs:comment "Data item (time series) of transparency data in a particular domain".
tr:DataObservation a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Data Observation";
rdfs:comment "Data Observation, having dataItem, date, dateUpdated and observation-specific fields".
tr:EicTypeValid a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "EIC Type Valid";
rdfs:comment "EIC types that are valid or invalid with the listed function".
tr:EnergyResource a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Energy Resource";
rdfs:comment "Energy resource or participant identified with EIC and having a function".
tr:FunctionValid a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Function Valid";
rdfs:comment "A valid function and a corresponding invalid (misspelt) function".
tr:GenerationUnit a rdfs:Class;
rdfs:subClassOf tr:EnergyResource;
rdfs:isDefinedBy tr: ;
rdfs:label "Generation Unit";
rdfs:comment "Generation Unit (generator) as described at the lower level of Installed Capacity of Production and Generation Units".
tr:Outage a rdfs:Class;
rdfs:subClassOf tr:DataObservation;
rdfs:isDefinedBy tr: ;
rdfs:label "Outage";
rdfs:comment "Outage (unavailability) of Production or Generation Unit".
tr:ProductionUnit a rdfs:Class;
rdfs:subClassOf tr:EnergyResource;
rdfs:isDefinedBy tr: ;
rdfs:label "Production Unit";
rdfs:comment "Production Unit (power plant) as described at the higher level of Installed Capacity of Production and Generation Units".
tr:ValidationCount a rdfs:Class;
rdfs:isDefinedBy tr: ;
rdfs:label "Validation Count";
rdfs:comment "Validation summary result, characterized by rule (shape), area and count".
#################### properties
tr:acerCode a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "ACER code";
rdfs:comment "Agency for Cooperation of Energy Regulators code of an energy participant";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string.
tr:actualConsumption a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "actual consumption";
rdfs:comment "Actual consumption of Production Unit due to technological consumption (MW)"; # or Area?
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:actualOutput a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "actual output";
rdfs:comment "Actual power output of a Production Unit or Area (MW)";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:appliesTo a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "applies to";
rdfs:comment "Whether this validation rule applies to 'Country' or 'Area' (used for sorting them into tables)";
rdfs:domain sh:Shape;
rdfs:range xsd:string.
tr:assetType a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "asset type";
rdfs:comment "Asset type of a Power System Resource";
schema:domainIncludes tr:EnergyResource, tr:DataObservation;
rdfs:range tr:CodeValue;
tr:xpath "MktPSRType/psrType".
tr:availableOutput a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "available output";
rdfs:comment "Available power output of Production or Generation Unit, reduced due to Outage (MW)";
rdfs:domain tr:Outage;
rdfs:range xsd:float.
tr:biddingZone a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "bidding zone";
rdfs:comment "Bidding Zone of this Energy Resource or Outage";
schema:domainIncludes tr:EnergyResource, tr:Outage;
rdfs:range tr:Area;
tr:xpath "biddingZone_Domain.mRID".
tr:codeList a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "code list";
rdfs:comment "List this code value is part of";
rdfs:domain tr:CodeValue;
rdfs:range tr:CodeList.
tr:controlArea a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "control area";
rdfs:comment "Control Area(s) of this Energy Resource or Outage";
schema:domainIncludes tr:EnergyResource, tr:Outage;
rdfs:range tr:Area;
tr:xpath "ControlArea_Domain/mRID".
tr:count a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "count";
rdfs:comment "Count of violations";
rdfs:domain tr:ValidationCount;
rdfs:range xsd:integer.
tr:countryCode a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "country code";
rdfs:comment "Country code of an energy resource or participant";
schema:domainIncludes tr:EnergyResource, sh:ValidationResult, tr:ValidationCount;
rdfs:range xsd:string;
tr:xpath "eICCode_MarketParticipant.streetAddress/townDetail/country".
tr:currency a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "currency";
rdfs:comment "Currency code corresponding to the 'price' field";
rdfs:domain tr:DataObservation;
rdfs:range xsd:string.
tr:dataDomain a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "data domain";
rdfs:comment "Domain of this data item";
rdfs:domain tr:DataItem;
rdfs:range tr:DataDomain.
tr:dataItem a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "data item";
rdfs:comment "Data item(s) that this observation (or validation rule) is (are) about";
schema:domainIncludes tr:DataObservation, sh:Shape;
rdfs:range tr:DataItem.
tr:date a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "date";
rdfs:domain tr:DataObservation;
rdfs:comment "Date of an observation";
rdfs:range xsd:dateTime.
tr:dateEnd a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "date end";
rdfs:domain tr:Outage;
rdfs:comment "Ending date of an outage";
rdfs:range xsd:dateTime.
tr:dateImplemented a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "date implemented";
rdfs:comment "Date when an Energy Resource was implemented";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:date;
tr:xpath "implementation_DateAndOrTime.date".
tr:dateStart a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "date start";
rdfs:domain tr:Outage;
rdfs:comment "Starting date of an outage";
rdfs:range xsd:dateTime.
tr:dateUpdated a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "date updated";
schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource, tr:DataObservation, tr:Outage;
rdfs:comment "Date when a record was last updated";
rdfs:range xsd:dateTime;
tr:xpath "lastRequest_DateAndOrTime.date".
tr:description a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "description";
rdfs:comment "A description of something";
schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
rdfs:range xsd:string.
tr:direction a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "direction";
rdfs:comment "Direction of energy flow of this balancing volume or price (Up, Down, Up and Down)";
rdfs:domain tr:DataObservation;
rdfs:range tr:CodeValue.
tr:displayArea a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "display area";
rdfs:comment "Area notation or country code where this validation result or count should be grouped, including the special values 'other' and 'none'";
schema:domainIncludes sh:ValidationResult, tr:ValidationCount;
rdfs:range xsd:string.
tr:duration a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "duration";
rdfs:comment "Duration (time quant) of this data observation";
rdfs:domain tr:DataObservation;
rdfs:range xsd:duration.
tr:eic a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "EIC";
rdfs:comment "Energy Identification Code of an energy resource or participant";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string.
tr:eicType a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "EIC type";
rdfs:comment "Type of Energy resource or participant derived from the third char of its EIC. It's a single-value field and is a 'supertype' of 'function'";
rdfs:domain tr:EnergyResource;
rdfs:range tr:CodeValue.
tr:eicTypeInvalid a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "EIC type invalid";
rdfs:comment "EIC type that is invalid with the listed function";
rdfs:domain tr:EicTypeValid;
rdfs:range tr:CodeValue.
tr:eicTypeValid a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "EIC type valid";
rdfs:comment "EIC type that is valid with the listed function";
rdfs:domain tr:EicTypeValid;
rdfs:range tr:CodeValue.
tr:ekgCheckDataQuality a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "TEKG checks data quality";
rdfs:comment "Whether the TEKG project checks the quality of data of this data item";
rdfs:domain tr:DataItem;
rdfs:range xsd:boolean.
tr:ekgImplementAnalytics a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "TEKG implements analytics";
rdfs:comment "Whether the TEKG project implements analytics over this data item";
rdfs:domain tr:DataItem;
rdfs:range xsd:boolean.
tr:energyResource a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "energy resource";
rdfs:comment "Energy resource (Production or Generation Unit) reported in this outage";
rdfs:domain tr:Outage;
rdfs:range tr:EnergyResource.
tr:fields a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "fields";
rdfs:comment "Fields that this validation rule is about (listed as a single string)";
rdfs:domain sh:Shape;
rdfs:range xsd:string.
tr:fileName a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "file name";
rdfs:comment "Root file name of this data item";
rdfs:domain tr:DataItem;
rdfs:range xsd:string.
tr:fileType a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "file type";
rdfs:comment "File type of this data item as consumed by the TEKG project (XML or CSV)";
rdfs:domain tr:DataItem;
rdfs:range xsd:string.
tr:function a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "function";
rdfs:comment "Function(s) of an energy resource or participant, eg Generation Unit, Production Unit, Generation, Load, Connection Point, Internal Line, Tieline, Transformer, Substation, Trade Responsible Party, Balance Responsible Party, Production Responsible party, Consumption Responsible Party...";
schema:domainIncludes tr:EnergyResource, tr:EicTypeValid, tr:FunctionValid;
rdfs:range xsd:string.
tr:functionInvalid a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "function invalid";
rdfs:comment "Function that is invalid (misspelled)";
rdfs:domain tr:FunctionValid;
rdfs:range xsd:string.
tr:functionValid a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "function valid";
rdfs:comment "Function that is valid, or allowed for this EIC type";
schema:domainIncludes tr:CodeValue, tr:FunctionValid;
rdfs:range xsd:string.
tr:generationUnit a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "generation unit";
rdfs:comment "Generation Units of this Production Unit (semi-inverse of parentResource)";
rdfs:domain tr:ProductionUnit;
rdfs:range tr:GenerationUnit.
tr:hasProdUnits a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "has Production Units";
rdfs:comment "Whether the area has Production/Generation Units returned from the REST API";
rdfs:domain tr:Area;
rdfs:range xsd:boolean.
tr:highVoltageLimit a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "high voltage limit";
rdfs:comment "High voltage limit of Production Unit";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:float;
tr:xpath "production_PowerSystemResources.highVoltageLimit".
tr:inAPI a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "in API";
rdfs:comment "Whether the area is returned by the REST API";
rdfs:domain tr:Area;
rdfs:range xsd:boolean.
tr:inDoc a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "in Documentation";
rdfs:comment "Whether the area is described in the REST API documentation";
rdfs:domain tr:Area;
rdfs:range xsd:boolean.
tr:inEIC a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "in EIC";
rdfs:comment "Whether the area is described in the EIC file (we've added the missing ones in eic-extra.ttl)";
rdfs:domain tr:Area;
rdfs:range xsd:boolean.
tr:inVies a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "in VIES";
rdfs:comment """Whether a Country or a particular Party's VAT Number is present in the EU VAT Information Exchange System (VIES).
No value is recorded for Party if its country is not covered by VIES""";
schema:domainIncludes tr:EnergyResource, tr:Country;
rdfs:range xsd:boolean.
tr:installedOutput a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "installed output";
rdfs:comment "Installed nominal power output of Production or Generation Unit (MW)";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:float;
tr:xpath "nominalP".
tr:isFreeReuse a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "is for free reuse";
rdfs:comment "Whether the data item can be reused freely";
rdfs:domain tr:DataItem;
rdfs:range xsd:boolean.
tr:isVatValid a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "is VAT valid";
rdfs:comment "Whether the Value Added Tax number is syntactically valid according to per-country patterns";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:boolean.
tr:iso2 a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "ISO alpha2";
rdfs:comment "2-letter alphabetical ISO code of this country, used for linking to external datasets";
rdfs:domain tr:Country;
rdfs:range xsd:string.
tr:iso3 a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "ISO alpha3";
rdfs:comment "3-letter alphabetical ISO code of this country, used for linking to external datasets";
rdfs:domain tr:Country;
rdfs:range xsd:string.
tr:link a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "link";
rdfs:comment "Link to page with information or direct download page (outside of portal)";
rdfs:domain tr:DataItem.
tr:linkDescription a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "link to description";
rdfs:comment "Link to detailed Knowledge Base description on portal";
rdfs:domain tr:DataItem;
rdfs:range xsd:string.
tr:linkPortal a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "link to portal";
rdfs:comment "Link to data serving page on portal";
rdfs:domain tr:DataItem.
tr:location a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "location";
rdfs:comment "Location of an energy resource (Production Unit)";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string;
tr:xpath "registeredResource.location.name", "generatingUnit_Location.name".
tr:marketBalanceArea a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "market balance area";
rdfs:comment "Market Balance Area of this balancing volume or price";
rdfs:domain tr:DataObservation;
rdfs:range tr:Area.
tr:marketProduct a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "market product";
rdfs:comment "Type of market product of this balancing volume or price (Standard, Specific, Local)";
rdfs:domain tr:DataObservation;
rdfs:range tr:CodeValue.
tr:mrid a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "message id";
rdfs:comment "Unique message id (mRID), used in the URL";
rdfs:domain tr:Outage;
rdfs:range xsd:string.
tr:name a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "name";
rdfs:comment "The name of something";
schema:domainIncludes tr:DataDomain, tr:DataItem, tr:CodeList, tr:CodeValue, tr:EnergyResource;
rdfs:range xsd:string;
tr:xpath "registeredResource.location.name". # TODO and more
tr:nameAlt a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "name alt";
rdfs:comment "Alternative name of a code value, as present in CSV files";
rdfs:domain tr:CodeValue;
rdfs:range xsd:string.
tr:netOutput a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "net output";
rdfs:comment "Net power output (actualOutput minus actualConsumption) of a Production Unit or Area (MW)";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:forecastedOutput a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "forecasted output";
rdfs:comment "Forecasted output of a Production Unit or Area (MW)";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:notation a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "notation";
rdfs:comment """Code of something, eg A01 (a code value), EFET (European Federation of Energy Traders), CB-RO-OP (Control Block Romania Operator).
Single value, coming from EIC or code list master data""";
schema:domainIncludes tr:CodeList, tr:CodeValue, tr:EnergyResource;
rdfs:range xsd:string;
tr:xpath "long_Names.name".
tr:notationAlt a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "notation alt";
rdfs:comment """Alternative code for an Energy Resource.
Potentially multiple values, coming from messages (Configuration_MarketDocument)""";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string;
tr:xpath "registeredResource.name".
tr:parentResource a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "parent resource";
rdfs:comment """Parent of this Energy Resource, eg:
- Control Block parentResource Coordination Center Zone
- Generation Unit parentResource Production Unit
""";
rdfs:domain tr:EnergyResource;
rdfs:range tr:EnergyResource.
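# Illustrative sketch only, not part of the ontology: how tr:parentResource could link two resources.
# The IRIs below are hypothetical placeholders, not real EIC-based TEKG IRIs.
# <http://example.org/resource/GenerationUnit-1> a tr:EnergyResource;
#   tr:parentResource <http://example.org/resource/ProductionUnit-1>.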
tr:price a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "price";
rdfs:comment "Price reported in this data observation in 'currency' per MW/h (see also 'priceInEur')";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:priceCategory a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "price category";
rdfs:comment "Price category of this balancing price (Average or Marginal)";
rdfs:domain tr:DataObservation;
rdfs:range tr:CodeValue.
tr:priceInEur a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "price in EUR";
rdfs:comment "Price reported in this data observation in EUR per MW/h (see also 'price')";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:providerParticipant a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "provider participant";
rdfs:comment "Provider participant(s) of this Energy Resource";
rdfs:domain tr:EnergyResource;
rdfs:range tr:EnergyResource;
tr:xpath "Provider_MarketParticipant.mRID".
tr:reason a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "reason";
rdfs:comment "Motivation of an act (in whole Message or individual TimeSeries) in coded form";
schema:domainIncludes tr:Message, tr:TimeSeries;
rdfs:range tr:CodeValue.
tr:reasonText a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "reason text";
rdfs:comment "Motivation of an act as free text, when `reason` is A95 Complementary information";
schema:domainIncludes tr:Message, tr:TimeSeries;
rdfs:range xsd:string.
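# Illustrative sketch only, not part of the ontology: a coded reason with its free-text complement.
# All IRIs are hypothetical placeholders; the IRI form of code value A95 is an assumption.
# <http://example.org/message/1> a tr:Message;
#   tr:reason <http://example.org/codeValue/A95>;
#   tr:reasonText "Hypothetical free-text explanation".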
tr:regArticle a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "regulation article";
rdfs:comment "Article in Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets that describes the data item";
rdfs:seeAlso <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0543>;
rdfs:domain tr:DataItem;
rdfs:range xsd:string.
tr:reserveType a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "reserve type";
rdfs:comment "Type of reserve resource of this balancing volume or price (FCR, aFRR, mFRR, RR)";
rdfs:domain tr:DataObservation;
rdfs:range tr:CodeValue.
tr:responsibleParticipant a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "responsible participant";
rdfs:comment "Participant that is responsible for this Energy Resource";
rdfs:domain tr:EnergyResource;
rdfs:range tr:EnergyResource;
tr:xpath "eICResponsible_MarketParticipant.mRID".
tr:schedulingArea a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "scheduling area";
rdfs:comment "Scheduling Area of this balancing volume or price";
rdfs:domain tr:DataObservation;
rdfs:range tr:Area.
tr:statusText a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "status text";
rdfs:comment "Latest status of an Outage: 'Active, Withdrawn, Canceled'";
rdfs:domain tr:Outage;
rdfs:range xsd:string.
tr:timeZone a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "time zone";
rdfs:comment "Time zone code of an Outage";
rdfs:domain tr:Outage;
rdfs:range xsd:string.
tr:typeText a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "type text";
rdfs:comment "Type of an Outage: 'Planned, Forced'";
rdfs:domain tr:Outage;
rdfs:range xsd:string.
tr:vatNumber a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "VAT number";
rdfs:comment "Value Added Tax number of an energy participant";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string.
tr:version a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "version";
rdfs:comment "Version of the message. Only the latest version(s) of an mRID are retained. Used in the URL";
rdfs:domain tr:Outage;
rdfs:range xsd:integer.
tr:viesAddress a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "VIES address";
rdfs:comment "Party address as returned by EU VIES (only if present in VIES)";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string.
tr:viesCheckDate a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "VIES check date";
rdfs:comment "Datetime when EU VIES check was performed";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:dateTime.
tr:viesName a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "VIES name";
rdfs:comment "Party name as returned by EU VIES (only if present in VIES)";
rdfs:domain tr:EnergyResource;
rdfs:range xsd:string.
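# Illustrative sketch only, not part of the ontology: VAT and VIES properties on one participant.
# The IRI and all literal values are hypothetical placeholders, not real TEKG or VIES data.
# <http://example.org/resource/Participant-1> a tr:EnergyResource;
#   tr:vatNumber "XX000000000";
#   tr:isVatValid true;
#   tr:viesName "Hypothetical Energy Company";
#   tr:viesAddress "Hypothetical Street 1, Hypothetical City";
#   tr:viesCheckDate "2022-01-01T00:00:00Z"^^xsd:dateTime.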
tr:volume a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "volume";
rdfs:comment "Volume offered, accepted, activated or unavailable (MW)";
rdfs:domain tr:DataObservation;
rdfs:range xsd:float.
tr:volumeCategory a owl:ObjectProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "volume category";
rdfs:comment "Volume category of this balancing volume (offered, accepted, activated or unavailable)";
rdfs:domain tr:DataObservation;
rdfs:range tr:CodeValue.
tr:xpath a owl:DatatypeProperty;
rdfs:isDefinedBy tr: ;
rdfs:label "xpath";
rdfs:comment "XPath of the XML element that carries the data for an RDF property. TODO: also need namespace and enclosing elements?";
schema:domainIncludes owl:ObjectProperty, owl:DatatypeProperty; # rdfs:Class ?
rdfs:range xsd:string.