Special Features

There are several functions that enhance the usage of emipy. In this tutorial we take a look at these functions and work threw their use cases.

Change root path

At first let’s take a look at the configuration options of emipy. When initialise an emipy project, we defined a path to the root of the project. This path is stored in a config file and is used when the data is loaded in a session or auto exported to the export folder.
When you use emipy in another environment than the one in which you initialised emipy, your root path is not set to the projects folder. If you want to continue your work in this project, or simply don’t want to download all data again, you can adapt this path.
import emipy as ep

ep.change_RootPath(r'Your\individual\path\to\your\project')

Downloading additional map data

During the initialization, emipy downloads map data from Eurostat. There is not just one map, but a lot of different ways to visualize the countries.
Emipy downloads the predefined set with resolution factor 1:10 Million, but you can download additional map data if wanted. For the download you can choose from the resolution (1:1,3,10,20,60 Million) and emipy downloads the map data for all publication years, projections and NUTS-LVL into the projects MappingData folder. With read_mb() you can make further choices of the way, your map is diplayed. For more details see Data parameters.
The resolution can be an Integer or a list of Integers which values has to be the resolution factor. You can put the parameter clear to True, to clear the MappingData folder before downloading. This prevents doublication of data and reduces the memory size.
example_res =[3,60]
ep.download_MapData(r'The\path\to\the\root\of\your\project', resolution=example_res, clear=True)

Change Units

When working with the pollution data, you might want to change the unit of the emission. The function change_unit() reads the current unit for all entries and recalculates the emission value to the new unit and stores the new unit code in the DataFrame.
db = ep.read_db()
data1 = ep.f_db(db,CountryName='Germany')
data2 = ep.change_unit(data1, unit='megaton')

Data Table Adaption

For the export you might not need all of the 73 columns and want to increase your overview. You can exclude columns and reduce the columns to those which you are interested in.
The function row_reduction() uses information from the config file to determine which columns are defined as columns of interest. You can change these configuration with change_ColumnsOfInterest(). You can use the paramter total to change the complete list, add to add column names or sub to substract from present column names. If you put the parameter reset on True, you reset the list of column names to the initial state.
data3 = ep.row_reduction(data1)
total=['CountryCode', 'CountryName']
ep.change_ColumnsOfInterest(total=total, add=None, sub=None, reset=False)
data4 = ep.row_reduction(data3)
ep.change_ColumnsOfInterest(total=None, add='Lat', sub=None, reset=False)
data5 = ep.row_reduction(data3)
ep.change_ColumnsOfInterest(total=None, add=None, sub='CountryCode', reset=False)
data6 = ep.row_reduction(data3)
ep.change_ColumnsOfInterest(total=None, add=None, sub=None, reset=True)
data7 = ep.row_reduction(data3)
There is also an option to rename the columns at large scale. The working principle is the same as the column reduction but you have to insert dicts instead of strings or lists.
addition={'Lat': 'Latitude'}
ep.change_RenameDict(total=None, add=addition, sub=None, reset=False)
data8 = ep.rename_columns(data3)

ep.change_RenameDict(total=None, add=None, sub=None, reset=True)
data9 = ep.rename_columns(data3)

Emission information

So far you have produced filtered data bases and plots of these data bases. But perhaps you want to get the information of your plot in form of a data table.
data10 = ep.get_PollutantVolume(data2, FirstOrder='ReportingYear')
data11 = ep.get_PollutantVolumeRel(data2, FirstOrder='ReportingYear')
data12 = ep.get_PollutantVolumeChange(data2, FirstOrder='ReportingYear')
In comparison to your data base, this table has summed up all emissions for your order parameter. The usage of the order parameter is the same as in the plot functions.

NACE-Code selection

The economical classification of the entries with the NACE-Code is not consistent over time. The European Union performed a revision of the NACE-Classification NACE 1.1 to NACE 2. In consequence, the entries for the years 2001 and 2004 are encoded in the old classification, while the newer entries are encoded by NACE 2.
Emipy provides a function that performes an transition of the old codes to the new, based on the transition tables, provided by Eurostat.
db2 = ep.read_db()
db = ep.perform_NACETransition(db2)
The transition does not allow an unique assignment of new codes, which is why the new codes may be stored as list of multiple codes. In a consequence, entries might pass your filter, but are not truly part of your requested data. You might want to check for these entries, if they really are part of your economic field.
There exist entries in the old data, that have 2 different NACE 1.1 Code, which have no assignment in the transition tables. For these cases we decided for a assignment.
The NACECode 27.35 is translated to 24.10 since the NACEMainEconomicActivityName of both sounds very similar. 74.84 is translated to 59.20, 63.99, 74.10, 74.90, 77.40, 82.30, 82.91, 82.99. Here the NACEMainEconomicActivityName is the same as for 74.87 and we copied the transition from this NACECode.
You can finde the NACE-Codes in the NACE Rev.2 starting at page 63. Choosing the right code enables you to filter for your request. NACEMainEconomicActivityCode needs a string with the complete NACE Code like ‘01.46’ or a list of these Codes.
data13 = ep.f_db(db, NACEMainEconomicActivityCode='35.11')
Some groups of NACE codes are stored in the config file. You can access them with get_NACECode_filter(). If the parameter specify is None, which it is by default, you receive a list of dictionaries which have the NACE Codes as list corresponding to the key name. You can put specify to on of the keys to receive the value, the list of NACE Codes.
print(ep.get_NACECode_filter())
NACECode = ep.get_NACECode_filter(specify = 'animal production')
data14 = ep.f_db(db2, NACEMainEconomicActivityCode=NACECode)
You can create your own NACE-Code lists with change_NACECode_filter(). This works very much like change_RenameDict(). You can add and subtract single key/value pairs, or replace the complete dictionary. For the right syntax, make sure your codes are 5 characters long and seperated by a comma.
ep.change_NACECode_filter(add={'metalmanufaction':'24.51,24.52,24.53,24.54'})
ep.change_NACECode_filter(sub={'metalmanufaction':'24.51,24.52,24.53,24.54'})

Impurity analysis

The emission of specific pollutants comes with emission of other pollutants. In consequence you do not have pure emissions, but impurities to your target pollutant. To analyse these, emipy provides the functions get_ImpurityVolume() and plot_ImpurityVolume().
You can specify your analysis with a few parameters. At first you specify your data that is to analyse with calling db. Then you name the target pollutant, which is the impured one. For the plot function you also have to specify the impurity molecule that you are looking for.
Additional paramters for the get_ function are FirstOrder, ReleaseMediumName, absolute, FacilityFocus, impurity and statistics. With FirstOrder, you specify the column at which your data is sorted. The default is FacilityReportID, since it is very intuitive to look for the impurities of specific facilities. Nontheless, you could also choose for example NUTSRegionGeoCode, to make a region analysis, rather than a facility analysis.
data = ep.f_db(db, CountryName='Germany')
data16 = ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', FirstOrder='NUTSRegionGeoCode')
ep.plot_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', impurity='Carbon monoxide (CO)', FirstOrder='NUTSRegionGeoCode')
You can change the ReleaseMediumName to Water or Soil, if your target pollutant is emitted in another medium than air. The function returns the emission value of your impurity in relation to the emission of your target. If you want to get the absolute value, you can change the absolute parameter to True.
data17 = ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', ReleaseMediumName='Air', absolute=True)
If your FirstOrder is something else than FacilityReportID, the FacilityFocus parameter becomes interesting. If this parameter is True, only impurities emitted in facilities that aswell emit your target pollutant are considered. You can put the parameter to False, to turn this feature off and analyse the impurities of all facilities in your Order parameter.
data18 = ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', FirstOrder='NUTSRegionGeoCode', FacilityFocus=False)
ep.plot_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', impurity='Carbon monoxide (CO)', FirstOrder='NUTSRegionGeoCode', FacilityFocus=False)
If you have not specified the impurity, you will get a table with all present impurities. You can specify your impurity, to receive only your impurity of interest.
data19 = ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)')
data20 = ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', impurity='Carbon monoxide (CO)')
You can also set the parameter statistics to False or True to either simply get or plot your impurity values or to get or plot the statistics of these. The default is False.
ep.get_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', impurity='Carbon monoxide (CO)')
ep.plot_ImpurityVolume(db=data, target='Carbon dioxide (CO2)', impurity='Carbon monoxide (CO)', statistics=True)
When creating a data table with get_ImpurityVolume() you will often have NaN as entries. This happens because different facilities have different impurities. So a NaN value means, that there is no impurity of this pollutant type listet in the data base. This does not mean, that there is definitly no impurity. Impurities can be below a certain threshold value and therefore not listet in the E-PRTR.
If statistics is True, plot_ImpurityVolume() automatically ignores NaN values. When you plot your simple impurity values, you can set the parameter PlotNA to False. Then the na values are not plotted.

The following example illustrates emipy’s functionality for impurity analysis.

db = ep.read_db()

data1 = ep.f_db(db, ReportingYear=2015, CountryName='France', NUTSRegionGeoCode='FR30',
                PollutantName='Carbon dioxide (CO2)')
data1 = ep.change_unit(data1, unit='megaton')
data2 = ep.f_db(db, ReportingYear=2015, CountryName='France', NUTSRegionGeoCode='FR30')

testdata=ep.get_PollutantVolume(data1, FirstOrder='FacilityReportID').sort_values('TotalQuantity', ascending=False)
testdata2=ep.get_ImpurityVolume(data2, target='Carbon dioxide (CO2)', impurity='Nitrogen oxides (NOx/NO2)', absolute=True).sort_values('TotalQuantity', ascending=False)
testdata2.loc[:,'Nitrogen oxides (NOx/NO2)'] /= 1000000

testdata3=ep.get_ImpurityVolume(data2, target='Carbon dioxide (CO2)', impurity='Nitrogen oxides (NOx/NO2)').sort_values('TotalQuantity', ascending=False)
testdata3.loc[:,'Nitrogen oxides (NOx/NO2)'] *= 1000
testdata4=ep.get_ImpurityVolume(data2, target='Carbon dioxide (CO2)', impurity='Nitrogen oxides (NOx/NO2)', statistics=True)
testdata4.loc[:,'Nitrogen oxides (NOx/NO2)'] *= 1000
testdata4 = testdata4.drop('count')

fig3, ax = plt.subplots(2, 2)

testdata.plot(x='FacilityReportID', y='TotalQuantity', ax=ax[0, 0], kind='bar')
testdata2.plot(x='FacilityReportID', y='Nitrogen oxides (NOx/NO2)', ax=ax[0,1], kind='bar')
testdata3.plot(x='FacilityReportID', y='Nitrogen oxides (NOx/NO2)', ax=ax[1,0], kind='bar')
testdata4.plot(y='Nitrogen oxides (NOx/NO2)', ax=ax[1,1], kind='bar', rot=0)

plt.tight_layout()
ep.export_fig(fig3, filename='impurity_analysis.pdf')
Tut5pic1

Calliope Export

The following code generates csv and yaml files for the use with the energy modelling framework Calliope.
The csv file contains information about the supply constraint of the carbon dioxide source
db = emipy.read_db()
source = f_db(db, CountryName='Germany', ReportingYear=2017, PollutantCode='CO2', City='Dueren')
emipy.export_calliope(source)
The generated yaml file contains the technology definition. In the default setting the costs for \(CO_2\) provision are set to 70€ (which can be specified using the sc variable).
3519569:
techs:
    co2_supply:
        essentials:
            name: CO2 Supply
            color: '#0b95ef'
            parent: supply
            carrier: co2
        constraints:
            resource: file=emipy2calliope.csv:3519569
            energy_cap_max: 20776.255707762557
            lifetime: 1
        costs:
            monetary:
                interest_rate: 0
                om_prod: 0.07
coordinates:
    lat: 50.776516546
    lon: 6.48949128038