Generating data sets
First, import the emipy package and read the database.
The program stores the path chosen during project initialisation, automatically searches for the data there, and loads it. You can also read a specific database: pass its path as a string to read_db().
import emipy as ep
db = ep.read_db()
db.head()
| | FacilityReportID | PollutantReleaseAndTransferReportID | FacilityID | NationalID | ParentCompanyName | FacilityName | StreetName | BuildingNumber | City | PostalCode | ... | PollutantName | PollutantGroupCode | PollutantGroupName | PollutantCAS | MethodBasisCode | MethodBasisName | TotalQuantity | AccidentalQuantity | UnitCode | UnitName |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Particulate matter (PM10) | INORG | Inorganic substances | NaN | E | Estimated | 68200.0 | 0.0 | KGM | kilogram |
1 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Sulphur oxides (SOx/SO2) | OTHGAS | Other gases | NaN | M | Measured | 420000.0 | 0.0 | KGM | kilogram |
2 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Carbon dioxide (CO2) | GRHGAS | Greenhouse gases | 124-38-9 | E | Estimated | 182000000.0 | 0.0 | KGM | kilogram |
3 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Nitrogen oxides (NOx/NO2) | OTHGAS | Other gases | NaN | M | Measured | 818000.0 | 0.0 | KGM | kilogram |
4 | 1857 | 1 | 5764 | 1013410313 | Lenzing AG | Wasserreinhalteverband Lenzing - Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Zinc and compounds (as Zn) | HEVMET | Heavy metals | NaN | M | Measured | 3210.0 | 0.0 | KGM | kilogram |
5 rows × 73 columns
A list of possible column names to filter for is displayed with:
db.columns
Index(['FacilityReportID', 'PollutantReleaseAndTransferReportID', 'FacilityID',
'NationalID', 'ParentCompanyName', 'FacilityName', 'StreetName',
'BuildingNumber', 'City', 'PostalCode', 'CountryCode', 'CountryName',
'Lat', 'Long', 'RBDGeoCode', 'RBDGeoName', 'NUTSRegionGeoCode',
'NUTSRegionGeoName', 'RBDSourceCode', 'RBDSourceName',
'NUTSRegionSourceCode', 'NUTSRegionSourceName',
'NACEMainEconomicActivityCode', 'NACEMainEconomicActivityName',
'CompetentAuthorityName', 'CompetentAuthorityAddressStreetName',
'CompetentAuthorityAddressBuildingNumber',
'CompetentAuthorityAddressCity', 'CompetentAuthorityAddressPostalCode',
'CompetentAuthorityAddressCountryCode',
'CompetentAuthorityAddressCountryName',
'CompetentAuthorityTelephoneCommunication',
'CompetentAuthorityFaxCommunication',
'CompetentAuthorityEmailCommunication',
'CompetentAuthorityContactPersonName', 'ProductionVolumeProductName',
'ProductionVolumeQuantity', 'ProductionVolumeUnitCode',
'ProductionVolumeUnitName', 'TotalIPPCInstallationQuantity',
'OperatingHours', 'TotalEmployeeQuantity', 'WebsiteCommunication',
'PublicInformation', 'ConfidentialIndicator',
'ConfidentialityReasonCode', 'ConfidentialityReasonName',
'ProtectVoluntaryData', 'MainIASectorCode', 'MainIASectorName',
'MainIAActivityCode', 'MainIAActivityName', 'MainIASubActivityCode',
'MainIASubActivityName', 'ReportingYear', 'CoordinateSystemCode',
'CoordinateSystemName', 'CdrReleased', 'Published',
'PollutantReleaseID', 'ReleaseMediumCode', 'ReleaseMediumName',
'PollutantCode', 'PollutantName', 'PollutantGroupCode',
'PollutantGroupName', 'PollutantCAS', 'MethodBasisCode',
'MethodBasisName', 'TotalQuantity', 'AccidentalQuantity', 'UnitCode',
'UnitName'],
dtype='object')
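With 73 columns, it can be easier to search the names than to read the whole list. A minimal sketch in plain pandas, using an empty stand-in DataFrame that carries a few of the column names listed above (the stand-in is an assumption for illustration; with real data you would apply the same expression to db):

```python
import pandas as pd

# Stand-in with a few of the column names listed above (no real data).
demo = pd.DataFrame(columns=[
    'FacilityName', 'CountryName', 'ReportingYear',
    'PollutantName', 'PollutantGroupName', 'TotalQuantity',
])

# Collect every column whose name contains the given substring.
pollutant_columns = [name for name in demo.columns if 'Pollutant' in name]
print(pollutant_columns)
```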
If you are interested in, for example, the countries that occur in your database, you can get a list of them with the get_CountryList() function. There are more get_xy() functions for accessing the information in your database. For more information, see the processdata module description.
ep.get_CountryList(db)
['Austria',
'Belgium',
'Cyprus',
'Czech Republic',
'Germany',
'Denmark',
'Estonia',
'Spain',
'Finland',
'France',
'Greece',
'Hungary',
'Ireland',
'Italy',
'Lithuania',
'Luxembourg',
'Latvia',
'Malta',
'Netherlands',
'Norway',
'Poland',
'Portugal',
'Sweden',
'Slovenia',
'Slovakia',
'United Kingdom',
'Iceland',
'Serbia',
'Romania',
'Bulgaria',
'Switzerland',
'Croatia']
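Because db is a pandas DataFrame, a similar list can be obtained for any column with standard pandas calls. The sketch below uses a small made-up stand-in table. Whether get_CountryList() works this way internally is not stated here, so treat it as a rough equivalent for illustration only.

```python
import pandas as pd

# Made-up stand-in rows; the real db has one row per reported release.
demo = pd.DataFrame({
    'CountryName': ['Austria', 'Austria', 'Germany', 'Belgium', 'Germany'],
})

# Unique values in order of first appearance, as a plain Python list.
countries = demo['CountryName'].dropna().unique().tolist()
print(countries)
```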
The actual filtering is done with the function f_db(). You have to specify the database that you want to filter, together with the column names and column values that you want to filter for.
Note
The following lines only create the DataFrame and do not display it. To display the data table, execute e.g. data1.head().
For a better overview, you can use data = ep.row_reduction(db). The returned DataFrame keeps only a selected list of columns. This list can be adjusted.
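In plain pandas, reducing a table to selected columns looks like the sketch below. The column list and the stand-in data are assumptions chosen for illustration, not the defaults used by row_reduction().

```python
import pandas as pd

# Made-up stand-in rows with one column we do not want to keep.
demo = pd.DataFrame({
    'FacilityName': ['Plant A', 'Plant B'],
    'CountryName': ['Austria', 'Germany'],
    'StreetName': ['Street 1', 'Street 2'],
    'TotalQuantity': [10.0, 20.0],
})

# Keep only the columns of interest; all rows are preserved.
keep = ['FacilityName', 'CountryName', 'TotalQuantity']
reduced = demo[keep]
print(reduced)
```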
Let’s filter for pollution in Germany:
data1 = ep.f_db(db, CountryName='Germany')
To filter for multiple values in one column, pass a list.
data2 = ep.f_db(db, CountryName=['Germany', 'Switzerland', 'Austria'])
You can filter for multiple columns at the same time:
CountryName = ['Germany', 'Austria', 'Switzerland']
ReportingYear = [2014, 2015, 2016, 2017]
PollutantName = ['Carbon dioxide (CO2)', 'Methane (CH4)']
data3 = ep.f_db(db, CountryName=CountryName, ReportingYear=ReportingYear, PollutantName=PollutantName)
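To make the filter logic concrete, here is a rough pandas equivalent on made-up stand-in rows. It assumes that the values listed for one column are alternatives (any of them may match) and that the columns are combined with a logical AND. That reading follows from the examples above, but it is not a description of how f_db() is implemented.

```python
import pandas as pd

# Made-up stand-in rows for illustration.
demo = pd.DataFrame({
    'CountryName': ['Germany', 'Austria', 'France', 'Germany'],
    'ReportingYear': [2014, 2016, 2015, 2010],
    'PollutantName': ['Carbon dioxide (CO2)', 'Methane (CH4)',
                      'Carbon dioxide (CO2)', 'Methane (CH4)'],
})

# Within a column any listed value may match; across columns all must hold.
mask = (
    demo['CountryName'].isin(['Germany', 'Austria', 'Switzerland'])
    & demo['ReportingYear'].isin([2014, 2015, 2016, 2017])
    & demo['PollutantName'].isin(['Carbon dioxide (CO2)', 'Methane (CH4)'])
)
selected = demo[mask]
print(selected)
```

In this stand-in table, the French row fails the country condition and the 2010 row fails the year condition, so only the first two rows remain.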
Note
Keep in mind that numbers are not strings and therefore do not need quotation marks around them.
For the precise values, use the get_xy() functions or, alternatively, take a look at the parameter table.
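One way to check whether a column holds numbers or strings is to inspect its dtype. A small sketch on made-up stand-in data (with the real table, the same calls can be applied to db):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up stand-in rows: one numeric column and one text column.
demo = pd.DataFrame({
    'ReportingYear': [2014, 2015],
    'PollutantName': ['Methane (CH4)', 'Carbon dioxide (CO2)'],
})

# Numeric columns take bare numbers; text columns take quoted strings.
for column in demo.columns:
    kind = 'number' if is_numeric_dtype(demo[column]) else 'string'
    print(column, '->', kind)
```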
You can also filter step by step. To do so, pass the already filtered DataFrame to the filter function again.
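The idea of step-by-step filtering can be sketched with a hypothetical helper, filter_rows(), that mimics the keyword style of f_db() in plain pandas. The helper and the stand-in data are assumptions for illustration and are not part of emipy. The point is only that the output of one step becomes the input of the next.

```python
import pandas as pd

def filter_rows(df, **criteria):
    """Hypothetical stand-in for a keyword-based filter (not emipy code)."""
    for column, wanted in criteria.items():
        # Accept a single value or a list of values.
        values = wanted if isinstance(wanted, list) else [wanted]
        df = df[df[column].isin(values)]
    return df

# Made-up stand-in rows for illustration.
demo = pd.DataFrame({
    'CountryName': ['Germany', 'Germany', 'Austria'],
    'ReportingYear': [2014, 2015, 2014],
})

# Two consecutive steps: the result of the first feeds the second.
step1 = filter_rows(demo, CountryName='Germany')
step2 = filter_rows(step1, ReportingYear=2014)

# The same criteria applied in one call, for comparison.
at_once = filter_rows(demo, CountryName='Germany', ReportingYear=2014)
print(step2.equals(at_once))
```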
You can adjust two more arguments in f_db().
If you want to look at the continent of Europe only, you have to exclude exclaves that belong to European countries, such as French Guiana.
data4 = ep.f_db(db, ExclaveExclude=True)
If you set ReturnUnknown to True, the function returns a data table containing all entries that are sorted out by the filter only because they do not hold enough information to pass it. If this table is empty, that is a good sign.
data5 = ep.f_db(db, CountryName='Germany', ReturnUnknown=True)
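One plausible reading of "not enough information" is that the filtered column is empty for those rows. Under that assumption (the exact criterion used by f_db() is not stated here), such rows can be isolated in plain pandas as follows, shown on made-up stand-in data:

```python
import pandas as pd

# Made-up stand-in rows; one facility has no country recorded.
demo = pd.DataFrame({
    'FacilityName': ['Plant A', 'Plant B', 'Plant C'],
    'CountryName': ['Germany', None, 'Austria'],
})

# Rows whose country is missing cannot be matched against a country filter.
unknown = demo[demo['CountryName'].isna()]
print(unknown)
```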
Now you can generate your own data set of interest with a few lines of code. Since db is a DataFrame object, you can also use all pandas functions to customise your data generation.
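As an example of such further processing, the sketch below sums the reported quantities per country with standard pandas. The rows are made up for illustration; with real data, replace demo with a filtered table such as data3.

```python
import pandas as pd

# Made-up stand-in rows; quantities are not real measurements.
demo = pd.DataFrame({
    'CountryName': ['Germany', 'Germany', 'Austria'],
    'PollutantName': ['Carbon dioxide (CO2)'] * 3,
    'TotalQuantity': [100.0, 250.0, 40.0],
})

# Sum the reported quantity per country.
totals = demo.groupby('CountryName')['TotalQuantity'].sum()
print(totals)
```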