Generating data sets

First, import the emipy package and read the database.
The program stores the path given at project initialisation, automatically searches for the data there, and loads it. You can also read a specific database: pass its path as a string argument to read_db().
import emipy as ep

db = ep.read_db()
db.head()

  | FacilityReportID | PollutantReleaseAndTransferReportID | FacilityID | NationalID | ParentCompanyName | FacilityName | StreetName | BuildingNumber | City | PostalCode | ... | PollutantName | PollutantGroupCode | PollutantGroupName | PollutantCAS | MethodBasisCode | MethodBasisName | TotalQuantity | AccidentalQuantity | UnitCode | UnitName
0 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Particulate matter (PM10) | INORG | Inorganic substances | NaN | E | Estimated | 68200.0 | 0.0 | KGM | kilogram
1 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Sulphur oxides (SOx/SO2) | OTHGAS | Other gases | NaN | M | Measured | 420000.0 | 0.0 | KGM | kilogram
2 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Carbon dioxide (CO2) | GRHGAS | Greenhouse gases | 124-38-9 | E | Estimated | 182000000.0 | 0.0 | KGM | kilogram
3 | 1856 | 1 | 5763 | 1013410312 | Lenzing AG | Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Nitrogen oxides (NOx/NO2) | OTHGAS | Other gases | NaN | M | Measured | 818000.0 | 0.0 | KGM | kilogram
4 | 1857 | 1 | 5764 | 1013410313 | Lenzing AG | Wasserreinhalteverband Lenzing - Lenzing AG | Werkstraße 1 | NaN | Lenzing | 4860 | ... | Zinc and compounds (as Zn) | HEVMET | Heavy metals | NaN | M | Measured | 3210.0 | 0.0 | KGM | kilogram

5 rows × 73 columns

The column names you can filter on are listed with:
db.columns

Index(['FacilityReportID', 'PollutantReleaseAndTransferReportID', 'FacilityID',
       'NationalID', 'ParentCompanyName', 'FacilityName', 'StreetName',
       'BuildingNumber', 'City', 'PostalCode', 'CountryCode', 'CountryName',
       'Lat', 'Long', 'RBDGeoCode', 'RBDGeoName', 'NUTSRegionGeoCode',
       'NUTSRegionGeoName', 'RBDSourceCode', 'RBDSourceName',
       'NUTSRegionSourceCode', 'NUTSRegionSourceName',
       'NACEMainEconomicActivityCode', 'NACEMainEconomicActivityName',
       'CompetentAuthorityName', 'CompetentAuthorityAddressStreetName',
       'CompetentAuthorityAddressBuildingNumber',
       'CompetentAuthorityAddressCity', 'CompetentAuthorityAddressPostalCode',
       'CompetentAuthorityAddressCountryCode',
       'CompetentAuthorityAddressCountryName',
       'CompetentAuthorityTelephoneCommunication',
       'CompetentAuthorityFaxCommunication',
       'CompetentAuthorityEmailCommunication',
       'CompetentAuthorityContactPersonName', 'ProductionVolumeProductName',
       'ProductionVolumeQuantity', 'ProductionVolumeUnitCode',
       'ProductionVolumeUnitName', 'TotalIPPCInstallationQuantity',
       'OperatingHours', 'TotalEmployeeQuantity', 'WebsiteCommunication',
       'PublicInformation', 'ConfidentialIndicator',
       'ConfidentialityReasonCode', 'ConfidentialityReasonName',
       'ProtectVoluntaryData', 'MainIASectorCode', 'MainIASectorName',
       'MainIAActivityCode', 'MainIAActivityName', 'MainIASubActivityCode',
       'MainIASubActivityName', 'ReportingYear', 'CoordinateSystemCode',
       'CoordinateSystemName', 'CdrReleased', 'Published',
       'PollutantReleaseID', 'ReleaseMediumCode', 'ReleaseMediumName',
       'PollutantCode', 'PollutantName', 'PollutantGroupCode',
       'PollutantGroupName', 'PollutantCAS', 'MethodBasisCode',
       'MethodBasisName', 'TotalQuantity', 'AccidentalQuantity', 'UnitCode',
       'UnitName'],
      dtype='object')
If you are interested in, for example, the countries that occur in your database, you can get a list of them with the get_CountryList() function. There are more get_xy() functions for accessing the information in your database. For details, take a look at the processdata module description.
ep.get_CountryList(db)

['Austria',
 'Belgium',
 'Cyprus',
 'Czech Republic',
 'Germany',
 'Denmark',
 'Estonia',
 'Spain',
 'Finland',
 'France',
 'Greece',
 'Hungary',
 'Ireland',
 'Italy',
 'Lithuania',
 'Luxembourg',
 'Latvia',
 'Malta',
 'Netherlands',
 'Norway',
 'Poland',
 'Portugal',
 'Sweden',
 'Slovenia',
 'Slovakia',
 'United Kingdom',
 'Iceland',
 'Serbia',
 'Romania',
 'Bulgaria',
 'Switzerland',
 'Croatia']
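
Because the database is a pandas DataFrame, a similar list of distinct values can also be obtained with plain pandas. The following is a minimal sketch on a small made-up table; it is not how get_CountryList() is implemented, and only the column names are taken from the output above.

```python
import pandas as pd

# Toy stand-in for the emipy database (values are invented for illustration).
toy = pd.DataFrame({
    "CountryName": ["Austria", "Austria", "Germany", "Belgium", "Germany"],
    "TotalQuantity": [68200.0, 420000.0, 1500.0, 320.0, 9100.0],
})

# Distinct values in order of first appearance, with missing entries dropped.
countries = toy["CountryName"].dropna().unique().tolist()
print(countries)
```

The same pattern works for any other column, for example ReportingYear or PollutantName.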
The actual filtering is done with the function f_db(). You have to specify the database that you want to filter, together with the column names and column values that you want to filter for.

Note

The following lines only create the DataFrame and do not display it. To display the data table, execute e.g. data1.head().
For a better overview, you can use data = ep.row_reduction(db). The new DataFrame keeps only a selected list of columns; this list can be adjusted.
Let’s filter for pollution in Germany:
data1 = ep.f_db(db, CountryName='Germany')
If you want to filter for multiple values in one column, pass a list.
data2 = ep.f_db(db, CountryName=['Germany', 'Switzerland', 'Austria'])
You can filter for multiple columns at the same time:
CountryName = ['Germany', 'Austria', 'Switzerland']
ReportingYear = [2014, 2015, 2016, 2017]
PollutantName = ['Carbon dioxide (CO2)', 'Methane (CH4)']

data3 = ep.f_db(db, CountryName=CountryName, ReportingYear=ReportingYear, PollutantName=PollutantName)
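
To make the semantics of such a call concrete, here is a rough pandas sketch of a keyword-based filter that accepts either a single value or a list per column. The helper filter_rows() is hypothetical and is not part of emipy; it only illustrates the idea that a row is kept when it matches every given column, on a small made-up table.

```python
import pandas as pd

def filter_rows(df, **criteria):
    """Hypothetical helper (not emipy code): keep rows matching every criterion."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in criteria.items():
        # Accept a single value or a collection of values.
        values = list(wanted) if isinstance(wanted, (list, tuple, set)) else [wanted]
        mask &= df[column].isin(values)
    return df[mask]

# Toy data; only the column names are taken from the tutorial.
toy = pd.DataFrame({
    "CountryName": ["Germany", "Austria", "France", "Germany"],
    "ReportingYear": [2014, 2015, 2016, 2013],
    "PollutantName": ["Carbon dioxide (CO2)", "Methane (CH4)",
                      "Carbon dioxide (CO2)", "Carbon dioxide (CO2)"],
})

subset = filter_rows(
    toy,
    CountryName=["Germany", "Austria", "Switzerland"],
    ReportingYear=[2014, 2015, 2016, 2017],
    PollutantName="Carbon dioxide (CO2)",
)
print(subset)
```

Only the first row survives here: the other rows fail on the pollutant, the country, or the year.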

Note

Keep in mind that numbers are not of type string and therefore do not need quotation marks around them.

For the precise values, use the get_xy() functions or, alternatively, take a look at the parameter table.
You can also filter step by step. To do so, pass the already filtered database to the filter function again.
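
With emipy, step-by-step filtering means passing the result of one f_db() call as the first argument of the next. The sketch below illustrates the same idea with plain pandas boolean masks on a made-up table, and checks that two successive filters give the same rows as one combined filter. It is an illustration only, not emipy code.

```python
import pandas as pd

# Toy data; only the column names are taken from the tutorial.
toy = pd.DataFrame({
    "CountryName": ["Germany", "Germany", "Austria", "Germany"],
    "ReportingYear": [2015, 2016, 2015, 2015],
    "PollutantName": ["Methane (CH4)", "Methane (CH4)",
                      "Methane (CH4)", "Carbon dioxide (CO2)"],
})

# Step 1: narrow the table to one country.
step1 = toy[toy["CountryName"] == "Germany"]
# Step 2: filter the already filtered table by year.
step2 = step1[step1["ReportingYear"] == 2015]

# The same selection in a single pass.
combined = toy[(toy["CountryName"] == "Germany") & (toy["ReportingYear"] == 2015)]

print(step2.equals(combined))
```

Filtering in steps is convenient when you want to inspect the intermediate tables before narrowing further.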

You can adjust two more arguments in f_db().
If you want to look only at the European continent, you have to exclude exclaves that belong to European countries, like French Guiana.
data4 = ep.f_db(db, ExclaveExclude=True)
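
How emipy decides which entries count as exclaves is not described here. Purely as an illustration, one way to approximate such an exclusion in pandas is a bounding box on the Lat and Long columns. The box limits and the toy coordinates below are rough assumptions, not values used by emipy.

```python
import pandas as pd

# Toy facilities with approximate coordinates (invented for illustration).
toy = pd.DataFrame({
    "City": ["Paris", "Cayenne", "Berlin"],
    "Lat": [48.86, 4.92, 52.52],
    "Long": [2.35, -52.33, 13.40],
})

# Assumed, very rough bounding box for the European continent.
LAT_RANGE = (34.0, 72.0)
LONG_RANGE = (-25.0, 45.0)

# Keep only rows whose coordinates fall inside the box.
in_box = toy["Lat"].between(*LAT_RANGE) & toy["Long"].between(*LONG_RANGE)
europe_only = toy[in_box]
print(europe_only["City"].tolist())
```

Cayenne, in French Guiana, falls outside the box and is dropped, while Paris and Berlin are kept.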
If you set ReturnUnknown to True, the function returns a data table containing all entries that would be sorted out by the filter only because they do not hold enough information to be evaluated. If this table is empty, that is a good sign.
data5 = ep.f_db(db, CountryName='Germany', ReturnUnknown=True)
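
The idea behind such a table of "unknown" entries can be sketched in pandas as the rows whose filter column is missing, so that the filter can neither accept nor reject them on real information. This is an assumption about the concept, shown on made-up data, and not a description of how f_db() is implemented.

```python
import pandas as pd

# Toy data with one missing country entry.
toy = pd.DataFrame({
    "FacilityName": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "CountryName": ["Germany", None, "France", "Germany"],
})

# Rows that clearly pass the filter.
matches = toy[toy["CountryName"] == "Germany"]
# Rows that cannot be decided because the value is missing.
unknown = toy[toy["CountryName"].isna()]

print(matches["FacilityName"].tolist())
print(unknown["FacilityName"].tolist())
```

Plant C is rejected on real information, whereas Plant B is only rejected because its country is missing, which is why it appears in the second table.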
Now you can generate your own data set of interest with a few lines of code. Since db is a DataFrame object, you can also use all pandas functions to customise your data generation.
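
As one example of combining the filtered data with ordinary pandas functions, the sketch below sums TotalQuantity per country and reporting year. The table is made up for illustration; on real data you would apply the same groupby to a DataFrame returned by f_db().

```python
import pandas as pd

# Toy data; only the column names are taken from the tutorial.
toy = pd.DataFrame({
    "CountryName": ["Germany", "Germany", "Austria", "Austria"],
    "ReportingYear": [2015, 2015, 2015, 2016],
    "TotalQuantity": [100.0, 50.0, 20.0, 30.0],
})

# Total reported quantity per country and year.
totals = (
    toy.groupby(["CountryName", "ReportingYear"])["TotalQuantity"]
    .sum()
    .reset_index()
)
print(totals)
```

Before summing real data, it is worth checking that all rows share the same UnitCode, since quantities in different units should not be added together.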