Sources

Sources #

The Sources Components are the first building blocks of a Modifier. While creating a Modifier, you need to first provide a source from where the data will be extracted. Without a working Source, the Modifier will not be valid.

Predefined Sources can only be defined by the system administrator.
Working with large data sources consumes a lot of RAM!

Oracle Database Source #

Component for extracting data from an Oracle Database. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Driver - the driver needed to access the data (Predefined),
  3. DB Host - the address where the database is located,
  4. Port - the port of the address (Predefined),
  5. Username - the username with access to the database (Predefined),
  6. Password - the password needed to access the database (Predefined),
  7. Service name - the alias of the database; (use either Service name or SID),
  8. SID - the unique name of the database; (use either Service name or SID),
  9. Table - the table from which we want to extract data (use either Table or SQL query),
  10. SQL query - the query to use when accessing the database (use either Table or SQL query),

Postgres Database Source #

Component for extracting data from a Postgres Database. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. DB Host - the address where the database is located,
  3. Port - the port of the address (Predefined),
  4. DB Name - name of the database,
  5. Username - the username with access to the database (Predefined),
  6. Password - the password needed to access the database (Predefined),
  7. Table - the table from which we want to extract data (use either Table or SQL query),
  8. SQL query - the query to use when accessing the database (use either Table or SQL query),

MySQL Database Source #

Component for extracting data from a MySQL Database. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. DB Host - the address where the database is located,
  3. Port - the port of the address (Predefined),
  4. DB Name - name of the database,
  5. Username - the username with access to the database (Predefined),
  6. Password - the password needed to access the database (Predefined),
  7. Table - the table from which we want to extract data (use either Table or SQL query),
  8. SQL query - the query to use when accessing the database (use either Table or SQL query),

Microsoft SQL Server Database Source #

Component for extracting data from a Microsoft SQL Server Database. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. DB Host - the address where the database is located,
  3. Port - the port of the address (Predefined),
  4. DB Name - name of the database,
  5. Username - the username with access to the database (Predefined),
  6. Password - the password needed to access the database (Predefined),
  7. Table - the table from which we want to extract data (use either Table or SQL query),
  8. SQL query - the query to use when accessing the database (use either Table or SQL query),

ODBC Database Source #

Component for extracting data from an ODBC Database. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Driver - the driver needed to access the data (Predefined),
  3. ODBC connection string - the string needed to make a connection to the database,

FusionRegistry10 Source #

Component for extracting data from a Fusion Registry Dataflow. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Fusion URL - the URL address of the Fusion Registry instance (Predefined),
  3. Username - the username with access to Fusion Registry dataflows (Predefined),
  4. Password - the password needed to access Fusion Registry dataflows (Predefined),
  5. Dataflow - the Dataflow from which to extract data,
  6. Dataflow filters - additional filters to use for filtering data before extracting,
  7. Start time period - the period from which to start extracting data; an additional filter,
  8. End time period - the period to which to stop extracting data; an additional filter,

Script Source #

This Source can only be modified by the system administrator!

Component for extracting data from any source. It is for advanced users to create a Python script that will return a Python Pandas DataFrame. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Script code - the code to use for extracting data; the code is a function that must return a Python Pandas DataFrame (Predefined),

Available Packages #

Here is a list of available packages to use in your Script:

  • all packages in the Python Standard Library,
  • the pandas manipulation tool,
  • the dateutil module for date and time manipulations,
  • the NumPy fundamental package for scientific computing with Python,
  • the OpenPyXL library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files,
  • the xlrd library to read old xls Excel files,
  • the odfpy library to read and write OpenDocument files (i.e. ods),
  • the pandaSDMX library for SDMX-ML data manipulation,
  • the SQLAlchemy Python SQL toolkit and Object Relational Mapper,
  • the Requests library for sending HTTP requests,
  • the Beautiful Soup library for pulling data out of HTML and XML files,
  • the pyodbc module for accessing ODBC databases,
  • the wbgapi module for accessing the World Bank’s economic data,
  • the pandas-datareader library for fetching financial data into pandas DataFrames.
  • the html5lib pure-python library for parsing HTML.

File Based Sources #

The Sources below are based on files that are uploaded to the Resources of ETL data_snake or files situated on any place on the Internet. If you want to use local files, then you must upload them to the Resources folder. You can do so by clicking the button. A new window will open, where you must choose a specific folder in the Resources directory (1) and choose a file to load there by clicking the Browse button (2).

This will open a file upload popup, where you must choose one file from your local files. When you choose a file, the upload process will start. You can cancel the process by clicking the icon.

Upon completion, you will see a message informing you that you can Save the file or reset the form to cancel the upload and choose another file. You can click the Save and add next file button to save the file and reset the form to upload another file. Clicking the Save button will close the modal. Clicking any of the SAVE buttons will fill the Path field with the newly uploaded file.

Excel Source #

Component for extracting data from an Excel file (xls, ods and xlsx). This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Path/URL - the filepath in the Resources folder or URL address of the file,
  3. File location - whether the Excel file is on the computer or on the internet,
  4. Range from - the cell from which to start extracting the Excel file,
  5. Range to - the cell on which the extracting of the Excel file should stop,
  6. Header row - the row number where the column headers are located, relative to the entire file; should not be filled when Use header is unchecked,
  7. Sheet - the name of the sheet to be extracted,
  8. Use header - whether there are column headers in the file.

CSV Source #

Component for extracting data from a CSV file. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Path/URL - the filepath in the Resources folder or URL address of the file,
  3. File location - whether the Excel file is on the computer or on the internet,
  4. Delimiter - symbol used to divide the values in the CSV file (Predefined),
  5. Header row - the line where the headers are located (Predefined).

SDMX-ML Source #

Component for extracting data from a SDMX-ML file. This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Path/URL - the filepath in the Resources folder or URL address of the file,
  3. File location - whether the SDMX-ML file is on the computer or on the internet.

Microsoft Access Source #

Component for extracting data from a Microsoft Access Database (accdb and mdb files). This Source needs the following fields to be provided:

  1. Name - a name of the component on the list of Sources,
  2. Path/URL - the filepath in the Resources folder or URL address of the file,
  3. File location - whether the Excel file is on the computer or on the internet,
  4. Table - table from which the data will be extracted.