Locally Setting Up Scrapers

This guide details how to set up a scraper locally. To illustrate this, we will work with an instance of the Sentinel Image Scraper, which uses the SentinelHub API to scrape images for a given location and date.

Prerequisites

1. Setting Up Docker

Before using Docker, you need to have it installed on your system. Follow these steps to ensure Docker is installed:

For Windows and Mac:

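  1. Download Docker Desktop:

    • Download the Docker Desktop installer for your operating system from the official Docker website.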
  2. Install Docker Desktop:

    • Run the downloaded installer and follow the on-screen instructions to complete the installation.
  3. Start Docker Desktop:

    • After installation, start Docker Desktop from your applications menu. Docker will start running in the background.
  4. Verify Installation:

    • Open a terminal or command prompt and run the following command to check if Docker is installed correctly:
      docker --version
    • You should see the Docker version information if it is installed correctly.

For Linux (Ubuntu):

  1. Update Your Package Index:

    • Open a terminal and run:
      sudo apt-get update
  2. Install Required Packages:

    • Run:
      sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
  3. Add Docker’s Official GPG Key:

    • Run:
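      curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    • Note: apt-key is deprecated on recent Ubuntu releases. If this command fails or prints a deprecation warning, follow the keyring-based method in the official Docker installation guide.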
  4. Add Docker Repository:

    • Run:
      sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  5. Install Docker Engine:

    • Run:
      sudo apt-get update
      sudo apt-get install docker-ce
  6. Start Docker:

    • Run:
      sudo systemctl start docker
  7. Verify Installation:

    • Run:
      docker --version
    • You should see the Docker version information if it is installed correctly.

By following these steps, you ensure that Docker is installed on your system. Confirm that Docker is running successfully before proceeding to the next steps.
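The verification step above can also be scripted. The following is a minimal sketch, not part of the scraper repository: the helper name docker_version is hypothetical, and it only wraps the same docker --version command shown earlier.

```python
import shutil
import subprocess
from typing import Optional


def docker_version() -> Optional[str]:
    """Return the output of `docker --version`, or None if Docker is unavailable.

    Hypothetical helper for illustration; it is not part of the scraper repo.
    """
    # shutil.which returns None when no `docker` executable is on PATH.
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    print(version if version else "Docker CLI not found or not working")
```

Note that this only confirms the Docker CLI is installed. To confirm that the Docker daemon is actually running, docker info is a better check, since it has to contact the daemon.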

2. Ensure you have a running instance of Kernel-Planckster

To do this, please refer to the Kernel Planckster Readme for help.

3. Obtain a client ID and client secret from Sentinel Hub

To do this, please refer to the official Sentinel Hub documentation.

Optional: Create a virtual environment to avoid any package version conflicts

python3 -m venv .venv
source .venv/bin/activate
  • python3 -m venv: This command uses the venv module in Python to create a virtual environment.
  • .venv: This is the name of the directory where the virtual environment will be created. You can name it anything, but .venv is a common convention.
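To confirm that the virtual environment is active before installing packages, you can ask the interpreter itself. This is a small sketch; the function name in_virtualenv is made up for illustration, and it relies on the standard sys.prefix and sys.base_prefix attributes.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same python3 you intend to use for the scraper. If it reports False, re-run the source .venv/bin/activate command in the current shell.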

Setup Instructions

1. Clone the Repository

git clone https://github.com/dream-aim-deliver/mpi-sda-sentinel.git
cd mpi-sda-sentinel
  • git clone https://github.com/dream-aim-deliver/mpi-sda-sentinel.git:

    • git clone: This command is used to create a local copy of a remote repository.

    • https://github.com/dream-aim-deliver/mpi-sda-sentinel.git: This is the URL of the remote Git repository you want to clone. It points to the repository hosted on GitHub.

    • When you run this command, Git downloads the entire repository (including its history and all files) to the directory on your local machine from which you ran the clone command. The new directory will have the same name as the repository (mpi-sda-sentinel).

  • cd mpi-sda-sentinel:

    • cd: This command is used to change the current directory in your terminal or command prompt.
    • mpi-sda-sentinel: This is the directory created by the clone command. If you cloned into a directory with a different name, use that name instead.

In summary, these commands clone a GitHub repository to your local machine and then change into the directory of the cloned repository.

2. Prepare Environment variables

Copy the .env.template file to create a .env file:

cp .env.template .env

Fill in the environment variables in the .env file:

sh_client_id={ENTER THE CLIENT ID FROM Sentinel Hub}
sh_client_secret={ENTER CLIENT SECRET FROM Sentinel Hub}
HOST={THE HOSTNAME OF THE FASTAPI APP}
PORT={THE PORT OF THE FASTAPI APP}
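A forgotten placeholder in the .env file is an easy mistake to make. The sketch below shows one way to catch it, assuming the four keys listed above are the ones the scraper needs and that the file uses simple KEY=value lines. The helpers parse_env and unfilled_keys are hypothetical and not part of the repository.

```python
from typing import Dict, List

# Keys taken from the template shown above (assumed to be the full list).
REQUIRED_KEYS = ("sh_client_id", "sh_client_secret", "HOST", "PORT")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unfilled_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are missing, empty, or still a {placeholder}."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        # Treat empty values and untouched "{...}" template text as unfilled.
        if not value or (value.startswith("{") and value.endswith("}")):
            problems.append(key)
    return problems
```

For example, unfilled_keys(parse_env(open(".env").read())) returns an empty list once every variable has a real value.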

3. Run the Docker Container

To build and run the Docker Container, execute the following script:

./run.sh

Troubleshooting a common error: the run.sh file contains the following configuration:

docker run --name mpi-satellite \
--rm \
-e "HOST=0.0.0.0" \
-e "PORT=8000" \
-e "sh_client_id= CLIENT_ID" \
-e "sh_client_secret= CLIENT_SECRET" \
-p "8000:8000" \
mpi-satellite

If a port conflict occurs, change PORT to any other available port (e.g. 8001) and update the -p flag accordingly ("8001:8001").
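To find out whether port 8000 is already taken before editing run.sh, you can try to bind to it. This is a rough sketch using only the standard socket module. The function names are hypothetical, and a successful bind only shows the port was free at that moment.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


def first_free_port(start: int = 8000, attempts: int = 20,
                    host: str = "0.0.0.0") -> Optional[int]:
    """Return the first free port in [start, start + attempts), or None."""
    # Cap the range at 65535, the highest valid TCP port number.
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print("suggested port:", first_free_port(8000))
```

Whatever port this suggests, use it for both PORT and the -p mapping so that the two stay in sync.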

4. Run the Demo

To test the setup, you can run the demo.sh script. Before running it, make sure all of the fields below are filled in correctly.

  • Open the demo.sh file:
python sentinel_scraper.py --start_date=2023-8-8 --end_date=2023-8-30 \
--long_left=-156.708984 --lat_up=20.759645 --long_right=-156.299744 --lat_down=20.955027 --log-level="INFO" \
--kp_auth_token test123 --kp_host localhost --kp_port 8000 --kp_scheme http \
--sentinel_client_id YOUR CLIENT-ID --sentinel_client_secret YOUR CLIENT SECRET \

Now run the file using:

./demo.sh

If the script runs successfully, your local environment for the Sentinel Image Scraper is set up and running.
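If the demo fails with a connection error, one quick check is whether anything is listening at the Kernel Planckster address passed via --kp_host and --kp_port. The sketch below only tests TCP reachability. The function name can_connect is hypothetical, and a successful connection does not prove that the service behind the port is Kernel Planckster or that the auth token is valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Values taken from the demo.sh example above.
    print("Kernel Planckster reachable:", can_connect("localhost", 8000))
```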