Introduction to geoCML
geoCML is a Containerized, Multi-paradigm, Lightweight deployment pattern for geographic information systems running locally or in the cloud. geoCML deployments package best-in-class open-source GIS software, including QGIS and PostGIS, so you can go to production quickly. In this topic, you will learn how geoCML deployments work so that you can make an informed decision about whether geoCML is a good choice for your needs.
How does geoCML work?
geoCML deployments include four containerized microservices: geocml-desktop (the primary user entrypoint), geocml-server (the secondary user entrypoint), geocml-postgres (the primary data store), and geocml-task-scheduler (an automation service, often abbreviated as gTS). The services communicate over an internal network called geocml-network. Once your deployment is running as a geoCML instance, you can begin using it to complete GIS workflows. By design, each geoCML instance should host only one project/dataset; host multiple datasets across multiple geoCML instances. Specific documentation on each service is available for your reference.
What can geoCML do?
geoCML deployments come out of the box with a variety of features:
-
Desktop GIS workflows
-
Cloud native server GIS workflows
-
Routine database backups
-
PostgreSQL/PostGIS database hosting
-
Python support for writing custom gTS automation tasks
-
Persist information between containerized services via the persistence-layer directory
-
Realtime GIS
-
Use less space/resources than other enterprise GIS deployments
What can geoCML not do out of the box?
geoCML is currently limited in the following areas:
-
Integration with ArcGIS proprietary deployments and extensions
-
Database version management
-
Backing up domain values in a database
-
Support for storing raster data in geocml-postgres
-
MySQL support, SQL Server support, etc (base deployments only support PostgreSQL/PostGIS)
geoCML base deployments are customizable to fit your specific needs. You can customize your services however you’d like! If you would like to contribute to the development of geoCML, you can find the project on GitHub at github.com/geocml.
Quick deployment guide
Get up and running with geoCML in under 15 minutes!
geoCML deployments are multi-paradigm, offering a desktop, server, and web GIS experience with a single deployment. You may host your geoCML instance locally or in the cloud, depending on your needs.
Before instantiating a geoCML deployment, you must have Docker and Docker Compose installed on the machine that will host geoCML. You do not need any additional GIS software installed on the host machine. Once you have satisfied these conditions, follow the steps below to deploy your geoCML instance.
-
Clone the geoCML source code from github.com/geocml/geocml-base-deployment
-
Open a terminal and cd into the source code directory
-
Copy .env.example into a new file called .env
-
Update your .env to include your deployment-specific configuration variables
-
Run sh build.sh to build geoCML service images on your machine
-
Run docker network create geocml-network
-
Run sh start.sh to bring up the instance
That’s it! You can access geoCML Desktop via {deployment host URL}:10000 or geoCML Server Portal via {deployment host URL}:80 using a web browser. Further configuration steps for each of these services are discussed in later topics.
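The steps above can also be scripted. The following is a minimal sketch, not part of geoCML itself: the function names are illustrative, and it assumes you run it on a machine where sh and docker are available and that you have already cloned the repository. It refuses to build until a .env file exists, because the .env values should be edited before the images are built.

```python
import shutil
import subprocess
from pathlib import Path

# Commands taken from the quick deployment steps, in the order listed there.
DEPLOY_COMMANDS = [
    ["sh", "build.sh"],
    ["docker", "network", "create", "geocml-network"],
    ["sh", "start.sh"],
]


def prepare_env_file(repo_dir):
    """Copy .env.example to .env unless a .env already exists."""
    repo = Path(repo_dir)
    env_file = repo / ".env"
    if not env_file.exists():
        shutil.copyfile(repo / ".env.example", env_file)
    return env_file


def deploy(repo_dir, run=subprocess.run):
    """Run the build, network, and start steps inside repo_dir.

    `run` is injectable so the sequence can be dry-run without Docker.
    """
    repo = Path(repo_dir)
    if not (repo / ".env").exists():
        raise FileNotFoundError("create and edit .env before building")
    for command in DEPLOY_COMMANDS:
        run(command, cwd=repo, check=True)
```

A typical session would call prepare_env_file on the cloned directory, edit the resulting .env by hand, and then call deploy on the same directory.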
Using hosted geoCML images from GHCR
The geoCML development team hosts pre-built containers in our container registry on GitHub (GHCR). These containers are a great way to demo geoCML, but please note that they are not production ready because they lack the required build arguments. If you want to use geoCML in production, please build your own containers.
Please keep in mind that the GEOCML_DESKTOP_PASSWORD variable in the .env file must be set to access your deployment via geoCML Desktop.
geoCML Desktop
geoCML Desktop is the primary access point for your geoCML deployment. It provides a desktop environment in which you can prepare, visualize, and analyze your GIS data. The geocml-desktop container is based on Ubuntu Linux and comes installed with XPRA, allowing you to view running applications in a web browser. geoCML Desktop comes preinstalled with QGIS, a best-in-class open-source desktop GIS application. geoCML Desktop is password protected. You must set a password within your deployment’s .env file.
Connecting to geoCML Desktop
After instantiating your geoCML deployment, you can connect to geoCML Desktop via a web browser at {deployment host URL}:10000.
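If you want to confirm that the desktop service is answering before opening a browser, a small check like the one below may help. It is a sketch that uses only the Python standard library; the function names are illustrative, and it treats any HTTP response (including an authentication error) as a sign that the service is up.

```python
import urllib.error
import urllib.request

DESKTOP_PORT = 10000  # geoCML Desktop port, as stated above


def desktop_url(deployment_host):
    """Build the geoCML Desktop URL from a host URL such as http://localhost."""
    return f"{deployment_host.rstrip('/')}:{DESKTOP_PORT}"


def is_reachable(url, timeout=5.0):
    """Return True if something answers HTTP requests at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # nothing listening, DNS failure, timeout, etc.
```

For example, is_reachable(desktop_url("http://localhost")) should return True once a local instance has finished starting.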
Using geoCML Desktop
geoCML Desktop is designed to be a replacement for your typical desktop GIS experience. With the default geoCML Desktop, you can:
-
Use QGIS to prepare, aggregate, and visualize your GIS data
-
Configure geoCML Server Portal
-
Connect to the geocml-postgres service
Configuring geoCML Desktop
You may want to extend the default geocml-desktop service with additional applications or configurations that meet your needs. geoCML Desktop has a two-step configuration process: Dockerfile configuration and Ansible configuration.
Docker Configuration
You can use Docker to install packages to geoCML Desktop. Open geocml-base-deployment\Dockerfiles\Dockerfile.geocml-desktop in your favorite text editor, and between the Customize Container Here and End Customizations comments, add your use-case specific geoCML Desktop configuration steps.
Ansible Configuration
You can use Ansible to automate advanced, tedious configuration workflows. By default, geoCML Desktop uses Ansible to automate configuring your database. You can further customize your advanced configurations. Open geocml-base-deployment/ansible-playbooks/geocml-desktop-playbook.yaml
Understanding the persistence layer
geoCML deployments do not persist data; the file system within a service should never be changed directly except during the build process. How, then, do you upload and change datasets within the geoCML Desktop service?
The persistence layer is a mutable and persistent directory shared between the geoCML Desktop service and the deployment’s host machine. geoCML Base Deployments use Docker to bind the host machine’s local persistence layer to a persistence layer within the geoCML Desktop service. This binding allows you to add files to the host machine’s persistence layer and have them available within the geoCML Desktop service. The persistence layer is also available to other services within the deployment.
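In practice, adding a dataset means copying it into the persistence layer directory on the host. The sketch below illustrates this with the standard library; the function name is illustrative, and the location of the persistence layer on your host is passed in as a parameter because it depends on your deployment.

```python
import shutil
from pathlib import Path


def stage_dataset(source_file, persistence_dir):
    """Copy a dataset into the host-side persistence layer directory.

    Returns the path of the copied file. The directory must already exist,
    since it is the one Docker binds into the geoCML Desktop service.
    """
    source = Path(source_file)
    target_dir = Path(persistence_dir)
    if not target_dir.is_dir():
        raise FileNotFoundError(f"persistence layer not found: {target_dir}")
    return Path(shutil.copy2(source, target_dir / source.name))
```

Because the directory is bound into the container, a file staged this way should become visible inside geoCML Desktop without rebuilding anything.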
About geocml-project.qgz
geoCML Desktop will automatically open geocml-project.qgz in QGIS when you connect to the XPRA service. geocml-project.qgz is the central project file for your entire geoCML deployment. By design, all of your GIS work must be done in this file.
geoCML Postgres
geoCML deploys a micro-service container with a PostGIS enabled Postgres database by default for your project. This is the primary data store for a geoCML deployment.
Configuring geoCML Postgres
geoCML Postgres can be fully configured to suit the needs of your use case. geoCML Postgres has a two step configuration process: Dockerfile configuration, and Ansible configuration.
Docker Configuration
You can use Docker to install packages to your geoCML Postgres service deployment. Open geocml-base-deployment\Dockerfiles\Dockerfile.geocml-postgres in your favorite text editor, and between the Customize Container Here and End Customizations comments, add your use-case specific geocml-postgres configuration steps.
Ansible Configuration
You can use Ansible to automate advanced, tedious configuration workflows. By default, geoCML Postgres uses Ansible to automate configuring your database. You can further customize your advanced configurations. Open geocml-base-deployment/ansible-playbooks/geocml-postgres-playbook.yaml
Accessing your database
geoCML Postgres creates a database named geocml_db, which is the primary datastore for your geoCML project. Do not change the name of this database! You can access your database in several ways:
-
via geoCML Desktop and QGIS (over the internal geocml-network)
-
via a PostgreSQL data explorer (over port {deployment host URL}:5432)
The default credentials for accessing your database are:
-
Username: geocml
-
Password: geocml
geoCML Postgres also configures an admin user for your data store. The default credentials for accessing your data store as an admin are:
-
Username: postgres
-
Password: admin
You can change the default password for both the geocml and postgres users in your deployment’s .env file. Note that these are build-time arguments; you must rebuild your containers to commit these changes.
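Many PostgreSQL clients accept a connection URI. The sketch below assembles one from the defaults listed above (user geocml, password geocml, database geocml_db, port 5432). The function name is illustrative, and the host argument is a bare host name such as localhost, not a full URL.

```python
from urllib.parse import quote


def geocml_db_uri(host, user="geocml", password="geocml",
                  port=5432, database="geocml_db"):
    """Build a postgresql:// URI for geocml_db.

    The user and password are percent-encoded so that characters such as
    '@' or '/' in a custom password do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

If you changed the passwords in .env and rebuilt your containers, pass the new values instead of relying on the defaults.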
Current limitations of the geoCML Postgres service
In geoCML v0.3.0, there are known limitations with the geoCML Postgres service. Currently, geoCML Postgres does not support the following data:
-
Raster datasets,
-
Domain fields
If you are working with raster data, please store it in the persistence layer rather than in geoCML Postgres.
geoCML Server
geoCML deploys a micro-service container with a QGIS Server instance, an Apache web server, and a React application called geoCML Server Portal (a frontend for your server GIS). This micro-service is responsible for implementing web and server GIS paradigms for your project. geoCML Server collects information from geocml-project.qgz to serve your data via several web services.
Configuring geoCML Server
geoCML Server can be fully configured to suit the needs of your use case. geoCML Server has a two step configuration process: Dockerfile configuration, and Ansible configuration.
Docker Configuration
You can use Docker to install packages to your geoCML Server service deployment. Open geocml-base-deployment\Dockerfiles\Dockerfile.geocml-server in your favorite text editor, and between the Customize Container Here and End Customizations comments, add your use-case specific geocml-server configuration steps.
Ansible Configuration
You can use Ansible to automate advanced, tedious configuration workflows. By default, geoCML Server uses Ansible to automate configuring your Apache and QGIS Server. You can further customize your advanced configurations. Open geocml-base-deployment/ansible-playbooks/geocml-server-playbook.yaml
Accessing geoCML Server via the web
geoCML Server Portal is accessible via {deployment host URL}:80. geoCML Server Portal is a web application frontend for geoCML Server written in React. geoCML Server Portal acts as the secondary user entrypoint for your deployment, allowing you to share data with others over the internet. geoCML Server Portal relies on Web Map Service (WMS) information from geocml-project.qgz in order to properly display your instance information.
geoCML Server Portal features:
-
a description of your project,
-
your contact information,
-
copyright claims for your project’s data,
-
a hosted Leaflet web map,
-
WMS connection information,
-
WFS connection information,
-
WCS connection information,
-
a preview of your hosted data,
-
a list of all hosted data tables,
-
similar recommended datasets from DRGON
Accessing geoCML Server via an API
geoCML Server exposes all API functionality for QGIS Server via FastCGI and Apache. Learn more about QGIS Server here: https://docs.qgis.org/3.34/en/docs/server_manual/index.html
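QGIS Server implements the OGC web services, so a GetCapabilities request is a common first call. The sketch below only builds the request URL. The endpoint path is an assumption you must supply yourself, because the path Apache exposes is not stated in this document; SERVICE and REQUEST are the standard OGC query parameters.

```python
from urllib.parse import urlencode


def capabilities_url(endpoint, service="WMS"):
    """Build a GetCapabilities URL for WMS, WFS, or WCS.

    `endpoint` is the full URL of the QGIS Server endpoint on your
    deployment (deployment-specific, not defined here).
    """
    query = urlencode({"SERVICE": service, "REQUEST": "GetCapabilities"})
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"
```

The resulting URL can be opened in a browser or pasted into a GIS client as a service connection.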
geoCML Task Scheduler (gTS)
geoCML deploys a micro-service container called geoCML Task Scheduler (gTS). This micro-service is responsible for automating routine tasks within your deployment such as backing up data from geoCML Postgres, restoring geocml_db from a backup, and healthchecking services within your deployment. geoCML Task Scheduler exposes a simple Python API for developing additional tasks.
Configuring geoCML Task Scheduler
geoCML Task Scheduler can be fully configured to suit the needs of your use case. geoCML Task Scheduler has a two step configuration process: Dockerfile configuration, and Ansible configuration.
Docker Configuration
You can use Docker to install packages to your geoCML Task Scheduler service deployment. Open geocml-base-deployment\Dockerfiles\Dockerfile.geocml-task-scheduler in your favorite text editor, and between the Customize Container Here and End Customizations comments, add your use-case specific geocml-task-scheduler configuration steps.
Ansible Configuration
You can use Ansible to automate advanced, tedious configuration workflows. Open geocml-base-deployment/ansible-playbooks/geocml-task-scheduler-playbook.yaml
Understanding DBBackups
geoCML Postgres is not a persistent data store. Because of this, when your geoCML instance goes down, you will risk losing information in your data store. geoCML Task Scheduler handles this by automatically backing up geocml_db every hour. When your instance is brought back up, geoCML Task Scheduler will automatically restore your data store from the most recent backup. Backups are stored in the DBBackups directory in the persistence layer. Each DBBackup contains a .tabor file defining the schema of geocml_db and a series of CSV files representing the actual data in your tables.
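To inspect backups from the host, you can look inside the DBBackups directory of the persistence layer. The sketch below assumes that each backup is stored as its own subdirectory and uses modification time to find the newest one; both are assumptions, since the exact on-disk layout and naming are not described here.

```python
from pathlib import Path


def latest_backup(dbbackups_dir):
    """Return the most recently modified backup directory, or None if empty."""
    backups = [p for p in Path(dbbackups_dir).iterdir() if p.is_dir()]
    return max(backups, key=lambda p: p.stat().st_mtime, default=None)


def describe_backup(backup_dir):
    """List a backup's .tabor schema files and its CSV table files."""
    backup = Path(backup_dir)
    return {
        "schema": sorted(p.name for p in backup.glob("*.tabor")),
        "tables": sorted(p.name for p in backup.glob("*.csv")),
    }
```

This is read-only inspection; restoring is handled by geoCML Task Scheduler when the instance comes back up.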
Writing Tasks
You can create custom tasks in geoCML Task Scheduler.
-
Open build-resources/geocml-task-scheduler/geocml-task-scheduler/ in your favorite text editor.
-
Create a new Python file.
-
Define a function in the new file with your task logic.
-
Return 0 if you want your task to run only once. Otherwise, your task will run according to its position in the schedule.
-
Save your Python file.
-
Open schedule.py
-
Create a new Task object, instantiated with your new function.
-
Schedule your task for execution with its execution frequency (in seconds).
-
Rebuild the geocml-task-scheduler container and deploy.
Your new task is created, and it is scheduled for execution.
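The sketch below shows what two task functions might look like: one that returns 0 so it runs only once, and one that returns nothing so it keeps running on its schedule. The Task class and schedule list at the bottom are hypothetical stand-ins written for this example only; the real Task class and scheduling call are defined in the gTS source (see schedule.py), and their names and signatures may differ.

```python
import logging

log = logging.getLogger("gts.example")


def announce_startup():
    """One-shot task: returning 0 asks the scheduler to run it only once."""
    log.info("custom tasks loaded")
    return 0


def log_heartbeat():
    """Recurring task: returns None, so it keeps running on its schedule."""
    log.info("geoCML instance is alive")


# --- Hypothetical stand-ins, for illustration only --------------------------
# These are NOT the gTS API. They only mirror the two steps described above:
# wrap a function in a Task, then pair it with a frequency in seconds.
class Task:
    def __init__(self, func):
        self.func = func

    def run(self):
        return self.func()


schedule = [
    (Task(announce_startup), 60),   # (task, execution frequency in seconds)
    (Task(log_heartbeat), 300),
]
```

When adapting this, keep only the task functions and register them using the objects that schedule.py actually provides.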
Tabor
Tabor is a database modeling language for GIS based on YAML, but with additional syntax restrictions. The goal of Tabor is to allow GIS users to create and maintain complex database rules using plain-text configuration files. The following is an example of a Tabor configuration file for a PostGIS database:
tabor: 0.3.0
layers:
  - name: grass
    schema: public
    owner: geocml
    geometry: polygon
    srid: 4326
    fields:
      - name: fid
        type: int
        pk: true
  - name: trees
    schema: public
    owner: geocml
    geometry: point
    srid: 4326
    fields:
      - name: fid
        type: int
        pk: true
      - name: genus
        type: text
      - name: species
        type: text
      - name: height_meters
        type: numeric
      - name: circumference_cm
        type: numeric
    constraints:
      - name: on
        layer: grass
  - name: streams
    schema: public
    owner: geocml
    geometry: polyline
    srid: 4326
    fields:
      - name: fid
        type: int
        pk: true
Running this file through the Tabor command line utility generates a valid PostgreSQL schema query that can be used to create or update tables in a PostGIS database.
Downloading and Installing Tabor
Tabor can be downloaded directly from https://github.com/geoCML/tabor. After downloading, simply extract the downloaded .zip file to a directory accessible on your terminal path.
Command Line Usage
tabor read --file <path/to/file>
→ Converts a .tabor file into a PostGIS schema query.
tabor write --file <path/to/file> --db <name_of_psql_db> --username <name of db user> --password <password of db user?> --host <host of psql db?> --port <port of psql db?> --ignore <tables to ignore?>
→ Converts a PostGIS database to a .tabor file
tabor load --file <path/to/file> --db <name_of_psql_db> --username <name of db user> --password <password of db user?> --host <host of psql db?> --port <port of psql db?>
→ Loads a PostGIS database from a .tabor file.
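If you want to call Tabor from a script, you can build the argument list in Python and run it with subprocess. The sketch below uses only the subcommands and flags listed above. It assumes that the parameters marked with a question mark are optional and that tabor is on your PATH; the helper names are illustrative, and what Tabor prints to standard output is not specified here.

```python
import subprocess


def tabor_read_args(file):
    """Arguments for converting a .tabor file into a PostGIS schema query."""
    return ["tabor", "read", "--file", str(file)]


def tabor_load_args(file, db, username, password=None, host=None, port=None):
    """Arguments for loading a PostGIS database from a .tabor file."""
    args = ["tabor", "load", "--file", str(file),
            "--db", db, "--username", username]
    # Assumed optional: only pass these flags when a value was given.
    optional = {"--password": password, "--host": host, "--port": port}
    for flag, value in optional.items():
        if value is not None:
            args += [flag, str(value)]
    return args


def run_tabor(args):
    """Run tabor and return whatever it wrote to standard output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths or passwords that contain spaces.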
Supported Geometry Types
Tabor supports the following geometry types:
-
polygon → for 2D shapes such as political boundaries
-
polyline → for 1D shapes such as roadway centerlines
-
point → for 0D shapes such as trees
If a layer has no geometry type, it may still be defined as a non-geometric table in the database.
The spatial reference of each layer in your Tabor file can be defined with the srid field, followed by the SRID of your reference system.
Supported Field Types
Tabor supports the following primitive field types:
-
int → for whole numbers
-
numeric → for decimal numbers
-
text → for unlimited length character strings
-
boolean → for true or false values
Tabor also supports arrays of primitive data. You can define an array as type: <primitive> array.
It is strongly recommended that you have a primary key (pk) on each of your layers. You can define a field as a primary key using pk: true.
Data Constraints
You can define complex business logic in your Tabor file with data constraints. Data constraints ensure that changes to your database tables meet specific rules before they are committed to the database. You can include as many data constraints on your layers as you need.
Tabor supports the following data constraints:
-
on → Checks that new features are at least partially within the boundaries of another layer (works with all geometry types)
ex.
constraints:
  - name: on
    layer: other_layer
-
length → Checks that new polyline features have either a minimum or maximum length
ex.
constraints:
  - name: length
    minimum: 0.5
    maximum: 99.5 # You must include either a minimum or maximum value, but not both!
-
near → Checks that new features are placed within a given distance of another layer (works with all geometry types)
ex.
constraints:
  - name: near
    distance: 15.6
    layer: other_layer
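To make the length constraint concrete, the sketch below shows the kind of check it describes, written in plain Python. This is not how Tabor implements the constraint; it is an illustration under two assumptions: length is measured in the planar units of the coordinates, and the minimum and maximum bounds are inclusive.

```python
import math


def polyline_length(vertices):
    """Sum the straight-line distances between consecutive (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def satisfies_length(vertices, minimum=None, maximum=None):
    """Return True if the polyline's length respects the given bound(s)."""
    length = polyline_length(vertices)
    if minimum is not None and length < minimum:
        return False
    if maximum is not None and length > maximum:
        return False
    return True
```

A feature that fails such a check would be rejected before it is committed, which is the behavior data constraints are meant to enforce.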
DRGON
DRGON (pronounced 'Dragon') is a Distributed Registry of GISystems Over a Network. DRGON collects a registry of geoCML deployments over the internet (or an intranet, if you prefer). Using a simple REST API, you can query DRGON to find datasets that fit your needs. A public registry is hosted at https://drgon.geocml.com, but you may also self-host a DRGON instance, depending on your needs.
Quickstart Guide
Before interacting with DRGON, you must first have a hosted geoCML deployment with a properly configured geoCML Server Portal. Next, register for an API key via a POST request to <DRGON_HOST>/apikey. You must provide an email address in the request body. Copy your API key to a safe place; you will only be able to view it once!
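The API key request can be prepared with the Python standard library. In the sketch below, the /apikey path and the POST method come from the text above; the JSON body with an "email" key and the Content-Type header are assumptions, since the exact body format is not given here. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_apikey_request(drgon_host, email):
    """Build (but do not send) a POST request to <DRGON_HOST>/apikey."""
    # Assumed body shape: a JSON object with a single "email" field.
    body = json.dumps({"email": email}).encode("utf-8")
    return urllib.request.Request(
        f"{drgon_host.rstrip('/')}/apikey",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and store the returned key immediately, since it can only be viewed once.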
On your deployment’s server machine, add the following values to your .env file:
-
DRGON_HOST: the host URL of the DRGON instance you want to use (Do not include trailing slash)
-
DRGON_API_KEY: your DRGON API key
-
GEOCML_DEPLOYMENT_HOST: the domain name of your hosted geoCML instance
Rebuild your geoCML deployment, and restart your instance. After about a minute of up-time, geoCML Task Scheduler will ping DRGON and automatically register your deployment.
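Before rebuilding, it can be worth checking the three values for common mistakes, such as a trailing slash on DRGON_HOST. The sketch below assumes the .env file uses plain KEY=value lines; the parser is deliberately simple and the function names are illustrative.

```python
REQUIRED = ("DRGON_HOST", "DRGON_API_KEY", "GEOCML_DEPLOYMENT_HOST")


def parse_env(text):
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def drgon_env_problems(values):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = [f"{key} is missing or empty"
                for key in REQUIRED if not values.get(key)]
    if values.get("DRGON_HOST", "").endswith("/"):
        problems.append("DRGON_HOST must not end with a trailing slash")
    return problems
```

Reading your .env file, passing its text to parse_env, and printing the result of drgon_env_problems gives a quick checklist before you rebuild.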
Self Hosting a DRGON Instance
You can host your own dedicated DRGON instance with a few simple steps:
-
Clone the DRGON source code from github.com/geocml/drgon
-
Open a terminal and cd into the source code directory
-
Copy .env.example into a new file called .env
-
Update your .env to include your deployment-specific configuration variables
-
Run sh build.sh to build DRGON service images on your machine
-
Run docker network create drgon-network
-
Run sh start.sh to bring up the instance
API Reference
Endpoint | Method | Request Body | Description |
 | POST | | Requests an API key necessary for registering your geoCML deployment with DRGON. This key is 100% free, forever. |
 | GET | None | Requests a registry containing all geoCML deployments known to DRGON. |
 | POST | | Requests that a hosted geoCML Deployment be registered with DRGON. |
 | GET | | Requests DRGON for recommended datasets based on the provided tags. |
Moderating DRGON
Whoever hosts a DRGON instance is free to moderate their registry however they wish. DRGON automatically moderates deployments containing banned words (e.g. offensive words); however, manual review and moderation of deployments in your registry is highly recommended.
The DRGON development team recommends that you prevent users from abusing your registry, whether by uploading duplicate datasets or by registering non-geoCML deployments. If a user is abusing your registry, you can revoke their API key within the drgon-postgres service database.