Build a Real-Time Analytics Service in Python with Neon and Kafka
Latency is often a major bottleneck when developing cloud-native, data-intensive applications in a Kubernetes cluster. Whether you're working with audio, video, or frankly any variety of data at scale, moving that data to and from your local machine during development almost inevitably adds significant latency compared to a production cluster running in the cloud. That translates into tedious dev sessions and best guesses as to how your application will perform in the wild.
Today, we'll look at an alternative approach with Velocity as we develop and deploy a simple analytics service: Kafka will run in our Kubernetes cluster, and the Kafka stream will be written to a managed Postgres instance we'll create in Neon.
Neon is an open-source, fully managed, cloud-native Postgres provider that separates compute from storage to support autoscaling, branching your database for development, and bottomless storage. Kafka, an open-source distributed event streaming platform for managing continuous streams of data, is purpose-built for handling high volumes of throughput, which will manage our demo analytics data stream with ease.
The app will be written in Python, and the full example includes a simple React frontend that will display the total number of “click” and “view” events over the past five minutes in a bar graph that refreshes every five seconds.
The app will consist of that React frontend, two core backend services, and a data generation service. Each service will be deployed in Kubernetes, and we'll also deploy an Ingress that will enable HTTP access from outside of the cluster.
Mock analytics data will be streamed from a data service into Kafka, and a worker service will then read that data from Kafka, and feed it into our Neon Postgres instance. Finally, we'll create a web-api service that will query the Neon instance in order to display the analytics data.
And once we have the app deployed to our cluster, we’ll start a Velocity development session to showcase how Velocity can dramatically reduce lag when working on a data-intensive application that’s running in Kubernetes, so that the application performance in your dev environment is actually aligned with its performance in production.
The full project is available on GitHub.
First, you'll need to create a free account with Neon. Then, create a project called “AnalyticsExample.”
Next, create a new database called “analytics” within the project by clicking the “Database” dropdown and selecting “Create new database”.
Next, we'll create our data service that will generate a continuous stream of mock analytics data at a regular interval, which will be passed to Kafka via a Kafka producer we'll define in Python with the `confluent-kafka` library.
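The full implementation is in the GitHub repo, but a minimal sketch of the producer might look like the following. The topic name, event fields, broker address, and one-second interval are all assumptions for illustration:

```python
# data/main.py -- a sketch of the data service (names and values assumed)
import json
import random
import time

from confluent_kafka import Producer

# Assumes Kafka is reachable at this address inside the cluster
producer = Producer({"bootstrap.servers": "kafka:9092"})

EVENT_TYPES = ["click", "view"]

while True:
    event = {
        "event_type": random.choice(EVENT_TYPES),
        "timestamp": time.time(),
    }
    # Serialize the mock event as JSON and publish it to the "analytics" topic
    producer.produce("analytics", value=json.dumps(event).encode("utf-8"))
    producer.poll(0)  # serve delivery callbacks
    time.sleep(1)     # emit roughly one event per second
```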
Next, we'll need to define the Kafka consumer, which will allow the `worker` service to read our Kafka data stream and write that data to our Neon Postgres instance. To write the data, we'll use SQLAlchemy, a popular Python ORM.
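Here's a sketch of what the consumer loop might look like, assuming the same `analytics` topic as above plus the `SessionLocal` factory and `AnalyticsData` model we'll define in a moment:

```python
# worker/main.py -- a sketch of the worker's consumer loop (names assumed)
import json

from confluent_kafka import Consumer

from db import SessionLocal
from models import AnalyticsData

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "worker",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["analytics"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # Persist each event to our Neon Postgres instance via SQLAlchemy
    with SessionLocal() as session:
        session.add(AnalyticsData(event_type=event["event_type"]))
        session.commit()
```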
The above file imports two other files we'll have to define as well: `db.py` and `models.py`, in which we'll define the database connection and our SQL table, respectively.
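Sketches of those two files might look as follows. The environment variable names and the exact columns are illustrative assumptions; see the full project for the actual definitions:

```python
# db.py -- database connection, reading Neon credentials from the environment
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# NEON_USER, NEON_PASSWORD, and NEON_HOST are supplied by the Kubernetes
# manifests we'll define below; Neon requires SSL, hence sslmode=require
DATABASE_URL = (
    f"postgresql://{os.environ['NEON_USER']}:{os.environ['NEON_PASSWORD']}"
    f"@{os.environ['NEON_HOST']}/analytics?sslmode=require"
)

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(bind=engine)
```

```python
# models.py -- the "analytics_data" table we'll later see in the Neon dashboard
from sqlalchemy import Column, DateTime, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class AnalyticsData(Base):
    __tablename__ = "analytics_data"

    id = Column(Integer, primary_key=True)
    event_type = Column(String, nullable=False)  # "click" or "view"
    created_at = Column(DateTime, server_default=func.now())
```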
With the above data flow complete, we can now write the `web-api` service, which will query the Neon Postgres instance, so we can display the analytics data written there by the `worker` service.
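As a sketch, the query endpoint might look like the following. Flask is an assumed choice here, and the route and port are illustrative; the endpoint simply counts "click" and "view" events from the past five minutes for the frontend's bar graph:

```python
# web-api/main.py -- a sketch of the query service (framework assumed)
from datetime import datetime, timedelta

from flask import Flask, jsonify
from sqlalchemy import func

from db import SessionLocal
from models import AnalyticsData

app = Flask(__name__)

@app.route("/api/analytics")
def analytics():
    # Count each event type over the past five minutes
    cutoff = datetime.utcnow() - timedelta(minutes=5)
    with SessionLocal() as session:
        rows = (
            session.query(AnalyticsData.event_type, func.count())
            .filter(AnalyticsData.created_at >= cutoff)
            .group_by(AnalyticsData.event_type)
            .all()
        )
    return jsonify(dict(rows))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```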
Note that this service includes the same `db.py` and `models.py` files as the `worker` service, since it also needs to connect to and query the "analytics" database we created above.
Now that our services are all defined, we can begin deploying the application in Kubernetes. If you have a cluster running, you can deploy it there to see the full benefit of Velocity. But for the purpose of easily running this example, we'll start a Minikube cluster. To do so, we can run the following with Docker installed locally.
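```sh
# Start a local, single-node Kubernetes cluster using the Docker driver
minikube start --driver=docker
```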
Now that the cluster is up, we can deploy our various services.
We'll use Helm, a popular package manager for Kubernetes. To install Helm on macOS, you can run the following:
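```sh
# Install Helm via Homebrew
brew install helm
```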
We'll first have to create a Dockerfile for each service we defined above, and then build those images in our Minikube cluster with the `minikube image build -t <image-name> <path>` command.
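For example, once the Dockerfiles below are in place (the image names and directory layout here are assumptions):

```sh
# Build each service's image directly inside the Minikube cluster
minikube image build -t data-service ./data
minikube image build -t worker ./worker
minikube image build -t web-api ./web-api
```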
NOTE: for all services that connect to your Neon database, you’ll need to define the Neon host according to the value provided in the Neon dashboard, as follows:
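```yaml
# In each such service's Deployment spec (defined below), pass the host as an
# environment variable; the value here is a placeholder -- copy the real one
# from your Neon dashboard
env:
  - name: NEON_HOST
    value: "ep-<your-endpoint-id>.<region>.aws.neon.tech"
```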
The Dockerfiles for each of our services will be similar to the one shown below, but each service will also require its own requirements.txt file listing the specific Python packages that particular service depends on.
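Here's a representative sketch; the base image, filenames, and entrypoint are assumptions:

```dockerfile
# Dockerfile -- representative sketch for one of our Python services
FROM python:3.11-slim

WORKDIR /app

# Install the service's dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy in the service's source code
COPY . .

CMD ["python", "main.py"]
```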
Note: the full project includes Dockerfiles and requirements.txt files for each service we defined above.
Once all of our services have been built in Minikube, we'll need to define YAML manifests for deploying those services in Kubernetes.
The following manifest defines a Kubernetes Deployment and a Kubernetes Service for the Data microservice we defined above.
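Something like the following, where the image name, labels, and ports are illustrative, and `imagePullPolicy: Never` tells Kubernetes to use the image we built inside Minikube rather than pulling from a registry:

```yaml
# data.yaml -- Deployment and Service for the data microservice (names assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data
  template:
    metadata:
      labels:
        app: data
    spec:
      containers:
        - name: data
          image: data-service
          imagePullPolicy: Never  # use the image built inside Minikube
---
apiVersion: v1
kind: Service
metadata:
  name: data
spec:
  selector:
    app: data
  ports:
    - port: 80
      targetPort: 8000
```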
Again, as with the above section on containerizing our services, this process will need to be completed for each service we need to deploy. The full project includes a Helm chart with all of these services defined for your convenience.
To securely pass your Neon password as an environment variable, you’ll want to create a Kubernetes Secret. To create this, run the following from the command line, where <value> is your Neon password:
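```sh
# The Secret name "neon-password" and key "password" are assumptions; use
# whatever names your Deployment manifests reference
kubectl create secret generic neon-password --from-literal=password=<value>
```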
Finally, with each of the above services defined in YAML manifests, we can apply each of them to the cluster as follows:
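```sh
# Filenames are illustrative; apply whichever manifests you defined above
kubectl apply -f data.yaml
kubectl apply -f worker.yaml
kubectl apply -f web-api.yaml
```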
After the services start, we'll be able to see the data that our Data service is passing to Kafka, and that our Worker service is in turn writing to Postgres, right in the Neon dashboard: click the "Tables" tab in the sidebar menu, then click "analytics_data", the name of the table we defined with SQLAlchemy.
Now that the full application is deployed in Kubernetes, the process of developing it further can be rather tedious, as we would traditionally need to update our local source code, rebuild the image, push it to our registry — DockerHub in our case — and then redeploy the associated Kubernetes resources with the new image.
But with Velocity's free IDE plugin, you can connect to your running application in Kubernetes, and develop as you would locally.
Below, we go through the process of starting a Velocity development session, updating our local code, and automatically updating the remotely running code — the actual image that's running in Kubernetes — with our new code. This way, we don't have to go through all the above steps each time we want to develop or debug code that is running in Kubernetes. Instead, we can just write code, and Velocity updates the Kubernetes Deployment with our new code!
Neon is an open-source, highly scalable Serverless Postgres provider and Kafka is an open-source distributed event streaming platform that is built specifically to handle high volumes of throughput. Together, these services provide a strong foundation for handling and storing large streams of data efficiently and effectively.
Above, we demonstrated this capability by developing a microservice-based application that writes and retrieves continuously generated analytics data. With Velocity's IDE plugin, we were able to keep developing and debugging the application in a very straightforward way even after it had been deployed to Kubernetes.
If we hadn’t used Velocity, for every code change we made, we would have had to wait for all relevant CI processes to complete, rebuild the image, and deploy it to our Kubernetes cluster. Instead, we were able to simply update our code as if we were developing locally, and Velocity dynamically replaced our running image with one that included our current local code.