
Build a Highly Performant File Server in Kubernetes with SeaweedFS

Jeff Vincent

January 1, 2024


SeaweedFS stands out as a highly scalable and efficient distributed file store that seamlessly integrates with Kubernetes using Helm. Capable of handling billions of files, SeaweedFS excels in rapidly retrieving files with an O(1) time complexity for disk seeks. This means that the algorithm powering SeaweedFS maintains a consistent performance level, unaffected by the scale of input data, ensuring efficient processing even with vast amounts of information.

To explore the basics of SeaweedFS, we'll build a simple file server that writes files to volumes in SeaweedFS and stores related metadata in a separate Postgres DB, so that files can be listed and retrieved quickly and easily.

The full project is available on GitHub.

What we're building

We're going to build a simple file server that accepts image files as uploads, stores them in SeaweedFS, and writes related metadata to a Postgres database. The file server will also allow files to be downloaded by clicking a hyperlinked filename in the browser.

Set up a Kubernetes Cluster

First, we'll spin up a local cluster with Minikube, enable the Kong ingress controller, and open a tunnel so the ingress is reachable from the host (run the tunnel in a separate terminal):

minikube start
minikube addons enable kong
minikube tunnel

Deploy SeaweedFS in K8s

To deploy SeaweedFS in Kubernetes, we'll use the official Helm Chart, which will require that we first install Helm. To install on macOS, you can run:

brew install helm

Then, with Helm installed, we can deploy the SeaweedFS chart by running the following:
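(The chart repo URL below follows the SeaweedFS project's Helm instructions; verify it against the current SeaweedFS docs.)

helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm repo update
helm install seaweedfs seaweedfs/seaweedfs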

NOTE: the example repo includes a Helm values.yaml file that will allow you to deploy SeaweedFS on an M1 Mac. To include it in the above helm install command, simply append `-f <path-to-the-example-values.yaml-file>`.

The Web-API

The Web-API service will handle all queries to SeaweedFS and Postgres.

db.go

First, we'll need to define a database table to store file metadata in Postgres, as described above. For this, we'll use Gorm, a popular SQL ORM for Go.

package main

import (
	"fmt"
	"log"
	"os"

	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

var (
	postgres_user     = os.Getenv("POSTGRES_USER")
	postgres_password = os.Getenv("POSTGRES_PASSWORD")
	postgres_db       = os.Getenv("POSTGRES_DB")
	postgres_host     = os.Getenv("POSTGRES_HOST")
	postgres_port     = os.Getenv("POSTGRES_PORT")
	postgres_uri      = fmt.Sprintf("postgresql://%s:%s@%s:%s/%s?sslmode=disable", postgres_user, postgres_password, postgres_host, postgres_port, postgres_db)
)

// InitDB connects to Postgres and auto-migrates the FileRecord table.
func InitDB() *gorm.DB {
	db, err := gorm.Open(postgres.Open(postgres_uri), &gorm.Config{})
	if err != nil {
		log.Fatalf("Failed to connect to the database: %v", err)
	}
	db.AutoMigrate(&FileRecord{})
	return db
}

As you can see, the database table is defined by a Go struct called FileRecord (shown in the next code block). We pass a pointer to an empty FileRecord to the db.AutoMigrate() method provided by Gorm so the table is created automatically, and we create the database client in the same InitDB() function, which we'll later call from main() in main.go.

models.go

Next, we have several structs to create. MasterResponse, Location, and Volume are for parsing JSON responses from various components of the SeaweedFS system, and FileRecord — as described above — is the struct we'll use to write file metadata to Postgres.

package main

import (
	"gorm.io/gorm"
)

type MasterResponse struct {
	Count     int    `json:"count"`
	FID       string `json:"fid"`
	URL       string `json:"url"`
	PublicURL string `json:"publicUrl"`
}

type Location struct {
	PublicURL string `json:"publicUrl"`
	URL       string `json:"url"`
}

type Volume struct {
	VolumeID  string     `json:"volumeId"`
	Locations []Location `json:"locations"`
}

type FileRecord struct {
	gorm.Model
	FID      string `json:"fid"`
	FileName string `json:"fileName"`
}
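For reference, a response from the master's assign endpoint, which MasterResponse is built to parse, has roughly this shape (values are illustrative):

{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}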

main.go

Let's begin by defining our main() function and our first Gin endpoint — “/api/upload,” which will accept a POST request with a file as form data.

After importing the required packages and defining the SeaweedFS master URL, we define the Gin router and instantiate our database connection by calling InitDB(), which we defined above.

Then, we parse the incoming file and call two functions in sequence. First, getSeaweedfsFidAndUrl() asks the SeaweedFS Master for an available file ID (fid) and the URL of the SeaweedFS volume server the file will be stored on. Then, we call uploadFileToSeaweedFS() with the fid and url values returned from the Master, along with the file to be uploaded.

Finally, after the upload has succeeded, we create a FileRecord containing the file's location in SeaweedFS and its filename, which we then write to Postgres. This will allow us to easily track the files that have been uploaded, and the locations in SeaweedFS to which they've been written.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
	"strings"

	"github.com/gin-gonic/gin"
	"gorm.io/gorm"
)

// SEAWEEDFS_MASTER_URL is expected to include the master's /dir/ path,
// e.g. http://seaweedfs-master:9333/dir/
var (
	seaweedfs_master_url = os.Getenv("SEAWEEDFS_MASTER_URL")
)

func main() {
	r := gin.Default()
	db := InitDB()

	r.POST("/api/upload", func(c *gin.Context) {
		// Read the image file from the form data
		file, fh, err := c.Request.FormFile("file")
		if err != nil {
			c.String(http.StatusInternalServerError, "Error reading file: "+err.Error())
			return
		}

		fid, url, err := getSeaweedfsFidAndUrl()
		if err != nil {
			c.String(http.StatusInternalServerError, "Error getting SeaweedFS FID and URL: "+err.Error())
			return
		}

		// Upload to the volume server; the response body isn't needed here
		if _, err := uploadFileToSeaweedFS(fid, url, file); err != nil {
			c.String(http.StatusInternalServerError, "Error uploading file to SeaweedFS: "+err.Error())
			return
		}
		f := FileRecord{FileName: fh.Filename, FID: fid}
		db.Create(&f)
		c.JSON(http.StatusOK, fid)
	})

	r.Run(":8080")
}

Next, still in our main.go file, let's define the two functions called in the Gin router handler we just defined. First, we'll define getSeaweedfsFidAndUrl():

func getSeaweedfsFidAndUrl() (string, string, error) {
	response, err := http.Get(fmt.Sprintf("%s%s", seaweedfs_master_url, "assign"))
	if err != nil {
		return "", "", fmt.Errorf("error sending GET request: %w", err)
	}
	defer response.Body.Close()
	var data MasterResponse
	err = json.NewDecoder(response.Body).Decode(&data)
	if err != nil {
		return "", "", fmt.Errorf("error parsing JSON: %w", err)
	}
	fid := data.FID
	url := data.URL

	return fid, url, nil
}

And then, we'll need to define the uploadFileToSeaweedFS() function in the same file, like so:

func uploadFileToSeaweedFS(fid string, url string, file multipart.File) ([]byte, error) {
	// Create a new buffer to store the multipart request body
	body := new(bytes.Buffer)
	writer := multipart.NewWriter(body)

	// Add the image file to the multipart form (the form filename here is just a placeholder)
	part, err := writer.CreateFormFile("file", "image.png")
	if err != nil {
		fmt.Println("Error creating form file:", err)
		return nil, err
	}

	_, err = io.Copy(part, file)
	if err != nil {
		fmt.Println("Error copying file data:", err)
		return nil, err
	}

	// Close the writer to finalize the multipart form
	writer.Close()

	request, err := http.NewRequest("POST", fmt.Sprintf("http://%s/%s", url, fid), body)
	if err != nil {
		fmt.Println("Error creating request:", err)
		return nil, err
	}

	// Set the Content-Type header with the boundary parameter
	request.Header.Set("Content-Type", writer.FormDataContentType())

	client := &http.Client{}
	response, err := client.Do(request)
	if err != nil {
		fmt.Println("Error sending POST request:", err)
		return nil, err
	}
	defer response.Body.Close()

	// Read the response body as a byte array
	responseBody, err := io.ReadAll(response.Body)
	if err != nil {
		fmt.Println("Error reading response body:", err)
		return nil, err
	}

	return responseBody, nil
}
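Once the whole stack is deployed (we'll get there below), the upload endpoint can be sanity-checked with curl; this assumes the Kong ingress is reachable on localhost via minikube tunnel:

curl -F "file=@test.png" http://localhost/api/upload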

With that, the file server's upload functionality is complete. Next, let's define the download functionality by adding the following Gin route handler to our main() function that will handle GET requests to "/api/download/:fid".

r.GET("/api/download/:fid", func(c *gin.Context) {
		fid := c.Param("fid")
		file_location, err := getSeaweedfsFileLocation(fid)
		if err != nil {
			log.Println(err)
			c.String(http.StatusInternalServerError, "Error getting file from SeaweedFS")
			return
		}
		buf, err := downloadSeaweedfsFile(file_location, fid)
		if err != nil {
			log.Println(err)
			c.String(http.StatusInternalServerError, "Error downloading file from SeaweedFS")
			return
		}

		// Set the appropriate headers for file download
		c.Header("Content-Disposition", "attachment; filename=image.png")
		c.Header("Content-Type", "application/octet-stream")
		c.Header("Content-Length", fmt.Sprint(buf.Len()))

		// Write the buffer directly to the response writer
		if _, err := buf.WriteTo(c.Writer); err != nil {
			log.Println(err)
			c.String(http.StatusInternalServerError, "Error writing file to response")
			return
		}
	})

	r.GET("/api/files", func(c *gin.Context) {
		var file_records []FileRecord
		db.Find(&file_records)
		c.JSON(200, file_records)
	})
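For reference, the /api/files response will look roughly like this (the lowercase keys come from the json tags on FileRecord; the capitalized ones from the embedded gorm.Model):

[{"ID":1,"CreatedAt":"2024-01-01T00:00:00Z","UpdatedAt":"2024-01-01T00:00:00Z","DeletedAt":null,"fid":"3,01637037d6","fileName":"test.png"}]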

Here again, we'll need to define the functions called in the route handler — getSeaweedfsFileLocation() and downloadSeaweedfsFile() — both in main.go. Note that getSeaweedfsFileLocation() derives the volume ID to look up from the fid itself (the portion before the comma), so lookups work for files in any volume.

func getSeaweedfsFileLocation(fid string) (*Volume, error) {
	// The volume ID is the portion of the fid before the comma, e.g. "3" in "3,01637037d6"
	volumeId := strings.Split(fid, ",")[0]
	response, err := http.Get(fmt.Sprintf("%slookup?volumeId=%s", seaweedfs_master_url, volumeId))
	if err != nil {
		fmt.Println("Error sending GET request:", err)
		return nil, err
	}
	defer response.Body.Close()

	var d Volume

	err = json.NewDecoder(response.Body).Decode(&d)
	if err != nil {
		fmt.Println("Error parsing JSON:", err)
		return nil, err
	}

	return &d, nil
}


func downloadSeaweedfsFile(d *Volume, fid string) (*bytes.Buffer, error) {
	// Guard against an empty lookup result before indexing into Locations
	if len(d.Locations) == 0 {
		return nil, fmt.Errorf("no volume locations returned for fid %s", fid)
	}
	response, err := http.Get(fmt.Sprintf("http://%s/%s", d.Locations[0].PublicURL, fid))
	if err != nil {
		fmt.Println("Error sending GET request:", err)
		return nil, err
	}
	defer response.Body.Close()

	// Create a buffer to store the content of the response body
	var buf bytes.Buffer

	// Copy the response body to the buffer
	_, err = io.Copy(&buf, response.Body)
	if err != nil {
		fmt.Println("Error reading response body:", err)
		return nil, err
	}

	return &buf, nil
}

And now, the Web-API service's download functionality is also ready. 
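As a quick check once deployed, a file can be fetched by its fid (the fid below is illustrative):

curl -o test.png http://localhost/api/download/3,01637037d6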

The Frontend

Next, let's define the React frontend that will include a form to upload files and list all files uploaded as hyperlinked file names in the browser.

files.js

The frontend will consist of a single React component called FileForm, which renders a simple form for uploading files and, on load, fetches all file metadata from Postgres and displays it as an ordered list of hyperlinked filenames that act as download links.

import React, { useState, useEffect } from 'react';
import './styles.css';

const FileForm = () => {
  const [file, setFile] = useState(null);
  const [fileList, setFileList] = useState([]);

  const handleSubmit = async (e) => {
    e.preventDefault();
    try {
      const formData = new FormData();
      formData.append('file', file);

      const response = await fetch(`/api/upload`, {
        method: 'POST',
        body: formData,
      });

      if (response.ok) {
        const data = await response.json();
        console.log(data);
        // Reload so the newly uploaded file appears in the list
        window.location.reload();
      } else {
        console.error('Error:', response.status);
      }
    } catch (error) {
      console.error(error);
    }
  };

  const handleFileChange = (e) => {
    const file = e.target.files[0];
    setFile(file);
    console.log(file);
  };

  useEffect(() => {
    async function fetchFiles() {
      try {
        const response = await fetch('/api/files');
        if (response.ok) {
          const data = await response.json();
          console.log(data)
          setFileList(data);
        } else {
          console.error('Error:', response.status);
        }
      } catch (error) {
        console.error(error);
      }
    }
    fetchFiles();
  }, []); // Empty dependency array, so it runs only once on mount

  return (
    <div>
      <h2>Upload File</h2>
      <form onSubmit={handleSubmit}>
        <input type="file" onChange={handleFileChange} />
        <button type="submit">Upload</button>
      </form>

      <h2>File List</h2>
      <ol>
        {fileList.map((file) => (
          <li key={file.fid}>
            <a href={`/api/download/${file.fid}`}>{file.fileName}</a>
          </li>
        ))}
      </ol>
    </div>
  );
};

export default FileForm;

And, finally, we'll have to include our new component FileForm in our App.js file, like so:

App.js

import './App.css';
import FileForm from './components/files';

function App() {
  return (
    <div className="App">
      <FileForm />
    </div>
  );
}

export default App;

Containerize the services

Before we can deploy the frontend and Web-API services to Kubernetes, we'll need to create Docker images for both services and push those images to a registry.

Frontend Dockerfile

# Use an official Nginx image as the base
FROM nginx:alpine

# Remove default Nginx configuration
RUN rm -rf /etc/nginx/conf.d

# Copy custom Nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf

# Copy the built React app from the local machine to the container
COPY build /usr/share/nginx/html

# Expose a port for the container
EXPOSE 80

# Start the Nginx web server
CMD ["nginx", "-g", "daemon off;"]

Because we're serving the React app with NGINX, we'll also need to create an nginx.conf file, which we COPY into the image defined above:

nginx.conf

server {
  listen 80;
  server_name localhost;

  location / {
    root /usr/share/nginx/html;
    try_files $uri $uri/ /index.html;
  }
}

To build the image defined above with Minikube, from the /frontend directory, we can now run:

minikube image build -t seaweedfs-frontend:latest . 

Web-API Dockerfile

# first (build) stage
FROM golang:1.18 as builder

WORKDIR /app
COPY . .
RUN go mod download
RUN CGO_ENABLED=0 go build -v -o app .

# final (target) stage
FROM alpine:3.10
WORKDIR /root/

# Copy just the compiled binary out of the build stage
COPY --from=builder /app/app ./
CMD ["./app"]

Similarly, to build the Web-API image defined above in our Minikube cluster, from the /web-api directory, we can run:

minikube image build -t seaweedfs-web-api:latest .

Deploy in Kubernetes

Next, we will need to create the Kubernetes manifests that will run the above container images and allow them to be networked together. All required resource definitions are available in GitHub, so we'll just look at one example here and walk through what each of the different pieces is doing.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  selector:
    matchLabels:
      api: frontend
  replicas: 1
  template:
    metadata:
      labels:
        app: frontend
        api: frontend
    spec:
      containers:
        - name: frontend
          image: seaweedfs-frontend:latest
          imagePullPolicy: IfNotPresent
          ports:
            - name: frontend
              containerPort: 80
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  ports:
    - port: 80
      targetPort: 80
      name: frontend
  selector:
    app: frontend
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
spec:
  ingressClassName: kong
  rules:
  - http:
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: frontend
              port:
                number: 80
        - path: /api
          pathType: Prefix
          backend:
            service:
              name: web-api
              port:
                number: 8080

We have four Kubernetes resource types across our manifests: a Deployment, a Service, an Ingress, and, in the case of our Postgres deployment, a PersistentVolumeClaim, which allows data to persist even if the Pod running Postgres goes down. The first three appear in the example above.

The Deployment is responsible for pulling and running the Docker image we defined above; it can create any number of Kubernetes Pods, each running a copy of our service. The Kubernetes Service is of type ClusterIP, which exposes the Pods to other microservices inside the cluster and load-balances traffic among the replicas.

And finally, the Kubernetes Ingress allows external HTTP traffic to reach the endpoints defined in the spec.rules section of the manifest. Above, we have two rules: one routes / to the React frontend on port 80, and a second routes /api traffic from the browser — where the frontend is running — to the Web-API service on port 8080, which reads and writes SeaweedFS and Postgres.
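For reference, a minimal PersistentVolumeClaim for the Postgres deployment might look like the following (the name and size here are illustrative; see the repo's postgres.yaml for the actual definition):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi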

Deploy the manifests

kubectl apply -f web-api.yaml -n default
kubectl apply -f frontend.yaml -n default
kubectl apply -f postgres.yaml -n default
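Once applied, we can confirm that everything is up and running with:

kubectl get pods,svc,ingress -n default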

Develop with Velocity

With the Web-API service written as above, the upload and download functionality works, but we have a bug! Every downloaded file is named “image.png,” because the filename is hardcoded. Let's update that so files keep the name they had at upload.

Traditionally, when you have one or more services running in Kubernetes, you would have to repeat most of the deployment steps above to update the container images running in your Deployments. With Velocity, you can instead start a remote development and debugging session from your IDE and update your code as if it were running locally. Velocity automatically syncs your code to the remote cluster, so you see your changes almost immediately.

To use Velocity, we'll install the VSCode plugin by clicking on the “Extensions” icon in the IDE, searching for Velocity, and clicking “Install.”

Once Velocity is installed, we’ll need to click “Login to Velocity” to log in with a Google or GitHub account, and then click the Velocity icon in the leftmost menu in our IDE.

Next, check to make sure that the auto-populated fields are correct — by default, Velocity selects your default Kubernetes environment, but it can work with any Kubernetes environment defined in your Kubeconfig, which can be selected with the “Kubernetes Context” dropdown menu.

Click “Next,” and then in the following view, click “Create.” 

When you do, Velocity spins up the required resources in your Kubernetes cluster, then builds and pushes your local code according to the specifications defined in your selected local Dockerfile. When this process is complete, you'll see your local code running in the cluster, and every change you make locally is reflected in the remote cluster.

With the Velocity development session running, let's add a function to our main.go file that will get the correct filename from the metadata we're storing in Postgres. 

func getSeaweedfsFileName(db *gorm.DB, fid string) (string, error) {
	file_record := FileRecord{}
	// Query by struct field so we don't depend on the column name Gorm generates for FID
	result := db.Where(&FileRecord{FID: fid}).First(&file_record)
	if result.Error != nil {
		log.Printf("Failed to get file record by FID: %v", result.Error)
		return "", result.Error
	}
	return file_record.FileName, nil
}

And then, update the download handler in the main() function of your main.go file as follows:

...

file_name, err := getSeaweedfsFileName(db, fid)
		if err != nil {
			log.Println(err)
			c.String(http.StatusInternalServerError, "Error looking up file name")
			return
		}

		// Set the appropriate headers for file download
		c.Header("Content-Disposition", fmt.Sprintf("attachment; filename=%s", file_name))
		c.Header("Content-Type", "application/octet-stream")
		c.Header("Content-Length", fmt.Sprint(buf.Len()))

...

Now, when you download any existing or new files, they will be named as they were uploaded!
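You can verify the fix with curl; the -OJ flags tell curl to save the file under the name in the Content-Disposition header (the fid below is illustrative):

curl -OJ http://localhost/api/download/3,01637037d6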

Conclusion

SeaweedFS is a powerful distributed file store that can easily be spun up in Kubernetes. It can store billions of files and serve them extremely efficiently. Above, we walked through the basics of getting SeaweedFS up and running in Kubernetes by building a simple file server that stores files in SeaweedFS volumes and related metadata in a separate Postgres instance.

Then, after the application was up and running in a remote Kubernetes cluster, we saw how Velocity can streamline the development and debugging of applications in complex Kubernetes environments.
