Yasin Engin Go Backend
Tolerex - Fault-Tolerant Storage System in Go
A distributed storage lab for practicing replication, failover, secure node traffic, and operational thinking.
Problem
Storage systems need predictable behavior when a node fails, a follower falls behind, or a client repeats a request. Tolerex focuses on the core engineering problem: keeping the service available while making replication, health checks, and recovery visible enough to debug.
Architecture
The system is organized around a leader node, follower nodes, a client-facing API, a heartbeat loop, and a persistent log/checkpoint path. The design keeps health monitoring separate from request handling so failures can be detected without blocking normal traffic.
Technologies
- Go for service implementation and concurrency control.
- gRPC for explicit service contracts between components.
- mTLS for authenticated node-to-node communication.
- Disk-backed checkpoints and logs for restart behavior.
What I Built
- Leader/follower replication flow with heartbeat-based health checks.
- Failure detection path that can trigger a controlled failover.
- Failure detection path that can trigger a controlled failover.
- Persistence layer for data and log checkpoints.
- Basic observability points for metrics, logs, and recovery timing.
What I Learned
- Separating failure detection from client request paths makes behavior easier to test.
- Distributed systems code needs simple, visible state transitions more than clever abstractions.
- Secure transport should be designed early because it affects local development, certificates, and deployment habits.
Future Improvements
- Add repeatable chaos tests for leader failure, follower lag, and network partitions.
- Expose Prometheus metrics and a small dashboard for failover timing.
- Document benchmark scenarios with dataset size, request pattern, and recovery target.