The premise was simple: most intrusion detection systems are too slow, too cloud-bound, or too expensive for the kind of small-edge networks I actually care about — homelabs, drones, mesh routers. I wanted something that could sit on a Raspberry Pi at the edge and make a verdict on every packet flow before it became a problem.
What I ended up with was RealTimeDefender — a two-path hybrid model that combines the speed of Random Forest with the temporal awareness of a bidirectional LSTM. Here's how it works, what broke along the way, and why I think this architecture is the right shape for edge defense.
The two-path architecture
Most ML-based IDS architectures pick one model and ride it all the way down. Random Forest is fast but stateless — it sees a flow as a vector of statistics and has no idea whether the packet before it was a SYN or an ACK. LSTM is the opposite — beautiful at sequence modeling, but at 8 ms per inference on a Pi it falls behind real traffic almost immediately.
So I built both paths and let them argue.
┌─────────┐     ┌──────────┐     ┌─────────────┐     ┌────────┐
│ CAPTURE │ ──→ │ FEATURES │ ──→ │  RF (fast)  │ ──→ │ FUSION │
└─────────┘     └──────────┘     └─────────────┘   ↗ └────────┘
                      │          ┌─────────────┐  ↗
                      └────────→ │ LSTM (deep) │ ↗
                                 └─────────────┘
The RF model does the first pass — 0.4ms per flow, looking at 41 statistical features (packet count, byte count, inter-arrival times, TCP flag distributions). It's right ~93% of the time on its own. The LSTM picks up the remaining ambiguous flows and reads them as sequences with attention — the way a security analyst would mentally replay the conversation.
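One way to implement that hand-off is a simple confidence gate. Here's a minimal sketch of it — the threshold, function name, and model interfaces are mine, not lifted from the repo:

import numpy as np
import torch

RF_CONFIDENCE_THRESHOLD = 0.90   # illustrative cutoff; the real value is tuned on validation data

def classify_flow(rf_model, lstm_model, flow_stats, flow_sequence):
    """Two-path verdict: the fast RF answers first, the LSTM only sees ambiguous flows."""
    # Fast path: 41-dim statistical feature vector, sub-millisecond even on a Pi
    probs = rf_model.predict_proba(np.asarray(flow_stats).reshape(1, -1))[0]
    label, conf = int(np.argmax(probs)), float(np.max(probs))
    if conf >= RF_CONFIDENCE_THRESHOLD:
        return label, conf, "rf"

    # Deep path: per-packet feature sequence, attention-pooled by the LSTM
    seq = torch.tensor(np.asarray(flow_sequence), dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        lstm_probs = lstm_model(seq).softmax(dim=-1)[0]
    return int(lstm_probs.argmax()), float(lstm_probs.max()), "lstm"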
Feature extraction — what actually mattered
I started with the full CICIDS2017 feature set (78 columns) and trimmed aggressively. Anything with high collinearity got cut. Anything that needed the entire flow to compute (and therefore couldn't run incrementally) got cut. What was left:
| Feature | Why it matters |
| --- | --- |
| Flow duration | microseconds, log-scaled |
| Packet count fwd/bwd | asymmetry signals scans |
| Mean / std / min / max IAT | inter-arrival times |
| TCP flag counts | SYN/ACK/RST/PSH/URG/FIN |
| Payload entropy | encrypted vs plaintext signal |
| Pkt size mean/std | bursts vs steady streams |
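Everything in that table can be maintained packet-by-packet, which is what kept the pipeline incremental. A minimal sketch of the running-stats bookkeeping (Welford's online mean/variance — the class and field names are mine, not the repo's):

from dataclasses import dataclass

@dataclass
class RunningStats:
    """Welford's online algorithm: mean/std/min/max without storing the whole flow."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo, self.hi = min(self.lo, x), max(self.hi, x)

    @property
    def std(self) -> float:
        return (self.m2 / self.n) ** 0.5 if self.n > 1 else 0.0

# One accumulator per flow for inter-arrival times, another for packet sizes;
# flag counts and byte/packet totals are plain counters.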
Entropy was the one that surprised me. It correlated more strongly with malicious traffic than I expected — partly because exfiltration tools love to use TLS over non-standard ports, but also because some C2 channels deliberately pad payloads with random data and the entropy ends up looking too random.
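For concreteness, the entropy feature is plain Shannon entropy over the payload's byte histogram — a sketch, with a function name of my own choosing:

import math
from collections import Counter

def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload bytes, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in Counter(payload).values())

# Plaintext protocols tend to sit well below ~6 bits/byte; TLS and padded C2 push toward 8.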
The LSTM: why bidirectional, and why attention
A unidirectional LSTM sees the past but not the future. For a real-time IDS that's actually fine — we don't have the future yet. But during training, we have entire flows captured, and a bidirectional pass during training lets the model learn richer representations of what malicious flows actually look like. At inference time we run only the forward pass.
import torch
import torch.nn as nn

class DefenderLSTM(nn.Module):
    def __init__(self, input_dim=41, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(hidden*2, num_heads=4, batch_first=True)
        self.fc = nn.Linear(hidden*2, 2)

    def forward(self, x):                  # x: [B, T, input_dim]
        h, _ = self.bilstm(x)              # [B, T, 2H]
        a, _ = self.attn(h, h, h)          # self-attention over time
        pooled = a.mean(dim=1)             # [B, 2H]
        return self.fc(pooled)             # benign / malicious logits
Attention was the difference between a model that worked on the dataset and a model that worked in production. Without it, the pooled representation weighted every timestep equally — which meant a 200-packet flow with one suspicious packet at position 47 got drowned out. Self-attention let the model focus on the few critical timesteps that actually carried the signal.
The whole project pivoted the day I added attention. F1 went from 0.84 to 0.96 on the same data. That's not a tuning improvement — that's the model finally seeing what was always there.
Edge deployment: making it fit
The trained PyTorch model was 38 MB. The Pi 4 has 4 GB of RAM, but the rest of the stack — packet capture, feature extraction, fusion, the FastAPI server that exposes verdicts — needed room too. So I quantized.
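Dynamic quantization was the lowest-friction way to shrink it. A minimal sketch of that step with PyTorch's built-in quantize_dynamic — the file name and dummy shapes are illustrative:

import torch
import torch.nn as nn

model = DefenderLSTM()                         # the model defined above
model.load_state_dict(torch.load("defender_lstm.pt", map_location="cpu"))  # illustrative path
model.eval()

# Weights of nn.LSTM and nn.Linear modules are stored as int8;
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

# Drops in transparently for inference
dummy = torch.randn(1, 50, 41)                 # [batch, timesteps, features]; 50 is illustrative
with torch.no_grad():
    verdict = quantized(dummy).softmax(dim=-1)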
Where it falls down
I want to be honest about this. RealTimeDefender is good at flooding-style attacks, scans, brute force, and most botnet patterns it's been trained on. It's not good at zero-days that look statistically benign — slow data exfiltration, low-and-slow C2 beacons, anything that hides inside legitimate traffic timing distributions. That's a fundamentally hard problem for any model trained on labeled data.
The next iteration is going to layer an unsupervised autoencoder on top, looking for flows that don't match any of the learned patterns. The hybrid model handles known threats; the autoencoder catches the things we've never seen.
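The rough shape I have in mind — a sketch, not code from the repo: train a small autoencoder on benign flows only, then flag anything whose reconstruction error lands well outside what benign traffic produces:

import torch
import torch.nn as nn

class FlowAutoencoder(nn.Module):
    """Compress the 41-dim flow vector through a bottleneck and reconstruct it."""
    def __init__(self, input_dim=41, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 24), nn.ReLU(), nn.Linear(24, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 24), nn.ReLU(), nn.Linear(24, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model, flows):
    """Reconstruction MSE per flow — trained on benign traffic only, so a high
    score means 'unlike anything the model has seen', label or no label."""
    with torch.no_grad():
        return ((model(flows) - flows) ** 2).mean(dim=-1)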
Repo & next steps
The training pipeline, the deployment scripts for Pi, and the FastAPI inference service all live in the same monorepo. Some immediate plans:
- Drop in the autoencoder for unknown-pattern detection
- Move feature extraction to AF_PACKET for kernel-level capture
- Test against my Wi-Fi pen-test lab traffic for end-to-end validation
- Pair it with PatchPilot for detect-and-patch automation