June 2026 · 3 min read

What a $250 Box Can Actually Do: Jetson Orin Nano

I burned a week finding this out the hard way. Here it is, so you don't have to.

Edge AI · NVIDIA Jetson · TensorRT · DeepStream
The Jetson Orin Nano dev kit — the whole computer is the size of your palm

The Jetson Orin Nano Super developer kit, the $250 one, is a palm-sized AI computer: an integrated Ampere GPU, 8 GB of memory shared by CPU and GPU, and around 67 TOPS (sparse INT8). It runs on roughly 7 to 25 W, about what a phone charger delivers.

No discrete graphics card. No server. And it runs a full computer-vision pipeline on its own.


The Pipeline It Runs

One graph, end to end, on the device:

Pull a live H.265 camera stream. Decode it on the NVDEC hardware — the CPU never touches the pixels. Run a YOLO detector as a TensorRT engine through DeepStream. Track every object across frames. Turn detections into real events.

[Diagram: RTSP camera (live H.265) → NVDEC (hw decode) → TensorRT YOLO → DeepStream tracker (object IDs) → events (alerts out). All on-device · 1080p @ 15–18 FPS · no cloud round-trip.]
One graph, end to end — the video never leaves the box

Live camera in, alert out. All local. The video never leaves the box.

That's the whole point. No cloud round-trip. No per-stream cloud bill. It keeps running through an internet outage.

I ran 1080p at ~15–18 FPS — detection, tracking, and event logic together — well inside its memory and power budget. This is production hardware, not a toy.
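To make the graph concrete, here is a rough sketch of its decode, detect, and track stages as a `gst-launch-1.0` pipeline built from DeepStream's stock elements (`nvv4l2decoder`, `nvstreammux`, `nvinfer`, `nvtracker`). Treat it as a sketch under stated assumptions, not a drop-in config: the camera URL, the `config_infer_yolo.txt` nvinfer config (which would point at the cached YOLO engine), and the `build_pipeline`/`run_pipeline` helpers are hypothetical, and the tracker library path assumes a default DeepStream install. The event stage is left out; in a real app it would hang off a pad probe or a message-broker branch where the `fakesink` sits.

```bash
#!/usr/bin/env bash
# Sketch only: approximates the decode -> detect -> track part of the graph.
# CAM_URL and INFER_CONFIG are hypothetical. INFER_CONFIG would point nvinfer
# at the cached YOLO .engine (most YOLO exports also need a custom parser).
CAM_URL="${CAM_URL:-rtsp://camera.local/stream1}"
INFER_CONFIG="${INFER_CONFIG:-config_infer_yolo.txt}"
TRACKER_LIB="/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so"

# Fill the PIPELINE array, one gst-launch token per word.
build_pipeline() {
  # Inference branch: batch one live 1080p stream, run the TensorRT engine
  # through nvinfer, assign persistent object IDs with nvtracker. fakesink
  # stands in for the event logic.
  # Source branch: RTSP forced over TCP, H.265 depay/parse, NVDEC decode,
  # linked into the muxer's first input pad (m.sink_0).
  PIPELINE=(
    nvstreammux name=m batch-size=1 width=1920 height=1080 live-source=1
    ! nvinfer "config-file-path=$INFER_CONFIG"
    ! nvtracker "ll-lib-file=$TRACKER_LIB"
    ! fakesink
    rtspsrc "location=$CAM_URL" protocols=tcp
    ! rtph265depay ! h265parse ! nvv4l2decoder ! m.sink_0
  )
}

# Run the graph on the device; -e sends EOS on Ctrl-C for a clean shutdown.
run_pipeline() {
  build_pipeline
  gst-launch-1.0 -e "${PIPELINE[@]}"
}
```

Building the description as an array keeps each element and property a separate argument, so a URL or path never gets re-split by the shell.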

TensorRT: The Trick and the Trap

A .engine file is your model compiled by TensorRT for this exact GPU and TensorRT version, with kernels chosen and timed on the chip itself. That's where the speed comes from.

Two things bite you.

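The first compile is slow. Minutes. On-device. The engine can't be prebuilt on your laptop: an engine only runs on the GPU architecture and TensorRT version it was built with, so it has to be built on the Orin itself. Compile it once and cache it on persistent storage. If it lands somewhere ephemeral, like a container's writable layer or a tmpfs, you pay for that compile again every time the container is recreated or the box reboots.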

And TensorRT lies about memory. Mine kept failing to build, reporting "1 MB free", while the box had gigabytes idle. The real cause: I had never set the build workspace size, and the limit it ended up with was too small to fit even one FP16 kernel. Setting it to 2 GB (2048 MiB) built clean.

trtexec --onnx=yolo.onnx --fp16 \
        --memPoolSize=workspace:2048 \
        --saveEngine=yolo.engine

On this box, "out of memory" usually means the workspace cap, not your RAM.
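Both fixes fit in one small wrapper: build the engine once with an explicit workspace, and reuse it on every later start. This is a minimal sketch, assuming `trtexec` is on the PATH; `ENGINE_DIR`, `ENGINE_PATH`, and `ensure_engine` are hypothetical names, and `ENGINE_DIR` should be a host directory or mounted volume, not the container layer.

```bash
#!/usr/bin/env bash
# Sketch: compile the TensorRT engine once, then reuse it on every start.
# ENGINE_DIR is a hypothetical persistent path (host dir or named volume).
ENGINE_DIR="${ENGINE_DIR:-/data/engines}"

ensure_engine() {
  local onnx="$1" key engine
  # Key the cache on the model's contents so a re-exported ONNX rebuilds.
  key="$(sha256sum "$onnx" | cut -c1-12)"
  engine="$ENGINE_DIR/$(basename "$onnx" .onnx)-$key.engine"
  mkdir -p "$ENGINE_DIR"

  if [[ -s "$engine" ]]; then
    echo "using cached engine: $engine"
  else
    echo "building $engine (takes minutes on the Orin)"
    # Same flags as the command above: FP16 and a 2048 MiB build workspace.
    trtexec --onnx="$onnx" --fp16 \
            --memPoolSize=workspace:2048 \
            --saveEngine="$engine.partial" || return 1
    # Publish only after a successful build, so a crash mid-compile never
    # leaves a truncated file that looks like a valid cache hit.
    mv "$engine.partial" "$engine"
  fi
  ENGINE_PATH="$engine"   # hypothetical: hand this to nvinfer's config
}
```

Usage: `ensure_engine yolo.onnx && echo "$ENGINE_PATH"`. The hash only covers the model; since an engine is also tied to the TensorRT version, clearing the cache directory after a JetPack upgrade is the simple way to force a rebuild.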

The Edge Gotchas

nvidia-smi doesn't exist here. It's a Tegra GPU. Use tegrastats instead — GR3D_FREQ is your GPU load.
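If all you want is the GPU number, you can filter it out of the tegrastats stream. A minimal sketch, assuming each line contains a `GR3D_FREQ <n>%` token; the clock details printed after the percentage vary between JetPack releases and are ignored. `gpu_load` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: stream just the GPU load from tegrastats, one reading per second.
# --line-buffered makes grep emit each match as soon as tegrastats prints it.
gpu_load() {
  tegrastats --interval 1000 | grep --line-buffered -oE 'GR3D_FREQ [0-9]+%'
}
```

Run `gpu_load` in a second terminal while the pipeline is live; it prints one `GR3D_FREQ n%` line per sample.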

Builds split by architecture. The Jetson is arm64; your laptop probably isn't. Anything touching CUDA, TensorRT, or codecs builds natively on the device. The rest cross-compiles with buildx and QEMU. A shared :latest tag must be a multi-arch manifest covering both architectures; if either machine pushes a single-arch image over it, pulls break on the other platform.
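Here is one way to keep that tag honest, sketched with standard Docker buildx commands. The registry path in `IMAGE` and the helper function names are hypothetical; the split between cross-built and natively built images follows the rule above.

```bash
#!/usr/bin/env bash
# Sketch: keep one :latest tag valid on the laptop (amd64) and Jetson (arm64).
# IMAGE is a hypothetical registry path.
IMAGE="${IMAGE:-registry.example.com/vision/app}"

# One-time on the laptop: register QEMU so buildx can emulate arm64.
setup_qemu() {
  docker run --privileged --rm tonistiigi/binfmt --install arm64
}

# Images with no CUDA/TensorRT/codec code: cross-build both arches at once,
# which pushes a multi-arch manifest to :latest in one step.
build_portable() {
  docker buildx build --platform linux/amd64,linux/arm64 \
    -t "$IMAGE:latest" --push .
}

# CUDA/TensorRT images: build natively on each machine under an arch tag...
push_native() {
  local arch="$1"   # "amd64" on the laptop, "arm64" on the Jetson
  docker build -t "$IMAGE:$arch" . && docker push "$IMAGE:$arch"
}

# ...then stitch both arch tags into one multi-arch manifest for :latest.
publish_latest() {
  docker buildx imagetools create -t "$IMAGE:latest" \
    "$IMAGE:amd64" "$IMAGE:arm64"
}
```

The point of `publish_latest` is that neither machine ever pushes a plain single-arch build straight to `:latest`.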

The plumbing eats your time, not the model. Force RTSP over TCP, or you'll fight a UDP/IPv6 quirk. Check whether each camera sends H.264 or H.265, because the depayload, parse, and decode chain has to match. The detector was never the bottleneck.
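A quick way to settle both questions before inference is in the loop: ask the camera for its codec, then run a decode-only pipeline over TCP. This is a sketch assuming `ffprobe` and the Jetson GStreamer plugins are installed; `CAM_URL` and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm the camera's codec, then smoke-test RTSP-over-TCP decode
# on NVDEC. CAM_URL is a hypothetical address.
CAM_URL="${CAM_URL:-rtsp://camera.local/stream1}"

# Ask the camera what it actually sends over TCP; prints "h264" or "hevc".
probe_codec() {
  ffprobe -v error -rtsp_transport tcp -select_streams v:0 \
    -show_entries stream=codec_name -of default=nw=1:nk=1 "$CAM_URL"
}

# Map the codec to its RTP depayloader and parser; these must match the
# stream, or the pipeline won't link up.
depay_parse_for() {
  case "$1" in
    h264) echo "rtph264depay ! h264parse" ;;
    hevc) echo "rtph265depay ! h265parse" ;;
    *) echo "unsupported codec: '$1'" >&2; return 1 ;;
  esac
}

# Decode-only run: rtspsrc forced to TCP, frames decoded by nvv4l2decoder
# (NVDEC) and discarded by fakesink.
smoke_test() {
  local codec chain
  codec="$(probe_codec)" || return 1
  chain="$(depay_parse_for "$codec")" || return 1
  # $chain is deliberately unquoted so each element and "!" is its own word.
  gst-launch-1.0 -e rtspsrc location="$CAM_URL" protocols=tcp \
    ! $chain ! nvv4l2decoder ! fakesink
}
```

If `smoke_test` streams cleanly, the transport and codec are settled and any remaining trouble is downstream of decode.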

The worst bug was silent. Two stages disagreed on the shape of the JSON passed between them, so the consumer dropped every message while everything still looked like it was running. Print the real payload. Don't trust what the schema says it should be.
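One cheap guard is a filter that shows every payload missing a field the next stage expects, next to the raw message. A minimal sketch, assuming messages arrive as one JSON object per line and `jq` is installed; `REQUIRED_KEYS` and its field names are hypothetical placeholders for whatever your downstream stage reads.

```bash
#!/usr/bin/env bash
# Sketch: surface schema mismatches instead of letting a stage drop them.
# REQUIRED_KEYS is hypothetical: list the fields the downstream stage reads.
REQUIRED_KEYS='["object_id","label","timestamp"]'

# Reads one JSON object per line on stdin; for each message missing a
# required key, prints the missing keys next to the raw payload.
report_schema_mismatches() {
  jq -c --argjson req "$REQUIRED_KEYS" '
    . as $msg
    | ($req - keys) as $missing
    | select($missing | length > 0)
    | {missing: $missing, payload: $msg}'
}
```

Pipe whatever your subscriber prints into it, for example `your_subscriber | tee payloads.jsonl | report_schema_mismatches`, where `your_subscriber` stands in for the real transport. Well-formed messages produce no output; every mismatch is printed next to the payload that caused it.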

The Takeaway

The Orin Nano runs a full DeepStream and TensorRT vision pipeline at the edge. Price of a mid-range phone. Power of a phone charger. The video stays put.

The Quiet Truth

The model is the easy part. The hardware is ready. The work is everything wrapped around it — codecs, architectures, caches, payloads. Get that right and this little box runs the whole thing without breaking a sweat.
