The Jetson Orin Nano (the Super dev kit, the $250 one) is a palm-sized AI computer. Integrated Ampere GPU. 8 GB of memory shared between CPU and GPU. Around 67 TOPS, which is NVIDIA's peak sparse INT8 figure. It runs on about the power of a phone charger.
No graphics card. No server. And it runs a full computer-vision pipeline on its own.
The Pipeline It Runs
One graph, end to end, on the device:
Pull a live H.265 camera stream. Decode it on the NVDEC hardware — the CPU never touches the pixels. Run a YOLO detector as a TensorRT engine through DeepStream. Track every object across frames. Turn detections into real events.
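As a rough sketch, that graph can be written as a gst-launch style description. The element names (rtspsrc, nvv4l2decoder, nvstreammux, nvinfer, nvtracker) are the standard GStreamer and DeepStream ones, but the URL, the config path, the tracker library path and the property values below are placeholders I made up, not the settings from my setup. The event logic is not part of this string; it would hang off the tracker's output.

```python
def build_pipeline(rtsp_url, infer_config, tracker_lib):
    """Return a gst-launch style description of the camera-to-tracker graph."""
    stages = [
        # Pull the live H.265 stream over RTSP and unwrap the RTP payload.
        f'rtspsrc location="{rtsp_url}"',
        "rtph265depay",
        "h265parse",
        # Hardware decode; decoded frames stay in device memory.
        "nvv4l2decoder",
        # nvstreammux batches frames for inference. The decoder links into
        # its first request pad (mux.sink_0). Size values are assumptions.
        "mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080",
        # The YOLO TensorRT engine is referenced from this nvinfer config file.
        f"nvinfer config-file-path={infer_config}",
        # Tracker assigns persistent IDs across frames.
        f"nvtracker ll-lib-file={tracker_lib}",
        # Placeholder sink; a real app would attach its event logic here.
        "fakesink",
    ]
    return " ! ".join(stages)


if __name__ == "__main__":
    print(build_pipeline(
        "rtsp://camera.local/stream",          # hypothetical camera URL
        "/opt/app/config/yolo_pgie.txt",       # hypothetical nvinfer config
        "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so",
    ))
```

The printed string is meant to be handed to gst-launch-1.0 or Gst.parse_launch on the device. Treat it as a starting point to adapt, not a tested configuration.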
Live camera in, alert out. All local. The video never leaves the box.
That's the whole point. No cloud round-trip. No per-stream cloud bill. It keeps running through an internet outage.
I ran 1080p at ~15–18 FPS — detection, tracking, and event logic together — well inside its memory and power budget. This is production hardware, not a toy.
TensorRT: The Trick and the Trap
A .engine is a model compiled for this exact chip. That's where the speed comes from.
Two things bite you.
The first compile is slow. Minutes. On-device. The engine can't be prebuilt on your laptop: it is tied to the GPU and the TensorRT version it was built with, so it has to be built on the Orin itself. Compile it once and cache it on the device's persistent storage. If it lands somewhere ephemeral (a container layer, tmpfs), you recompile on every restart.
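One way to make "compile once, cache it" concrete is a small helper that names the engine after the model's content hash and only builds on a cache miss. This is a sketch under assumptions: the function names, the file naming scheme and the `build` callback are mine, and the cache directory is expected to be a persistent mount. A fuller key would also include the TensorRT version.

```python
import hashlib
from pathlib import Path


def engine_path(onnx_path, cache_dir, precision="fp16"):
    """Name the cached engine after the ONNX content hash and the precision."""
    digest = hashlib.sha256(Path(onnx_path).read_bytes()).hexdigest()[:16]
    stem = Path(onnx_path).stem
    return Path(cache_dir) / f"{stem}-{precision}-{digest}.engine"


def ensure_engine(onnx_path, cache_dir, build, precision="fp16"):
    """Return (engine_path, built). `build(onnx, out)` runs only on a miss."""
    target = engine_path(onnx_path, cache_dir, precision)
    if target.exists():
        return target, False              # cache hit: skip the slow compile
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.parent / (target.name + ".tmp")
    build(Path(onnx_path), tmp)           # e.g. a wrapper that runs trtexec
    tmp.replace(target)                   # rename so a crash leaves no half-written engine
    return target, True
```

Pointing `cache_dir` at a bind-mounted volume rather than a path inside the image is the part that actually avoids the recompile on restart.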
And TensorRT lies about memory. Mine kept failing to build, reporting "1 MB free", while the box had gigabytes idle. The real cause: an unset build workspace size, which left the builder too little scratch space to fit one FP16 kernel. Set it to 2 GB (2048, because trtexec takes the pool size in MiB). Built clean.
trtexec --onnx=yolo.onnx --fp16 \
--memPoolSize=workspace:2048 \
--saveEngine=yolo.engine
On this box, "out of memory" usually means the workspace cap, not your RAM.
The Edge Gotchas
Don't count on nvidia-smi here. It's a Tegra GPU, and depending on the JetPack release the tool is either missing or reports very little. Use tegrastats instead. GR3D_FREQ is your GPU load.
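If you want that number in a script, pulling it out of a tegrastats line is a one-regex job. A minimal sketch: the sample line below is hand-written in the general shape of tegrastats output, not captured from my device, and the exact fields vary between releases.

```python
import re

# GR3D_FREQ is followed by a utilisation percentage, sometimes with a
# frequency suffix such as "@[611]"; only the percentage is captured.
_GR3D = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_percent(line):
    """Return the GPU load from one tegrastats line, or None if absent."""
    match = _GR3D.search(line)
    return int(match.group(1)) if match else None


# Illustrative line only (assumed shape, invented values).
SAMPLE = "RAM 3120/7620MB CPU [12%@1510,9%@1510] GR3D_FREQ 63% cpu@48C"

if __name__ == "__main__":
    print(gpu_load_percent(SAMPLE))
```

Feeding it the live output is a matter of reading tegrastats' stdout line by line and calling the function on each line.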
Builds split by architecture. The Jetson is arm64. Your laptop probably isn't. Anything touching CUDA, TensorRT, or codecs builds natively on the device. The rest builds for both platforms with buildx, using QEMU to emulate the foreign one. A shared :latest tag must be a multi-arch manifest that carries both architectures, or you break the other platform's pulls.
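The two build paths come down to two docker buildx invocations, sketched here as argument lists a build script could pass to subprocess.run. The image names are hypothetical. The subcommands and flags (`buildx build --platform ... --push`, `buildx imagetools create -t`) are standard buildx, but this is not a copy of my actual build script.

```python
PLATFORMS = ("linux/amd64", "linux/arm64")


def portable_build(image, context="."):
    """One buildx run that builds both platforms and pushes them under one tag."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(PLATFORMS),
        "-t", image,
        "--push",
        context,
    ]


def merge_tags(image, per_arch_images):
    """Combine already-pushed single-arch images into one multi-arch tag.

    Useful when the arm64 image was built natively on the Jetson and the
    amd64 image somewhere else.
    """
    return ["docker", "buildx", "imagetools", "create", "-t", image,
            *per_arch_images]


if __name__ == "__main__":
    # Hypothetical registry and repository names.
    print(" ".join(portable_build("registry.example/vision/events:latest")))
    print(" ".join(merge_tags(
        "registry.example/vision/detector:latest",
        ["registry.example/vision/detector:amd64",
         "registry.example/vision/detector:arm64"],
    )))
```

Either way the shared tag ends up pointing at a manifest list, so each platform pulls its own image.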
The plumbing eats your time, not the model. Force RTSP over TCP or fight a UDP/IPv6 quirk. Check whether the camera sends H.264 or H.265, because the depayloader and parser have to match the codec. The detector was never the bottleneck.
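Both plumbing points fit in one small helper: pin rtspsrc to TCP and pick the depayloader and parser from the codec. A sketch, assuming a gst-launch style pipeline. The element and property names are standard GStreamer, while the function, the lookup table and the URL are mine.

```python
# RTP depayloader and parser per codec; mismatching these is the
# H.264 vs H.265 trap.
RTP_ELEMENTS = {
    "h264": ("rtph264depay", "h264parse"),
    "h265": ("rtph265depay", "h265parse"),
}


def source_segment(rtsp_url, codec):
    """Return the source end of a pipeline for the given camera codec."""
    try:
        depay, parse = RTP_ELEMENTS[codec.lower()]
    except KeyError:
        raise ValueError(f"unsupported codec: {codec!r}") from None
    # protocols=tcp stops rtspsrc from trying UDP transport first.
    return f'rtspsrc location="{rtsp_url}" protocols=tcp ! {depay} ! {parse}'


if __name__ == "__main__":
    print(source_segment("rtsp://camera.local/stream", "h265"))  # hypothetical URL
```

Failing loudly on an unknown codec beats a pipeline that links and then never produces a frame.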
The worst bug was silent. Two stages disagreed on the JSON shape between them. One dropped every message. Everything looked like it was running. Print the real payload. Don't trust what the schema should be.
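A cheap guard against that kind of silence is to check the shape at the consuming stage and log the raw payload whenever it doesn't fit, instead of dropping it quietly. A minimal sketch: the field names below are a hypothetical message shape for illustration, not the one my stages actually exchanged.

```python
import json
import logging

log = logging.getLogger("events")

# Hypothetical expected shape: key -> accepted Python type(s).
REQUIRED = {"track_id": int, "label": str, "confidence": (int, float)}


def shape_problems(msg):
    """List what is wrong with a decoded message; empty list means it fits."""
    if not isinstance(msg, dict):
        return [f"top level is {type(msg).__name__}, expected a JSON object"]
    return [f"{key!r} missing or wrong type"
            for key, types in REQUIRED.items()
            if not isinstance(msg.get(key), types)]


def parse_detection(raw):
    """Return the decoded message, or None after logging the real payload."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        log.error("undecodable payload (%s): %r", exc, raw)
        return None
    problems = shape_problems(msg)
    if problems:
        # Log the payload itself, not just the fact that it was rejected.
        log.error("rejected payload %r: %s", raw, "; ".join(problems))
        return None
    return msg
```

With this in place a producer that wraps detections in an extra object shows up in the logs on the first message, with the offending payload printed next to the reason.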
The Takeaway
The Orin Nano runs a full DeepStream and TensorRT vision pipeline at the edge. Price of a mid-range phone. A few watts. The video stays put.
The model is the easy part. The hardware is ready. The work is everything wrapped around it — codecs, architectures, caches, payloads. Get that right and this little box runs the whole thing without breaking a sweat.