Software Engineer, Multimodal Storage Infrastructure

Eventual Computing

San Francisco | Full Time | Salary not listed

Job details

About Eventual

Every breakthrough Physical AI system — humanoid robots, autonomous vehicles, video generation models — is trained on petabytes of video, lidar, radar, and sensor data. But today's data platforms (Databricks, Snowflake) were built for spreadsheet-like analytics. They don't know how to index a clip by content, co-locate sensors on the same row as video, version multimodal datasets, or push predicates down to a corpus of MP4s. Robotics and video-AI teams build the missing layer themselves: stitching together five to eight tools, organizing disorganized video and sensor data, building schemas and versioning that don't exist. "It was rebuilding what Databricks built 15 years ago for analytics — just for AI data."

Eventual was founded in 2022 to ship that layer once. Our open-source engine, Daft, is the distributed data engine purpose-built for multimodal AI — already running 2 PB/day at Amazon, 60-100 PB at another FAANG company, and in production at Mobileye, TogetherAI, and CloudKitchens. We are building a multimodal warehouse on top of our engine for Physical AI: video, sensors, and sim outputs co-indexed on the same row, aligned on timecode, and versioned — with a content-aware query layer on top.

We're building this in partnership with top Physical AI labs and public AI infrastructure companies. We have raised $30M from Felicis, CRV, Microsoft M12, Citi, Essence, Y Combinator, Caffeinated Capital, Array.vc, and angels including the co-founders of Databricks and Perplexity. We've assembled a world-class team from AWS, Render, Pinecone, and Tesla. We spent our careers powering the last generation of Physical AI in self-driving, and we're excited to do the same for the next.

Join our small (but powerful!) team working together 4 days/week in our office in SF's Mission District.

Your Role

As a Storage Infrastructure Engineer, you'll take everything we know about modern databases and apply it to the world of Physical AI. Our warehouse co-indexes video, sensors, embeddings, and sim outputs on the same row, versioned, with a third query layer (not row/column, not vector/semantic) — content-aware queries over what's inside clips. Your job is to make that layer fast: the right indices for petabyte-scale video, predicate pushdowns that elide whole files, file formats that respect random access into clips, and a query path that turns "left-arm grasp failures on deformable objects" into the smallest possible read.

You should believe, in your bones, that the best read is the read elided.

Key Responsibilities

  • Design and build the storage and indexing layer: row groups, column chunks, secondary indices, vector indices, and the metadata that lets queries skip everything that doesn't matter.

  • Push the query engine harder — predicate pushdown, projection pushdown, late materialization — across multimodal columns including video, embeddings, and sensor streams.

  • Choose, extend, or build on top of modern open formats (Parquet, Iceberg, Delta, etc.), and build our own or contribute upstream where it makes sense.

  • Build versioning and schema evolution for multimodal datasets so customer data stays reproducible across months of experimentation.

  • Partner with the Dataloading team on the format-to-loader boundary so an iceberg.scan(...) translates into the absolute minimum of bytes hitting NVMe.

  • Partner with the Visual Understanding team to land model outputs in the index without an external glue layer.
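
To give a flavor of the work, here is a minimal Python sketch of what "metadata that lets queries skip everything that doesn't matter" can look like in its simplest form: per-file min/max statistics used to prune files before any bytes are read. Everything in it is a hypothetical illustration (the FileStats type, the prune_files helper, the clip paths and timestamps); it is not Daft's API or our actual storage layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min/max of a timestamp column (seconds)."""
    path: str
    min_ts: float
    max_ts: float


def prune_files(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi].

    Files that cannot contain a matching row are skipped without being opened.
    """
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative catalog: three clips covering consecutive one-minute windows.
catalog = [
    FileStats("clips/a.mp4", 0.0, 59.9),
    FileStats("clips/b.mp4", 60.0, 119.9),
    FileStats("clips/c.mp4", 120.0, 179.9),
]

# Query: rows with 70.0 <= ts <= 130.0. Only b and c can match; a is never read.
selected = prune_files(catalog, lo=70.0, hi=130.0)
print([f.path for f in selected])
```

The real problem is harder: the predicates are about what is inside a clip, not just when it was recorded, and the statistics have to be cheap to build and keep current at petabyte scale.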

What we look for

  • You love thinking about indices. B+ trees, LSM trees, bitmap indices, vector indices, learned indices — you have favorites and you have grudges.

  • You love thinking about query engines. Predicate pushdown makes you happy. Late materialization makes you happier.

  • Strong familiarity with the storage hierarchy: cloud object stores, NVMe, block storage, spinning disk, RAM, GPU memory — and the latency and cost of moving between them.

  • Strong opinions about Parquet — love it or hate it, you've earned the opinion. Same for Iceberg, Delta, Lance, and the other lakehouse formats.

  • A real love for databases and query systems. You read database papers for fun.

  • You believe the best read is the read elided.

Nice to have

  • Background from a storage or table-format team — Lance, Iceberg, Delta, Hudi, Spiral, Snowflake, BigQuery, Databricks Photon, DuckDB, ClickHouse, or similar.

  • You've attempted to build your own database before. Or, at minimum, fantasized about it in detail.

  • Experience with Rust or modern C++ for storage engines.

  • Hands-on time with vector indices (HNSW, IVF, ScaNN) or hybrid retrieval systems.

  • Comfort with the OLAP/lakehouse ecosystem: catalogs, file layout, compaction, manifest formats, time travel.

Perks & Benefits

  • In-person, tight-knit team — 4 days/week in our SF Mission office.

  • Competitive comp and meaningful startup equity.

  • Catered lunches and dinners for SF employees.

  • Commuter benefit.

  • Team-building events and poker nights.

  • Health, vision, and dental coverage.

  • Flexible PTO.

  • Latest Apple equipment.

  • 401(k) plan with match.

If you've ever read a Parquet footer for fun and thought "this is so close to what video needs, and yet so far," we should talk.