What You'll Work On
Core Responsibilities
- Design and implement high-performance REST APIs that handle real-time LLM request proxying and logging
- Build asynchronous processing pipelines using distributed task queues for background job orchestration
- Optimize PostgreSQL queries and schema design for transactional data at scale
- Implement Redis caching strategies (invalidation, TTLs) that serve hot-path reads at sub-millisecond latency
- Design data ingestion pipelines that write millions of events daily to analytical databases
- Work with WebSocket connections for real-time streaming responses
- Build evaluation and experimentation systems for AI model testing
Required Skills
Must-Have Technical Skills
1. REST API Development (Critical)
- Deep understanding of RESTful principles and API design patterns
- Experience building production APIs with:
  - Authentication/authorization (JWT, API keys, OAuth)
  - Request validation and serialization
  - Rate limiting and throttling
  - Pagination and filtering
  - API versioning strategies
- Comfortable with OpenAPI/Swagger specifications
- Experience with middleware patterns and request/response interceptors
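For illustration, a minimal sketch of several of these patterns (header-based API keys, validated pagination inputs, a versioned path) using FastAPI; the endpoint, key store, and limits are hypothetical, and the same ideas carry over to Django REST Framework:

```python
from fastapi import Depends, FastAPI, Header, HTTPException, Query

app = FastAPI()
VALID_KEYS = {"demo-key"}  # stand-in for a real key store

def require_api_key(x_api_key: str = Header(...)) -> str:
    # FastAPI maps the x_api_key parameter to the X-Api-Key request header.
    if x_api_key not in VALID_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    return x_api_key

@app.get("/v1/requests")  # versioned path
def list_requests(
    page: int = Query(1, ge=1),                # validated pagination inputs
    page_size: int = Query(50, ge=1, le=200),
    _key: str = Depends(require_api_key),      # auth enforced as a dependency
):
    offset = (page - 1) * page_size
    return {"page": page, "page_size": page_size, "offset": offset, "items": []}
```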
2. PostgreSQL Expertise (Critical)
- Advanced SQL query optimization and indexing strategies
- Understanding of ACID transactions and isolation levels
- Experience with:
  - Complex joins and aggregations
  - Database migrations and schema evolution
  - Connection pooling (PgBouncer, connection pool managers)
  - Query plan analysis and performance tuning
- Knowledge of PostgreSQL-specific features (CTEs, window functions, JSONB)
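As a sketch of the features named above, the query below computes daily request counts with a CTE and a 7-day rolling average with a window function, then inspects the plan with EXPLAIN ANALYZE. The llm_requests table, its columns, and the connection string are hypothetical; psycopg2 is used here, though any driver works:

```python
import psycopg2

ROLLING_COUNTS = """
WITH daily AS (                     -- CTE: one row per day
    SELECT date_trunc('day', created_at) AS day,
           count(*)                      AS n
    FROM   llm_requests
    WHERE  created_at >= now() - interval '30 days'
    GROUP  BY 1
)
SELECT day, n,
       avg(n) OVER (ORDER BY day    -- window function: 7-day rolling mean
                    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d
FROM   daily
ORDER  BY day
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    # EXPLAIN ANALYZE runs the query and returns the actual plan; check that
    # an index on created_at is used rather than a sequential scan.
    cur.execute("EXPLAIN ANALYZE " + ROLLING_COUNTS)
    for (line,) in cur.fetchall():
        print(line)
```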
3. Redis Proficiency (Critical)
- Production experience using Redis for:
  - Application caching (cache invalidation strategies)
  - Session storage
  - Rate limiting
  - Message queuing/pub-sub
- Understanding of Redis data structures (strings, hashes, sets, sorted sets)
- Experience with Redis persistence (AOF, RDB)
- Knowledge of Redis clustering and high availability
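One concrete example of the uses above is a fixed-window rate limiter built from INCR and EXPIRE; the key prefix and limits below are illustrative:

```python
import redis

r = redis.Redis()  # assumes Redis on localhost:6379

def allow_request(api_key: str, limit: int = 100, window_s: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` calls per `window_s` seconds."""
    key = f"ratelimit:{api_key}"
    count = r.incr(key)          # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, window_s)  # start the window on the first request
    return count <= limit
```

A sorted-set sliding window gives smoother limits at the cost of more memory and round trips.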
4. Python Proficiency (Required)
- Strong Python 3.11+ experience
- Async/await patterns and asyncio
- Type hints and Pydantic for data validation
- Understanding of Python concurrency (threading, multiprocessing, gevent)
- Experience with Python package management (pip, poetry)
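A small sketch combining these patterns, Pydantic validation (v2 syntax) with an asyncio.gather fan-out; the event shape and the persist coroutine are hypothetical stand-ins:

```python
import asyncio
from pydantic import BaseModel, Field

class LLMEvent(BaseModel):
    request_id: str
    model: str
    latency_ms: float = Field(ge=0)  # reject negative latencies

async def persist(event: LLMEvent) -> str:
    await asyncio.sleep(0)           # stand-in for an async DB write
    return event.request_id

async def main() -> None:
    raw = [
        {"request_id": "r1", "model": "gpt-4o", "latency_ms": 812.5},
        {"request_id": "r2", "model": "claude-3-5-sonnet", "latency_ms": 95.0},
    ]
    events = [LLMEvent.model_validate(d) for d in raw]  # raises on bad input
    done = await asyncio.gather(*(persist(e) for e in events))
    print(done)

asyncio.run(main())
```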
5. General Backend Engineering
- Strong understanding of the HTTP protocol, status codes, and headers
- Experience with authentication patterns (JWT, session-based, API keys)
- Knowledge of CORS, CSRF protection, and web security best practices
- Understanding of serialization formats (JSON, Protocol Buffers)
- Experience with logging, monitoring, and observability
- Proficient with Git version control
- Strong debugging and troubleshooting skills
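As one example of the authentication patterns listed, a stateless JWT issue/verify pair using PyJWT; the secret and claim set are illustrative (real keys belong in configuration, not code):

```python
import time
import jwt

SECRET = "change-me"  # illustrative only; load from config in production

def issue_token(user_id: str, ttl_s: int = 3600) -> str:
    claims = {"sub": user_id, "exp": int(time.time()) + ttl_s}
    return jwt.encode(claims, SECRET, algorithm="HS256")

def verify_token(token: str) -> str:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure;
    # the exp claim is checked automatically.
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    return claims["sub"]

token = issue_token("user-42")
assert verify_token(token) == "user-42"
```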
Highly Desired Skills (Major Plus)
Celery & Distributed Task Processing
- Production experience with Celery or similar task queue systems (RQ, Dramatiq, Bull)
- Understanding of:
  - Task routing and queue prioritization
  - Worker concurrency models (gevent, eventlet, prefork)
  - Task retries, timeouts, and error handling
  - Batching strategies for performance optimization
  - Task monitoring and debugging
- Experience with message brokers (Redis, RabbitMQ, SQS)
- Knowledge of distributed systems challenges (eventual consistency, idempotency)
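A sketch of several of these concepts in one Celery task definition: bounded retries with exponential backoff and jitter, late acknowledgement so a crashed worker's task is requeued, and call-time queue routing. The broker URL, task body, and queue name are hypothetical:

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task(
    bind=True,
    max_retries=5,
    autoretry_for=(ConnectionError,),  # retry only on transient failures
    retry_backoff=True,                # 1s, 2s, 4s, ... between attempts
    retry_jitter=True,                 # randomize delays to avoid thundering herds
    acks_late=True,                    # requeue if the worker dies mid-task
)
def ingest_event(self, payload: dict) -> None:
    # Hypothetical downstream write that may raise ConnectionError;
    # Celery retries it automatically per the options above.
    write_to_warehouse(payload)

def write_to_warehouse(payload: dict) -> None:
    ...  # stand-in for real I/O

# Route to a dedicated queue at call time:
# ingest_event.apply_async(args=[{"event": "llm_request"}], queue="ingest")
```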
ClickHouse or Analytical Databases
- Experience with OLAP databases (ClickHouse, Druid, Snowflake, BigQuery)
- Understanding of columnar storage and query optimization
- Experience designing schemas for analytical workloads
- Knowledge of data partitioning and retention strategies
- Experience with time-series data and aggregations
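For illustration, hypothetical ClickHouse DDL for an events table touching the concepts above: columnar MergeTree storage, monthly partitioning, a sort key matched to common filters, and TTL-based retention (issued here via clickhouse-driver; table and column names are made up):

```python
from clickhouse_driver import Client

DDL = """
CREATE TABLE IF NOT EXISTS llm_events
(
    ts         DateTime,
    api_key    LowCardinality(String),
    model      LowCardinality(String),
    latency_ms Float32,
    tokens     UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)   -- monthly partitions, cheap to drop
ORDER BY (api_key, ts)      -- sort key matches common filter columns
TTL ts + INTERVAL 90 DAY    -- retention: rows expire after 90 days
"""

client = Client(host="localhost")
client.execute(DDL)
```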
Nice to Have
- Experience with LLM APIs (OpenAI, Anthropic, Google Gemini)
- Familiarity with WebSocket protocols and real-time systems
- Docker and containerization experience
- Kubernetes knowledge
- Experience with cloud platforms (AWS, Azure, GCP)
- OpenTelemetry and distributed tracing
- Experience with Stripe or payment processing
- Background in evaluation/testing frameworks
- S3 or object storage experience
- CI/CD pipeline experience
Tech Stack Overview
Current Stack (experience with these exact tools is not required, but helpful):
- Primary Framework: Python with Django REST Framework
- Databases: PostgreSQL (primary), ClickHouse (analytics)
- Caching/Queuing: Redis (two instances, one for the message queue and one for the cache)
- Task Queue: Celery with gevent workers
- Web Server: Gunicorn with gevent, Daphne for WebSockets
- Authentication: JWT, API keys, OAuth via social-auth
- Integration: LiteLLM for multi-provider LLM routing
- Monitoring: OpenTelemetry, custom tracing
- Storage: AWS S3
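To give a flavor of the LiteLLM integration, the snippet below sends the same prompt through two providers behind one OpenAI-style interface; the model names are examples, and provider API keys are assumed to be set in the environment:

```python
from litellm import completion

# Same call shape regardless of provider; LiteLLM handles the routing.
for model in ("gpt-4o-mini", "anthropic/claude-3-haiku-20240307"):
    resp = completion(model=model, messages=[{"role": "user", "content": "ping"}])
    print(model, "->", resp.choices[0].message.content)
```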