*This position is fully remote and open only to candidates employed in Bulgaria. You may also work from one of our offices in Sofia or Varna if you prefer.
About DataArt
DataArt is a global software engineering firm and a trusted technology partner for market leaders and visionaries. Our world-class team designs and engineers data-driven, cloud-native solutions to deliver immediate and enduring business value.
We promote a culture of radical respect, prioritizing your personal well-being as much as your expertise. We stand firmly against prejudice and inequality, valuing each of our employees equally.
We respect the autonomy of others before all else, offering remote, onsite, and hybrid work options. Our learning and development centers, R&D labs, and mentorship programs encourage professional growth.
Our long-term approach to collaboration with clients and colleagues alike focuses on building partnerships that extend beyond one-off projects. We provide the ability to switch between projects and technology stacks, creating opportunities for exploration through our learning and networking systems to advance your career.
Position Overview
Our client is a leading UK-based online retailer with over £2 billion in annual revenue. They specialize in innovative grocery e-commerce solutions that enhance shopper experiences and boost productivity and margins for retailers. Over the years, we’ve supported them in developing web portals, mobile apps, delivery systems, staff management tools, and data storage solutions.
We are looking for a skilled Java Developer with SRE expertise to join our distributed team. In this role, you will be responsible for maintaining and improving the reliability, resiliency, scalability, and performance of our onboarded systems.
You will provide on-call support, manage production incidents, and drive continuous improvements to our systems and processes while collaborating closely with engineering and operations teams.
Responsibilities
- Maintain and enhance reliability, resiliency, scalability, and performance of onboarded systems
- Provide on-call support: diagnose, mitigate, fix, and escalate production incidents in a timely manner
- Lead incident follow-ups, root cause analysis, and preventive actions to minimize recurrence
- Implement customer-centric approaches to align system reliability with user experience
- Ensure systems have appropriate SLIs, monitoring, and alerting to meet agreed SLOs
- Identify critical system components requiring enhanced availability in partnership with engineering and operations
- Design and roll out strategies, tooling, and processes to improve system stability and performance
- Develop and maintain CI/CD pipelines for seamless deployment and releases
- Automate repetitive and manual tasks to reduce toil and increase operational efficiency
- Participate in system architecture discussions focused on reliability and reducing maintenance complexity
Requirements
- Java Senior/Expert level with a strong background in Spring Boot
- Experience with shell scripts
- Working experience with Docker, including creation/modification of Docker images
- Maven and Gradle experience
- Understanding of AWS ECS
- Experience working with core AWS services (SNS, SQS, Kinesis, RDS, DynamoDB, S3, ElastiCache)
- Experience with GitLab
- Experience troubleshooting/bugfixing in distributed cloud environments
- Experience with OpenSearch/Kibana
- Understanding of metrics and tracing
- Knowledge of Prometheus and Grafana
- Readiness to be part of a 24/7 on-call support rotation
Nice To Have
- Experience with concurrency in Java
- Python knowledge
- Dependency conflict resolution experience
- Terraform knowledge
- Experience with CloudFormation
- Knowledge of GCP
- Knowledge of BigQuery
- Understanding of core SRE concepts (SLI, SLO, etc.)
- Knowledge of reliability patterns (circuit breaker, retry, etc.)
What We Offer
- Unique corporate culture – no micromanagement, friendly atmosphere, freedom, and mutual respect
- Flexible schedule – ability to change projects, work from home, and try out different roles
- Professional Development Map – a comprehensive map of your professional development within DataArt
- We hire people not for a project, but for the company. When a project (or your work on it) ends, you move to another project or to a paid “Idle” period.
- Social benefits – additional health insurance, life insurance, sports card, etc.
- Opportunity to work from another DataArt office in a different city or country (temporarily or permanently)
- Free English courses
- Cozy office with a great atmosphere
- Snacks, drinks, and fruit are always available
Reference number: SRE00007