Yifeng Sun
(347) 601-8841 | ys@yifengsun.com | Portfolio | GitHub
EXPERIENCE
Walmart Hoboken, NJ
Software Engineer - Anomaly Detection Team, Walmart Online Marketplace March 2024 - Present
- Engineered and maintained the reliability of a real-time pricing anomaly detection platform that protected marketplace pricing quality, built on Kotlin and Java microservices with Kafka-based event streaming at 2,500+ TPS.
- Ensured >99.9% decision correctness and reduced false positives by implementing domain-specific validation and scoring logic, including PPU tier validation, seller undercut detection, ML scoring orchestration, and severity-based routing.
- Reduced P99 latency from 1,200 ms to 310 ms by profiling and re-architecting a synchronous request chain into an asynchronous pipeline, enforcing a new latency SLO that empowered downstream teams to decouple, plan, and ship features.
- Increased system efficiency by 30% and cut CPU consumption by 40% through event-deduplication and early-termination logic, reducing downstream data-science calls by more than 18,000 requests per day.
- Raised platform reliability to 99.99% uptime by architecting and operating a fault-tolerant Kafka-Cassandra data backbone, improving message durability and reducing operational incidents by 25%.
- Accelerated deployment frequency by 3× and reduced rollback incidents by 35% by using the existing Jenkins, GitHub, and Helm infrastructure to let developers deploy and test in parallel on Kubernetes staging environments.
- Diagnosed and resolved a Kafka consumer idempotency issue that caused duplicate processing during traffic spikes, implemented circuit breakers and retry logic to handle 2M+ events/hour, and established runbooks for incident recovery.
- Strengthened compliance readiness and reduced configuration drift by 90% by centralizing sensitive configurations and secrets with Vault in Spring Boot microservices, resulting in faster audit completion and fewer security exceptions.
- Established comprehensive monitoring with Prometheus/Grafana dashboards tracking 15+ service health metrics, reducing MTTR by 45%.
- Reduced data-access delays from hours to minutes by automating Airflow workflows across GCP BigQuery/Cloud Storage and Hive that generate human-readable reason codes and tracing suggestions.
Tesla
Software Engineer Intern - Material Flow System Team, Gigafactory Tech July 2021 - October 2021
- Improved system throughput by 29% and reduced database load by implementing a Go/Gin backend for the Material Flow System with optimized RESTful APIs across multiple microservices.
- Increased event reliability and cut inter-service latency by 40% through Kafka-based asynchronous processing, ensuring smooth integration with downstream inventory, routing, and analytics services.
- Enhanced operational consistency by centralizing traffic through an API Gateway using TypeScript, reducing cross-service communication errors by 35% and enabling uniform security, monitoring, and rate-limiting across all microservices.
- Scaled infrastructure to support 20-30% higher peak load by designing a resilient AWS architecture, managing EC2 auto-scaling policies, optimizing Lambda cold starts, and implementing EventBridge-based job scheduling.
- Reduced authentication-related failures by 50% and improved access governance by implementing RBAC based on JWTs and session cookies, hardening the system against unauthorized access while maintaining low-latency authorization checks.
Xiaomi
Software Engineer Intern - MIUI Privacy Center May 2021 - July 2021
- Developed Java-based privacy protection features (e.g., Secure Sharing, Privacy Shield) for Xiaomi's Android-based MIUI under AOSP guidelines, ensuring compliance with global privacy standards.
- Built secure backend APIs using Spring Boot to manage sensitive user data within Xiaomi’s IoT ecosystem.
- Owned the development and maintenance of the Secure Sharing module, allowing users to remove metadata (e.g., location, timestamps) before exporting files to external platforms.
- Optimized system performance by resolving glitches and abnormal power consumption, using Profiler to pinpoint and fix memory leaks.
SKILLS
Java, Kotlin, Python, Kafka, TypeScript, Cassandra, Docker, Kubernetes, Helm, Istio, Spring Boot, Airflow, Hive, LightGBM, Vault, CI/CD, AWS, multi-region distributed systems, JVM tuning, Prometheus, Grafana, and microservices.
EDUCATION
New York University
Master of Science in Computer Engineering September 2022 - May 2024
Northeastern University
Bachelor of Engineering in Software Engineering September 2018 - June 2022