[{"data":1,"prerenderedAt":816},["ShallowReactive",2],{"/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes":3,"navigation-en-us":38,"banner-en-us":448,"footer-en-us":458,"blog-post-authors-en-us-Matt Smiley":698,"blog-related-posts-en-us-how-we-diagnosed-and-resolved-redis-latency-spikes":712,"blog-promotions-en-us":753,"next-steps-en-us":806},{"id":4,"title":5,"authorSlugs":6,"body":8,"categorySlug":9,"config":10,"content":14,"description":8,"extension":26,"isFeatured":12,"meta":27,"navigation":28,"path":29,"publishedDate":20,"seo":30,"stem":34,"tagSlugs":35,"__hash__":37},"blogPosts/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes.yml","How We Diagnosed And Resolved Redis Latency Spikes",[7],"matt-smiley",null,"engineering",{"slug":11,"featured":12,"template":13},"how-we-diagnosed-and-resolved-redis-latency-spikes",false,"BlogPost",{"title":15,"description":16,"authors":17,"heroImage":19,"date":20,"body":21,"category":9,"tags":22},"How we diagnosed and resolved Redis latency spikes with BPF and other tools","How we uncovered a three-phase cycle involving two distinct saturation points and a simple fix to break that cycle.",[18],"Matt Smiley","https://res.cloudinary.com/about-gitlab-com/image/upload/v1749667913/Blog/Hero%20Images/clocks.jpg","2022-11-28","If you enjoy performance engineering and peeling back abstraction layers to ask underlying subsystems to explain themselves, this article’s for you. The context is a chronic Redis latency problem, and you are about to tour a practical example of using BPF and profiling tools in concert with standard metrics to reveal unintuitive behaviors of a complex system.\n\nBeyond the tools and techniques, we also use an iterative hypothesis-testing approach to compose a behavior model of the system dynamics. This model tells us what factors influence the problem's severity and triggering conditions.\n\nUltimately, we find the root cause, and its remedy is delightfully boring and effective. We uncover a three-phase cycle involving two distinct saturation points and a simple fix to break that cycle. Along the way, we inspect aspects of the system’s behavior using stack sampling profiles, heat maps and flamegraphs, experimental tuning, source and binary analysis, instruction-level BPF instrumentation, and targeted latency injection under specific entry and exit conditions.\n\nIf you are short on time, the takeaways are summarized at the end. But the journey is the fun part, so let's dig in!\n\n## Introducing the problem: Chronic latency\n\nGitLab makes extensive use of Redis, and, on GitLab.com SaaS, we use [separate Redis clusters](https://handbook.gitlab.com/handbook/engineering/infrastructure/production/architecture/#redis-architecture) for certain functions. This tale concerns a Redis instance acting exclusively as a least recently used (LRU) cache.\n\nThis cache had a chronic latency problem that started occurring intermittently over two years ago and in recent months had become significantly worse: Every few minutes, it suffered from bursts of very high latency and corresponding throughput drop, eating into its Service Level Objective (SLO). These latency spikes impacted user-facing response times and [burned error budgets](https://gitlab.com/gitlab-org/gitlab/-/issues/360578#note_966597336) for dependent features, and this is what we aimed to solve.\n\n**Graph:** Spikes in the rate of extremely slow (1 second) Redis requests, each corresponding to an eviction burst\n\n![Graph showing spikes in the slow request rate every few minutes](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/00_redis_slow_request_rate_spikes_during_each_eviction_burst.png)\n\nIn prior work, we had already completed several mitigating optimizations. These sufficed for a while, but organic growth had resurfaced this as an important [long-term scaling problem](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#why-is-it-important-to-get-to-the-root-of-the-latency-spikes). We had also already ruled out externally triggered causes, such as request floods, connection rate spikes, host-level resource contention, etc. These latency spikes were consistently associated with memory usage reaching the eviction threshold (`maxmemory`), not by changes in client traffic patterns or other processes competing with Redis for CPU time, memory bandwidth, or network I/O.\n\nWe [initially thought](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1567) that Redis 6.2’s new [eviction throttling mechanism](https://github.com/redis/redis/pull/7653) might alleviate our eviction burst overhead. It did not. That mechanism solves a different problem: It prevents a stall condition where a single call to `performEvictions` could run arbitrarily long. In contrast, during this analysis we [discovered](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_977816216) that our problem (both before and after upgrading Redis) was related to numerous calls collectively reducing Redis throughput, rather than a few extremely slow calls causing a complete stall.\n\nTo discover our bottleneck and its potential solutions, we needed to investigate Redis’s behavior during our workload’s eviction bursts.\n\n## A little background on Redis evictions\n\nAt the time, our cache was oversubscribed, trying to hold more cache keys than the [configured `maxmemory` threshold](https://redis.io/docs/reference/eviction/) could hold, so evictions from the LRU cache were expected. But the dense concentration of that eviction overhead was surprising and troubling.\n\nRedis is essentially single-threaded. With a few exceptions, the “main” thread does almost all tasks serially, including handling client requests and evictions, among other things. Spending more time on X means there is less remaining time to do Y, so think about queuing behavior as the story unfolds.\n\nWhenever Redis reaches its `maxmemory` threshold, it frees memory by evicting some keys, aiming to do just enough evictions to get back under `maxmemory`. However, contrary to expectation, the metrics for memory usage and eviction rate (shown below) indicated that instead of a continuous steady eviction rate, there were abrupt burst events that freed much more memory than expected. After each eviction burst, no evictions occurred until memory usage climbed back up to the `maxmemory` threshold again.\n\n**Graph:** Redis memory usage drops by 300-500 MB during each eviction burst:\n\n![Memory usage repeatedly rises gradually to 64 GB and then abruptly drops](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/01_redis_memory_usage_dips_during_eviction_bursts.png)\n\n**Graph:** Key eviction spikes match the timing and size of the memory usage dips shown above\n\n![Eviction counter shows a large spike each time the previous graph showed a large memory usage drop](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/02_redis_eviction_bursts.png)\n\nThis apparent excess of evictions became the central mystery. Initially, we thought answering that question might reveal a way to smooth the eviction rate, spreading out the overhead and avoiding the latency spikes. Instead, we discovered that these bursts are an interaction effect that we need to avoid, but more on that later.\n\n## Eviction bursts cause CPU saturation\n\nAs shown above, we had found that these latency spikes correlated perfectly with large spikes in the cache’s eviction rate, but we did not yet understand why the evictions were concentrated into bursts that last a few seconds and occur every few minutes.\n\nAs a first step, we wanted to verify a causal relationship between eviction bursts and latency spikes.\n\nTo test this, we used [`perf`](https://www.brendangregg.com/perf.html) to run a CPU sampling profile on the Redis main thread. Then we applied a filter to split that profile, isolating the samples where it was calling the [`performEvictions` function](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L512). Using [`flamescope`](https://github.com/Netflix/flamescope), we can visualize the profile’s CPU usage as a [subsecond offset heat map](https://www.brendangregg.com/HeatMaps/subsecondoffset.html), where each second on the X axis is folded into a column of 20 msec buckets along the Y axis. This visualization style highlights sub-second activity patterns. Comparing these two heat maps confirmed that during an eviction burst, `performEvictions` is starving all other main thread code paths for CPU time.\n\n**Graph:** Redis main thread CPU time, excluding calls to `performEvictions`\n\n![Heat map shows one large gap and two small gaps in an otherwise uniform pattern of 70 percent to 80 percent CPU usage](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/03_heat_map_of_redis_main_thread_during_eviction_burst__excluding_performEvictions.png)\n\n**Graph:** Remainder of the same profile, showing only the calls to `performEvictions`\n\n![This heat map shows the gaps in the previous heap map were CPU time spent performing evictions](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/04_heat_map_of_redis_main_thread_during_eviction_burst__only_performEvictions.png)\n\nThese results confirm that eviction bursts are causing CPU starvation on the main thread, which acts as a throughput bottleneck and increases Redis’s response time latency.  These CPU utilization bursts typically lasted a few seconds, so they were too short-lived to trigger alerts but were still user impacting.\n\nFor context, the following flamegraph shows where `performEvictions` spends its CPU time. There are a few interesting things here, but the most important takeaways are:\n* It gets called synchronously by `processCommand` (which handles all client requests).\n* It handles many of its own deletes. Despite its name, the `dbAsyncDelete` function only delegates deletes to a helper thread under certain conditions which turn out to be rare for this workload.\n\n![Flamegraph of calls to function performEvictions, as described above](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/05_flamegraph_of_redis_main_thread_during_eviction_burst__only_performEvictions.png)\n\nFor more details on this analysis, see the [walkthrough and methodology](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_854745083).\n\n## How fast are individual calls to `performEvictions`?\n\nEach incoming request to Redis is handled by a call to `processCommand`, and it always concludes by calling the `performEvictions` function. That call to `performEvictions` is frequently a no-op, returning immediately after checking that the `maxmemory` threshold has not been breached. But when the threshold is exceeded, it will continue evicting keys until it either reaches its `mem_tofree` goal or exceeds its configured time limit per call.\n\nThe CPU heat maps shown earlier proved that `performEvictions` calls were collectively consuming a large majority of CPU time for up to several seconds.\n\nTo complement that, we also measured the wall clock time of individual calls.\n\nUsing the `funclatency` CLI tool (part of the [BCC suite of BPF tools](https://github.com/iovisor/bcc)), we measured call duration by instrumenting entry and exit from the `performEvictions` function and aggregated those measurements into a [histogram](https://en.wikipedia.org/wiki/Histogram) at 1-second intervals. When no evictions were occurring, the calls were consistently low latency (4-7 usecs/call). This is the no-op case described above (including 2.5 usecs/call of instrumentation overhead). But during an eviction burst, the results shift to a bimodal distribution, including a combination of the fast no-op calls along with much slower calls that are actively performing evictions:\n\n```shell\n$ sudo funclatency-bpfcc --microseconds --timestamp --interval 1 --duration 600 --pid $( pgrep -o redis-server ) '/opt/gitlab/embedded/bin/redis-server:performEvictions'\n...\n23:54:03\n     usecs               : count     distribution\n         0 -> 1          : 0        |                                        |\n         2 -> 3          : 576      |************                            |\n         4 -> 7          : 1896     |****************************************|\n         8 -> 15         : 392      |********                                |\n        16 -> 31         : 84       |*                                       |\n        32 -> 63         : 62       |*                                       |\n        64 -> 127        : 94       |*                                       |\n       128 -> 255        : 182      |***                                     |\n       256 -> 511        : 826      |*****************                       |\n       512 -> 1023       : 750      |***************                         |\n\n```\n\nThis measurement also directly confirmed and quantified the throughput drop in Redis requests handled per second: The call rate to `performEvictions` (and hence to `processCommand`) dropped to 20% of its norm from before the evictions began, from 25K to 5K calls per second.\n\nThis has a huge impact on clients: New requests are arriving at 5x the rate they are being completed. And crucially, we will see soon that this asymmetry is what drives the eviction burst.\n\nFor more details on this analysis, see the [safety check](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857869826) for instrumentation overhead and the [results walkthrough](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857907521). And for more general reference, the BPF instrumentation overhead estimate is based on these [benchmark results](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1383).\n\n## Experiment: Can tuning mitigate eviction-driven CPU saturation?\n\nThe analyses so far had shown that evictions were severely starving the Redis main thread for CPU time. There were still important unanswered questions (which we will return to shortly), but this was already enough info to [suggest some experiments](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_859236777) to test potential mitigations:\n* Can we spread out the eviction overhead so it takes longer to reach its goal but consumes a smaller percentage of the main thread’s time?\n* Are evictions freeing more memory than expected due to scheduling a lot of keys to be asynchronously deleted by the [lazyfree mechanism](https://github.com/redis/redis/blob/6.2.6/redis.conf#L1079)? Lazyfree is an optional feature that lets the Redis main thread [delegate to an async helper thread](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_859236777) the expensive task of deleting keys that have more than 64 elements. These async evictions do not count immediately towards the eviction loop’s memory goal, so if many keys qualify for lazyfree, this could potentially drive many extra iterations of the eviction loop.\n\nThe [answers](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7172#note_971197943) to both turned out to be no:\n* Reducing `maxmemory-eviction-tenacity` to its minimum setting still did not make `performEvictions` cheap enough to avoid accumulating a request backlog. It did increase response rate, but arrival rate still far exceeded it, so this was not an effective mitigation.\n* Disabling `lazyfree-lazy-eviction` did not prevent the eviction burst from dropping memory usage far below `maxmemory`. Those lazyfrees represent a small percentage of reclaimed memory. This rules out one of the potential explanations for the mystery of excessive memory being freed.\n\nHaving ruled out two potential mitigations and one candidate hypothesis, at this point we return to the pivotal question: Why are several hundred extra megabytes of memory being freed by the end of each eviction burst?\n\n## Why do evictions occur in bursts and free too much memory?\n\nEach round of eviction aims to free just barely enough memory to get back under the `maxmemory` threshold.\n\nWith a steady rate of demand for new memory allocations, the eviction rate should be similarly steady. The rate of arriving cache writes does appear to be steady. So why are evictions happening in dense bursts, rather than smoothly? And why do they reduce memory usage on a scale of hundreds of megabytes rather than hundreds of bytes?\n\nSome potential explanations to explore:\n* Do evictions only end when a large key gets evicted, spontaneously freeing enough memory to skip evictions for a while? No, the memory usage drop is far bigger than the largest keys in the dataset.\n* Do deferred lazyfree evictions cause the eviction loop to overshoot its goal, freeing more memory than intended? No, the above experiment disproved this hypothesis.\n* Is something causing the eviction loop to sometimes calculate an unexpectedly large value for its `mem_tofree` goal? We explore this next. The answer is no, but checking it led to a new insight.\n* Is a feedback loop causing evictions to become somehow self-amplifying? If so, what conditions lead to entering and leaving this state? This turned out to be correct.\n\nThese were all plausible and testable hypotheses, and each would point towards a different solution to the eviction-driven latency problem.\n\nThe first two hypotheses we have already eliminated.\n\nTo test the next two, we built custom BPF instrumentation to peek at the calculation of `mem_tofree` at the start of each call to `performEvictions`.\n\n## Observing the `mem_tofree` calculation with `bpftrace`\n\nThis part of the investigation was a personal favorite and led to a critical realization about the nature of the problem.\n\nAs noted above, our two remaining hypotheses were:\n* an unexpectedly large `mem_tofree` goal\n* a self-amplifying feedback loop\n\nTo differentiate between them, we used [`bpftrace`](https://github.com/iovisor/bpftrace) to instrument the calculation of `mem_tofree`, looking at its input variables and results.\n\nThis set of measurements directly tests the following:\n* Does each call to `performEvictions` aim to free a small amount of memory -- perhaps roughly the size of an average cache entry? If `mem_tofree` ever approaches hundreds of megabytes, that would confirm the first hypothesis and reveal what part of the calculation was causing that large value. Otherwise, it rules out the first hypothesis and makes the feedback loop hypothesis more likely.\n* Does the replication buffer size significantly influence `mem_tofree` as a feedback mechanism? Each eviction adds to this buffer, just like normal writes do. If this buffer grows large (possibly partly due to evictions) and then abruptly shrinks (due to the peer consuming it), that would cause a spontaneous large drop in memory usage, ending evictions and instantly reducing memory usage. This is one potential way for evictions to drive a feedback loop.\n\nTo peek at the values of the `mem_tofree` calculation ([script](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt)), we needed to isolate the [correct call from `performEvictions`](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L523) to the [`getMaxmemoryState`](https://github.com/redis/redis/blob/6.2.6/src/evict.c#L374-L407) function and reverse engineer its assembly to find the right instruction and register to instrument for each of the source code level variables that we wanted to capture. From that data we generate histograms for each of the following variables:\n\n```text\nmem_reported = zmalloc_used_memory()        // All used memory tracked by jemalloc\noverhead = freeMemoryGetNotCountedMemory()  // Replication output buffers + AOF buffer\nmem_used = mem_reported - overhead          // Non-exempt used memory\nmem_tofree = mem_used - maxmemory           // Eviction goal\n```\n\n_Caveat:_ Our [custom BPF instrumentation](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt) is specific to this particular build of the `redis-server` binary, since it attaches to virtual addresses that are likely to change the next time Redis is compiled. But the approach is able to be generalized. Treat this as a concrete example of using BPF to inspect source code variables in the middle of a function call without having to rebuild the binary. Because we are peeking at the function’s intermediate state and because the compiler inlined this function call, we needed to do binary analysis to find the correct instrumentation points. In general, peeking at a function’s arguments or return value is easier and more portable, but in this case it would not suffice.\n\nThe results:\n* Ruled out the first hypothesis: Each call to `performEvictions` had a small target value (`mem_tofree` \u003C 2 MB). This means each call to `performEvictions` did a small amount of work. Redis’s mysterious rapid drop in memory usage cannot have been caused by an abnormally large `mem_tofree` target evicting a big batch of keys all at once. Instead, there must be many calls collectively driving down memory usage.\n* The replication output buffers remained consistently small, ruling out one of the potential feedback loop mechanisms.\n* Surprisingly, `mem_tofree` was usually 16 KB to 64 KB, which is larger than a typical cache entry. This size discrepancy hints that cache keys may not be the main source of the memory pressure perpetuating the eviction burst once it begins.\n\nAll of the above results were consistent with the feedback loop hypothesis.\n\nIn addition to answering the initial questions, we got a bonus outcome: Concurrently measuring both `mem_tofree` and `mem_used` revealed a crucial new fact – _the memory reclaim is a completely distinct phase from the eviction burst_.\n\nReframing the pathology as exhibiting separate phases for evictions versus memory reclaim led to a series of realizations, described in the next section. From that emerged a coherent hypothesis explaining all the observed properties of the pathology.\n\nFor more details on this analysis, see [methodology notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636), [build notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982499538) supporting the disassembly of the Redis binary, and [initial interpretations](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_977994182).\n\n## Three-phase cycle\n\nWith the above results indicating a distinct separation between the evictions and the memory reclaim, we can now concisely characterize [three phases](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982623949) in the cycle of eviction-driven latency spikes.\n\n**Graph:** Diagram (not to scale) comparing memory and CPU usage to request and response rates during each of the three phases\n\n![Diagram summarizes the text that follows, showing CPU and memory saturate in Phase 2 until request rate drops to match response rate, after which they recover](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/06_3_phase_cycle_of_eviction_bursts.png)\n\nPhase 1: Not saturated (7-15 minutes)\n* Memory usage is below `maxmemory`. No evictions occur during this phase.\n* Memory usage grows organically until reaching `maxmemory`, which starts the next phase.\n\nPhase 2: Saturated memory and CPU (6-8 seconds)\n* When memory usage reaches `maxmemory`, evictions begin.\n* Evictions occur only during this phase, and they occur intermittently and frequently.\n* Demand for memory frequently exceeds free capacity, repeatedly pushing memory usage above `maxmemory`. Throughout this phase, memory usage oscillates close to the `maxmemory` threshold, evicting a small amount of memory at a time, just enough to get back under `maxmemory`.\n\nPhase 3: Rapid memory reclaim (30-60 seconds)\n* No evictions occur during this phase.\n* During this phase, something that had been holding a lot of memory starts quickly and steadily releasing it.\n* Without the overhead of running evictions, CPU time is again spent mostly on handling requests (starting with the backlog that accumulated during Phase 2).\n* Memory usage drops rapidly and steadily. By the time this phase ends, hundreds of megabytes have been freed. Afterwards, the cycle restarts with Phase 1.\n\nAt the transition between Phase 2 and Phase 3, evictions abruptly ended because memory usage stays below the `maxmemory` threshold.\n\nReaching that transition point where memory pressure becomes negative signals that whatever was driving the memory demand in Phase 2 has started releasing memory faster than it is consuming it, shrinking the footprint it had accumulated during the previous phase.\n\nWhat is this **mystery memory consumer** that bloats its demand during Phase 2 and frees it during Phase 3?\n\n## The mystery revealed\n\n[Modeling the phase transitions](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982651298) gave us some useful constraints that a viable hypothesis must satisfy. The mystery memory consumer must:\n* quickly bloat its footprint to hundreds of megabytes on a timescale of less than 10 seconds (the duration of Phase 2), under conditions triggered by the start of an eviction burst\n* quickly release its accumulated excess on a timescale of just tens of seconds (the duration of Phase 3), under the conditions immediately following an eviction burst\n\n**The answer:** The client input/output buffers meet those constraints to be the mystery memory consumer.\n\nHere is how that hypothesis plays out:\n* During Phase 1 (healthy state), the Redis main thread’s CPU usage is already fairly high. At the start of Phase 2, when evictions begin, the eviction overhead saturates the main thread’s CPU capacity, quickly dropping response rate below the incoming request rate.\n* This throughput mismatch between arrivals versus responses **is itself the amplifier** that takes over driving the eviction burst. As the size of that rate gap increases, the proportion of time spent doing evictions also increases.\n* Accumulating a backlog of requests requires memory, and that backlog continues to grow until enough clients are stalled that the arrival rate drops to match the response rate. As clients stall, the arrival rate falls, and with it the memory pressure, eviction rate, and CPU overhead begin to reduce.\n* At the equilibrium point when arrival rate falls to match response rate, memory demand is satisfied and evictions stop (ending Phase 2). Without the eviction overhead, more CPU time is available to process the backlog, so response rate increases above request arrival rate. This recovery phase steadily consumes the request backlog, incrementally freeing memory as it goes (Phase 3).\n* Once the backlog is resolved, the arrival and response rates match again. CPU usage is back to its Phase 1 norm, and memory usage has temporarily dropped in proportion to the max size of Phase 2’s request backlog.\n\nWe confirmed this hypothesis via a [latency injection experiment](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_987049036) showing that queuing alone explains the pathology. This outcome supports the conclusion that the extra memory demand originates from response rate falling below request arrival rate.\n\n## Remedies: How to avoid entering the eviction burst cycle\n\nNow that we understand the dynamics of the pathology, we can draw confident conclusions about viable solutions.\n\nRedis evictions are only self-amplifying when all of the following conditions are present:\n* **Memory saturation:** Memory usage reaches the `maxmemory` limit, causing evictions to start.\n* **CPU saturation:** The baseline CPU usage by the Redis main thread’s normal workload is close enough to a whole core that the eviction overhead pushes it to saturation. This reduces the response rate below request arrival rate, inducing self-amplification via increased memory demand for request buffering.\n* **Many active clients:** The saturation only lasts as long as request arrival rate exceeds response rate. Stalled clients no longer contribute to that arrival rate, so the saturation lasts longer and has a greater impact if Redis has many active clients still sending requests.\n\nViable remedies include:\n* Avoid memory saturation by any combination of the following to make peak memory usage less than the `maxmemory` limit:\n  * Reduce cache time to live (TTL)\n  * Increase `maxmemory` (and host memory if needed, but watch out for [`numa_balancing` CPU overhead](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1889) on hosts with multiple NUMA nodes)\n  * Adjust client behavior to avoid writing unnecessary cache entries\n  * Split the cache among multiple instances (sharding or functional partitioning, helps avoid both memory and CPU saturation)\n* Avoid CPU saturation by any combination of the following to make peak CPU usage for the workload plus eviction overhead be less than 1 CPU core:\n  * Use the fastest processor available for single-threaded instructions per second\n  * Isolate the redis-server process (particularly its main thread) from any other competing CPU-intensive processes (dedicated host, taskset, cpuset)\n  * Adjust client behavior to avoid unnecessary cache lookups or writes\n  * Split the cache among multiple instances (sharding or functional partitioning, helps avoid both memory and CPU saturation)\n  * Offload work from the Redis main thread (io-threads, lazyfree)\n  * Reduce eviction tenacity (only gives a minor benefit in our experiments)\n\nMore exotic potential remedies could include a new Redis feature. One idea is to exempt ephemeral allocations like client buffers from counting towards the `maxmemory` limit, instead applying that limit only to key storage. Alternatively, we could limit evictions to only consume at most a configurable percentage of the main thread’s time, so that most of its time is still spent on request throughput rather than eviction overhead.\n\nUnfortunately, either of those features would trade one failure mode for another, reducing the risk of eviction-driven CPU saturation while increasing the risk of unbounded memory growth at the process level, which could potentially saturate the host or cgroup and lead to an OOM, or out of memory, kill. That trade-off may not be worthwhile, and in any case it is not currently an option.\n\n## Our solution\n\nWe had already exhausted the low-hanging fruit for CPU efficiency, so we focused our attention on avoiding memory saturation.\n\nTo improve the cache’s memory efficiency, we [evaluated](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_990891708) which types of cache keys were using the most space and how much [`IDLETIME`](https://redis.io/commands/object-idletime/) they had accrued since last access. This memory usage profile identified some rarely used cache entries (which waste space), helped inform the TTL, or time to live, tuning by first focusing on keys with a high idle time, and highlighted some useful potential cutpoints for functionally partitioning the cache.\n\nWe [decided](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_1014582669) to concurrently pursue several cache efficiency improvements and opened an [epic](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/764) for it. The goal was to avoid chronic memory saturation, and the main action items were:\n* Iteratively reduce the cache’s [default TTL](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1854) from 2 weeks to 8 hours (helped a lot!)\n* Switch to [client-side caching](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_1026821730) for certain cache keys (efficiently avoids spending shared cache space on non-shared cache entries)\n* [Partition](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/762) a set of cache keys to a separate Redis instance\n\nThe TTL reduction was the simplest solution and turned out to be a big win. One of our main concerns with TTL reduction was that the additional cache misses could potentially increase workload on other parts of the infrastructure. Some cache misses are more expensive than others, and our metrics are not granular enough to quantify the cost of cache misses per type of cache entry. This concern is why we applied the TTL adjustment incrementally and monitored for SLO violations. Fortunately, our inference was correct: Reducing TTL did not significantly reduce the cache hit rate, and the additional cache misses did not cause noticeable impact to downstream subsystems.\n\nThe TTL reduction turned out to be sufficient to drop memory usage consistently a little below its saturation point.\n\nIncreasing `maxmemory` had initially not been feasible because the original peak memory demand (prior to the efficiency improvements) was expected to be larger than the max size of the VMs we use for Redis. However, once we dropped memory demand below saturation, then we could confidently [provision headroom](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1868) for future growth and re-enable [saturation alerting](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1883).\n\n## Results\n\nThe following graph shows Redis memory usage transitioning out of its chronically saturated state, with annotations describing the milestones when latency spikes ended and when saturation margin became wide enough to be considered safe:\n\n![Redis memory usage stops showing a flat top saturation](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/07_epic_results__memory_saturation_avoided_by_TTL_reductions.png)\n\nZooming into the days when we rolled out the TTL adjustments, we can see the harmful eviction-driven latency spikes vanish as we drop memory usage below its saturation point, exactly as predicted:\n\n![Redis memory usage starts as a flat line and then falls below that saturation line](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/08_results__redis_memory_usage_stops_saturating.png)\n\n![Redis response time spikes stop occurring at the exact point when memory stops being saturated](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/09_results__redis_latency_spikes_end.png)\n\nThese eviction-driven latency spikes had been the biggest cause of slowess in Redis cache.\n\nSolving this source of slowness significantly improved the user experience. This 1-year lookback shows only the long-tail portion of the improvement, not even the full benefit.  Each weekday had roughly 2 million Redis requests slower than 1 second, until our fix in mid-August:\n\n![Graph of the daily count of Redis cache requests slower than 1 second, showing roughly 2 million slow requests per day on weekdays until mid-August, when the TTL adjustments were applied](https://about.gitlab.com/images/blogimages/2022-11-28-diagnosing-redis-latency-spikes-with-bpf-and-friends/10_results__1_year_retrospective_of_slow_redis_requests_per_day.png)\n\n## Conclusions\n\nWe solved a long-standing latency problem that had been worsening as the workload grew, and we learned a lot along the way. This article focuses mostly on the Redis discoveries, since those are general behaviors that some of you may encounter in your travels. We also developed some novel tools and analytical methods and uncovered several useful environment-specific facts about our workload, infrastructure, and observability, leading to several additional improvements and proposals not mentioned above.\n\nOverall, we made several efficiency improvements and broke the cycle that was driving the pathology. Memory demand now stays well below the saturation point, eliminating the latency spikes that were burning error budgets for the development teams and causing intermittent slowness for users. All stakeholders are happy, and we came away with deeper domain knowledge and sharper skills!\n\n## Key insights summary\n\nThe following notes summarize what we learned about Redis eviction behavior (current as of version 6.2):\n* The same memory budget (`maxmemory`) is shared by key storage and client connection buffers. A spike in demand for client connection buffers counts towards the `maxmemory` limit, in the same way that a spike in key inserts or key size would.\n* Redis performs evictions in the foreground on its main thread. All time spent in `performEvictions` is time not spent handling client requests. Consequently, during an eviction burst, Redis has a lower throughput ceiling.\n* If eviction overhead saturates the main thread’s CPU, then response rate falls below request arrival rate. Redis accumulates a request backlog (which consumes memory), and clients experience this as slowness.\n* The memory used for pending requests requires more evictions, driving the eviction burst until enough clients are stalled that arrival rate falls back below response rate. At that equilibrium point, evictions stop, eviction overhead vanishes, Redis rapidly handles its request backlog, and that backlog’s memory gets freed.\n* Triggering this cycle requires all of the following:\n  * Redis is configured with a `maxmemory` limit, and its memory demand exceeds that size. This memory saturation causes evictions to begin.\n  * Redis main thread’s CPU utilization is high enough under its normal workload that having to also perform evictions drives it to CPU saturation. This reduces response rate below request rate, causing a growing request backlog and high latency.\n  * Many active clients are connected. The duration of the eviction burst and the size of memory spent on client connection buffers increases proportionally to the number of active clients.\n* Prevent this cycle by avoiding either memory or CPU saturation. In our case, avoiding memory saturation was easier (mainly by reducing cache TTL).\n\n## Further reading\n\nThe following lists summarize the analytical tools and methods cited in this article. These tools are all highly versatile and any of them can provide a massive level-up when working on performance engineering problems.\n\nTools:\n* [perf](https://www.brendangregg.com/perf.html) - A Linux performance analysis multitool. In this article, we used `perf` as a sampling profiler, capturing periodic stack traces of the `redis-server` process's main thread when it is actively running on a CPU.\n* [Flamescope](https://github.com/Netflix/flamescope) - A visualization tool for rendering a `perf` profile (and other formats) into an interactive subsecond heat map. This tool invites the user to explore the timeline for microbursts of activity or inactivity and render flamegraphs of those interesting timespans to explore what code paths were active.\n* [BCC](https://github.com/iovisor/bcc) - BCC is a framework for building BPF tools, and it ships with many useful tools out of the box. In this article, we used `funclatency` to measure the call durations of a specific Redis function and render the results as a histogram.\n* [bpftrace](https://github.com/iovisor/bpftrace) - Another BPF framework, ideal for answering ad-hoc questions about your system's behavior. It uses an `awk`-like syntax and is [quick to learn](https://github.com/iovisor/bpftrace#readme). In this article, we wrote a [custom `bpftrace` script](https://gitlab.com/gitlab-com/gl-infra/scalability/uploads/cab2cd03231f8dd4819f77b44d768cb9/redis_snoop.getMaxmemoryState.sha_25a228b839a93a1395907a03f83e1eee448b0f14.production_thresholds.bt) for observing the variables used in computing how much memory to free during each round of evictions. This script's instrumentation points are specific to our particular build of `redis-server`, but the [approach is able to be generalized](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636) and illustrates how versatile this tool can be.\n\nUsage examples:\n* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_854745083) - Walkthrough of using `perf` and `flamescope` to capture, filter, and visualize the stack sampling CPU profiles of the Redis main thread.\n* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_857869826) - Walkthrough (including safety check) of using `funclatency` to measure the durations of the frequent calls to function `performEvictions`.\n* [Example](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7172#note_971197943) - Experiment for adjusting Redis settings `lazyfree-lazy-eviction` and `maxmemory-eviction-tenacity` and observing the results using `perf`, `funclatency`, `funcslower`, and the Redis metrics for eviction count and memory usage.\n* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982498636) - This is a working example (script included) of using `bpftrace` to observe the values of a function's variables. In this case we inspected the `mem_tofree` calculation at the start of `performEvictions`. Also, these [companion notes](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_982499538) discuss some build-specific considerations.\n* [Example](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1601#note_987049036) - Describes the latency injection experiment (the first of the three ideas). This experiment confirmed that memory demand increases at the predicted rate when we slow response rate to below request arrival rate, in the same way evictions do. This result confirmed the request queuing itself is the source of the memory pressure that amplifies the eviction burst once it begins.\n",[23,24,25],"performance","tutorial","DevOps","yml",{},true,"/en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes",{"title":15,"description":16,"ogTitle":15,"ogDescription":16,"noIndex":12,"ogImage":19,"ogUrl":31,"ogSiteName":32,"ogType":33,"canonicalUrls":31},"https://about.gitlab.com/blog/how-we-diagnosed-and-resolved-redis-latency-spikes","https://about.gitlab.com","article","en-us/blog/how-we-diagnosed-and-resolved-redis-latency-spikes",[23,24,36],"devops","CgHzMhslblVYXrk-z-zLdL2y9JglO1sY7wrgcptIMyU",{"data":39},{"logo":40,"freeTrial":45,"sales":50,"login":55,"items":60,"search":368,"minimal":399,"duo":418,"switchNav":427,"pricingDeployment":438},{"config":41},{"href":42,"dataGaName":43,"dataGaLocation":44},"/","gitlab logo","header",{"text":46,"config":47},"Get free trial",{"href":48,"dataGaName":49,"dataGaLocation":44},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":51,"config":52},"Talk to sales",{"href":53,"dataGaName":54,"dataGaLocation":44},"/sales/","sales",{"text":56,"config":57},"Sign in",{"href":58,"dataGaName":59,"dataGaLocation":44},"https://gitlab.com/users/sign_in/","sign in",[61,88,183,188,289,349],{"text":62,"config":63,"cards":65},"Platform",{"dataNavLevelOne":64},"platform",[66,72,80],{"title":62,"description":67,"link":68},"The intelligent orchestration platform for DevSecOps",{"text":69,"config":70},"Explore our Platform",{"href":71,"dataGaName":64,"dataGaLocation":44},"/platform/",{"title":73,"description":74,"link":75},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":76,"config":77},"Meet GitLab Duo",{"href":78,"dataGaName":79,"dataGaLocation":44},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":81,"description":82,"link":83},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":84,"config":85},"Learn more",{"href":86,"dataGaName":87,"dataGaLocation":44},"/why-gitlab/","why gitlab",{"text":89,"left":28,"config":90,"link":92,"lists":96,"footer":165},"Product",{"dataNavLevelOne":91},"solutions",{"text":93,"config":94},"View all Solutions",{"href":95,"dataGaName":91,"dataGaLocation":44},"/solutions/",[97,121,144],{"title":98,"description":99,"link":100,"items":105},"Automation","CI/CD and automation to accelerate deployment",{"config":101},{"icon":102,"href":103,"dataGaName":104,"dataGaLocation":44},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[106,110,113,117],{"text":107,"config":108},"CI/CD",{"href":109,"dataGaLocation":44,"dataGaName":107},"/solutions/continuous-integration/",{"text":73,"config":111},{"href":78,"dataGaLocation":44,"dataGaName":112},"gitlab duo agent platform - product menu",{"text":114,"config":115},"Source Code Management",{"href":116,"dataGaLocation":44,"dataGaName":114},"/solutions/source-code-management/",{"text":118,"config":119},"Automated Software Delivery",{"href":103,"dataGaLocation":44,"dataGaName":120},"Automated software delivery",{"title":122,"description":123,"link":124,"items":129},"Security","Deliver code faster without compromising security",{"config":125},{"href":126,"dataGaName":127,"dataGaLocation":44,"icon":128},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[130,134,139],{"text":131,"config":132},"Application Security Testing",{"href":126,"dataGaName":133,"dataGaLocation":44},"Application security testing",{"text":135,"config":136},"Software Supply Chain Security",{"href":137,"dataGaLocation":44,"dataGaName":138},"/solutions/supply-chain/","Software supply chain security",{"text":140,"config":141},"Software Compliance",{"href":142,"dataGaName":143,"dataGaLocation":44},"/solutions/software-compliance/","software compliance",{"title":145,"link":146,"items":151},"Measurement",{"config":147},{"icon":148,"href":149,"dataGaName":150,"dataGaLocation":44},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[152,156,160],{"text":153,"config":154},"Visibility & Measurement",{"href":149,"dataGaLocation":44,"dataGaName":155},"Visibility and Measurement",{"text":157,"config":158},"Value Stream Management",{"href":159,"dataGaLocation":44,"dataGaName":157},"/solutions/value-stream-management/",{"text":161,"config":162},"Analytics & Insights",{"href":163,"dataGaLocation":44,"dataGaName":164},"/solutions/analytics-and-insights/","Analytics and insights",{"title":166,"items":167},"GitLab for",[168,173,178],{"text":169,"config":170},"Enterprise",{"href":171,"dataGaLocation":44,"dataGaName":172},"/enterprise/","enterprise",{"text":174,"config":175},"Small Business",{"href":176,"dataGaLocation":44,"dataGaName":177},"/small-business/","small business",{"text":179,"config":180},"Public Sector",{"href":181,"dataGaLocation":44,"dataGaName":182},"/solutions/public-sector/","public sector",{"text":184,"config":185},"Pricing",{"href":186,"dataGaName":187,"dataGaLocation":44,"dataNavLevelOne":187},"/pricing/","pricing",{"text":189,"config":190,"link":192,"lists":196,"feature":276},"Resources",{"dataNavLevelOne":191},"resources",{"text":193,"config":194},"View all resources",{"href":195,"dataGaName":191,"dataGaLocation":44},"/resources/",[197,230,248],{"title":198,"items":199},"Getting started",[200,205,210,215,220,225],{"text":201,"config":202},"Install",{"href":203,"dataGaName":204,"dataGaLocation":44},"/install/","install",{"text":206,"config":207},"Quick start guides",{"href":208,"dataGaName":209,"dataGaLocation":44},"/get-started/","quick setup checklists",{"text":211,"config":212},"Learn",{"href":213,"dataGaLocation":44,"dataGaName":214},"https://university.gitlab.com/","learn",{"text":216,"config":217},"Product documentation",{"href":218,"dataGaName":219,"dataGaLocation":44},"https://docs.gitlab.com/","product documentation",{"text":221,"config":222},"Best practice videos",{"href":223,"dataGaName":224,"dataGaLocation":44},"/getting-started-videos/","best practice videos",{"text":226,"config":227},"Integrations",{"href":228,"dataGaName":229,"dataGaLocation":44},"/integrations/","integrations",{"title":231,"items":232},"Discover",[233,238,243],{"text":234,"config":235},"Customer success stories",{"href":236,"dataGaName":237,"dataGaLocation":44},"/customers/","customer success stories",{"text":239,"config":240},"Blog",{"href":241,"dataGaName":242,"dataGaLocation":44},"/blog/","blog",{"text":244,"config":245},"Remote",{"href":246,"dataGaName":247,"dataGaLocation":44},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":249,"items":250},"Connect",[251,256,261,266,271],{"text":252,"config":253},"GitLab Services",{"href":254,"dataGaName":255,"dataGaLocation":44},"/services/","services",{"text":257,"config":258},"Community",{"href":259,"dataGaName":260,"dataGaLocation":44},"/community/","community",{"text":262,"config":263},"Forum",{"href":264,"dataGaName":265,"dataGaLocation":44},"https://forum.gitlab.com/","forum",{"text":267,"config":268},"Events",{"href":269,"dataGaName":270,"dataGaLocation":44},"/events/","events",{"text":272,"config":273},"Partners",{"href":274,"dataGaName":275,"dataGaLocation":44},"/partners/","partners",{"backgroundColor":277,"textColor":278,"text":279,"image":280,"link":284},"#2f2a6b","#fff","Insights for the future of software development",{"altText":281,"config":282},"the source promo card",{"src":283},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758208064/dzl0dbift9xdizyelkk4.svg",{"text":285,"config":286},"Read the latest",{"href":287,"dataGaName":288,"dataGaLocation":44},"/the-source/","the source",{"text":290,"config":291,"lists":293},"Company",{"dataNavLevelOne":292},"company",[294],{"items":295},[296,301,307,309,314,319,324,329,334,339,344],{"text":297,"config":298},"About",{"href":299,"dataGaName":300,"dataGaLocation":44},"/company/","about",{"text":302,"config":303,"footerGa":306},"Jobs",{"href":304,"dataGaName":305,"dataGaLocation":44},"/jobs/","jobs",{"dataGaName":305},{"text":267,"config":308},{"href":269,"dataGaName":270,"dataGaLocation":44},{"text":310,"config":311},"Leadership",{"href":312,"dataGaName":313,"dataGaLocation":44},"/company/team/e-group/","leadership",{"text":315,"config":316},"Team",{"href":317,"dataGaName":318,"dataGaLocation":44},"/company/team/","team",{"text":320,"config":321},"Handbook",{"href":322,"dataGaName":323,"dataGaLocation":44},"https://handbook.gitlab.com/","handbook",{"text":325,"config":326},"Investor relations",{"href":327,"dataGaName":328,"dataGaLocation":44},"https://ir.gitlab.com/","investor relations",{"text":330,"config":331},"Trust Center",{"href":332,"dataGaName":333,"dataGaLocation":44},"/security/","trust center",{"text":335,"config":336},"AI Transparency Center",{"href":337,"dataGaName":338,"dataGaLocation":44},"/ai-transparency-center/","ai transparency center",{"text":340,"config":341},"Newsletter",{"href":342,"dataGaName":343,"dataGaLocation":44},"/company/contact/#contact-forms","newsletter",{"text":345,"config":346},"Press",{"href":347,"dataGaName":348,"dataGaLocation":44},"/press/","press",{"text":350,"config":351,"lists":352},"Contact us",{"dataNavLevelOne":292},[353],{"items":354},[355,358,363],{"text":51,"config":356},{"href":53,"dataGaName":357,"dataGaLocation":44},"talk to sales",{"text":359,"config":360},"Support portal",{"href":361,"dataGaName":362,"dataGaLocation":44},"https://support.gitlab.com","support portal",{"text":364,"config":365},"Customer portal",{"href":366,"dataGaName":367,"dataGaLocation":44},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":369,"login":370,"suggestions":377},"Close",{"text":371,"link":372},"To search repositories and projects, login to",{"text":373,"config":374},"gitlab.com",{"href":58,"dataGaName":375,"dataGaLocation":376},"search login","search",{"text":378,"default":379},"Suggestions",[380,382,386,388,392,396],{"text":73,"config":381},{"href":78,"dataGaName":73,"dataGaLocation":376},{"text":383,"config":384},"Code Suggestions (AI)",{"href":385,"dataGaName":383,"dataGaLocation":376},"/solutions/code-suggestions/",{"text":107,"config":387},{"href":109,"dataGaName":107,"dataGaLocation":376},{"text":389,"config":390},"GitLab on AWS",{"href":391,"dataGaName":389,"dataGaLocation":376},"/partners/technology-partners/aws/",{"text":393,"config":394},"GitLab on Google Cloud",{"href":395,"dataGaName":393,"dataGaLocation":376},"/partners/technology-partners/google-cloud-platform/",{"text":397,"config":398},"Why GitLab?",{"href":86,"dataGaName":397,"dataGaLocation":376},{"freeTrial":400,"mobileIcon":405,"desktopIcon":410,"secondaryButton":413},{"text":401,"config":402},"Start free trial",{"href":403,"dataGaName":49,"dataGaLocation":404},"https://gitlab.com/-/trials/new/","nav",{"altText":406,"config":407},"Gitlab Icon",{"src":408,"dataGaName":409,"dataGaLocation":404},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":406,"config":411},{"src":412,"dataGaName":409,"dataGaLocation":404},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":414,"config":415},"Get Started",{"href":416,"dataGaName":417,"dataGaLocation":404},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":419,"mobileIcon":423,"desktopIcon":425},{"text":420,"config":421},"Learn more about GitLab Duo",{"href":78,"dataGaName":422,"dataGaLocation":404},"gitlab duo",{"altText":406,"config":424},{"src":408,"dataGaName":409,"dataGaLocation":404},{"altText":406,"config":426},{"src":412,"dataGaName":409,"dataGaLocation":404},{"button":428,"mobileIcon":433,"desktopIcon":435},{"text":429,"config":430},"/switch",{"href":431,"dataGaName":432,"dataGaLocation":404},"#contact","switch",{"altText":406,"config":434},{"src":408,"dataGaName":409,"dataGaLocation":404},{"altText":406,"config":436},{"src":437,"dataGaName":409,"dataGaLocation":404},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":439,"mobileIcon":444,"desktopIcon":446},{"text":440,"config":441},"Back to pricing",{"href":186,"dataGaName":442,"dataGaLocation":404,"icon":443},"back to pricing","GoBack",{"altText":406,"config":445},{"src":408,"dataGaName":409,"dataGaLocation":404},{"altText":406,"config":447},{"src":412,"dataGaName":409,"dataGaLocation":404},{"title":449,"button":450,"config":455},"See how agentic AI transforms software delivery",{"text":451,"config":452},"Watch GitLab Transcend now",{"href":453,"dataGaName":454,"dataGaLocation":44},"/events/transcend/virtual/","transcend event",{"layout":456,"icon":457,"disabled":28},"release","AiStar",{"data":459},{"text":460,"source":461,"edit":467,"contribute":472,"config":477,"items":482,"minimal":687},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":462,"config":463},"View page source",{"href":464,"dataGaName":465,"dataGaLocation":466},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":468,"config":469},"Edit this page",{"href":470,"dataGaName":471,"dataGaLocation":466},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":473,"config":474},"Please contribute",{"href":475,"dataGaName":476,"dataGaLocation":466},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":478,"facebook":479,"youtube":480,"linkedin":481},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[483,530,582,626,653],{"title":184,"links":484,"subMenu":499},[485,489,494],{"text":486,"config":487},"View plans",{"href":186,"dataGaName":488,"dataGaLocation":466},"view plans",{"text":490,"config":491},"Why Premium?",{"href":492,"dataGaName":493,"dataGaLocation":466},"/pricing/premium/","why premium",{"text":495,"config":496},"Why Ultimate?",{"href":497,"dataGaName":498,"dataGaLocation":466},"/pricing/ultimate/","why ultimate",[500],{"title":501,"links":502},"Contact Us",[503,506,508,510,515,520,525],{"text":504,"config":505},"Contact sales",{"href":53,"dataGaName":54,"dataGaLocation":466},{"text":359,"config":507},{"href":361,"dataGaName":362,"dataGaLocation":466},{"text":364,"config":509},{"href":366,"dataGaName":367,"dataGaLocation":466},{"text":511,"config":512},"Status",{"href":513,"dataGaName":514,"dataGaLocation":466},"https://status.gitlab.com/","status",{"text":516,"config":517},"Terms of use",{"href":518,"dataGaName":519,"dataGaLocation":466},"/terms/","terms of use",{"text":521,"config":522},"Privacy statement",{"href":523,"dataGaName":524,"dataGaLocation":466},"/privacy/","privacy statement",{"text":526,"config":527},"Cookie preferences",{"dataGaName":528,"dataGaLocation":466,"id":529,"isOneTrustButton":28},"cookie preferences","ot-sdk-btn",{"title":89,"links":531,"subMenu":540},[532,536],{"text":533,"config":534},"DevSecOps platform",{"href":71,"dataGaName":535,"dataGaLocation":466},"devsecops platform",{"text":537,"config":538},"AI-Assisted Development",{"href":78,"dataGaName":539,"dataGaLocation":466},"ai-assisted development",[541],{"title":542,"links":543},"Topics",[544,549,554,557,562,567,572,577],{"text":545,"config":546},"CICD",{"href":547,"dataGaName":548,"dataGaLocation":466},"/topics/ci-cd/","cicd",{"text":550,"config":551},"GitOps",{"href":552,"dataGaName":553,"dataGaLocation":466},"/topics/gitops/","gitops",{"text":25,"config":555},{"href":556,"dataGaName":36,"dataGaLocation":466},"/topics/devops/",{"text":558,"config":559},"Version Control",{"href":560,"dataGaName":561,"dataGaLocation":466},"/topics/version-control/","version control",{"text":563,"config":564},"DevSecOps",{"href":565,"dataGaName":566,"dataGaLocation":466},"/topics/devsecops/","devsecops",{"text":568,"config":569},"Cloud Native",{"href":570,"dataGaName":571,"dataGaLocation":466},"/topics/cloud-native/","cloud native",{"text":573,"config":574},"AI for Coding",{"href":575,"dataGaName":576,"dataGaLocation":466},"/topics/devops/ai-for-coding/","ai for coding",{"text":578,"config":579},"Agentic AI",{"href":580,"dataGaName":581,"dataGaLocation":466},"/topics/agentic-ai/","agentic ai",{"title":583,"links":584},"Solutions",[585,587,589,594,598,601,605,608,610,613,616,621],{"text":131,"config":586},{"href":126,"dataGaName":131,"dataGaLocation":466},{"text":120,"config":588},{"href":103,"dataGaName":104,"dataGaLocation":466},{"text":590,"config":591},"Agile development",{"href":592,"dataGaName":593,"dataGaLocation":466},"/solutions/agile-delivery/","agile delivery",{"text":595,"config":596},"SCM",{"href":116,"dataGaName":597,"dataGaLocation":466},"source code management",{"text":545,"config":599},{"href":109,"dataGaName":600,"dataGaLocation":466},"continuous integration & delivery",{"text":602,"config":603},"Value stream management",{"href":159,"dataGaName":604,"dataGaLocation":466},"value stream management",{"text":550,"config":606},{"href":607,"dataGaName":553,"dataGaLocation":466},"/solutions/gitops/",{"text":169,"config":609},{"href":171,"dataGaName":172,"dataGaLocation":466},{"text":611,"config":612},"Small business",{"href":176,"dataGaName":177,"dataGaLocation":466},{"text":614,"config":615},"Public sector",{"href":181,"dataGaName":182,"dataGaLocation":466},{"text":617,"config":618},"Education",{"href":619,"dataGaName":620,"dataGaLocation":466},"/solutions/education/","education",{"text":622,"config":623},"Financial services",{"href":624,"dataGaName":625,"dataGaLocation":466},"/solutions/finance/","financial services",{"title":189,"links":627},[628,630,632,634,637,639,641,643,645,647,649,651],{"text":201,"config":629},{"href":203,"dataGaName":204,"dataGaLocation":466},{"text":206,"config":631},{"href":208,"dataGaName":209,"dataGaLocation":466},{"text":211,"config":633},{"href":213,"dataGaName":214,"dataGaLocation":466},{"text":216,"config":635},{"href":218,"dataGaName":636,"dataGaLocation":466},"docs",{"text":239,"config":638},{"href":241,"dataGaName":242,"dataGaLocation":466},{"text":234,"config":640},{"href":236,"dataGaName":237,"dataGaLocation":466},{"text":244,"config":642},{"href":246,"dataGaName":247,"dataGaLocation":466},{"text":252,"config":644},{"href":254,"dataGaName":255,"dataGaLocation":466},{"text":257,"config":646},{"href":259,"dataGaName":260,"dataGaLocation":466},{"text":262,"config":648},{"href":264,"dataGaName":265,"dataGaLocation":466},{"text":267,"config":650},{"href":269,"dataGaName":270,"dataGaLocation":466},{"text":272,"config":652},{"href":274,"dataGaName":275,"dataGaLocation":466},{"title":290,"links":654},[655,657,659,661,663,665,667,671,676,678,680,682],{"text":297,"config":656},{"href":299,"dataGaName":292,"dataGaLocation":466},{"text":302,"config":658},{"href":304,"dataGaName":305,"dataGaLocation":466},{"text":310,"config":660},{"href":312,"dataGaName":313,"dataGaLocation":466},{"text":315,"config":662},{"href":317,"dataGaName":318,"dataGaLocation":466},{"text":320,"config":664},{"href":322,"dataGaName":323,"dataGaLocation":466},{"text":325,"config":666},{"href":327,"dataGaName":328,"dataGaLocation":466},{"text":668,"config":669},"Sustainability",{"href":670,"dataGaName":668,"dataGaLocation":466},"/sustainability/",{"text":672,"config":673},"Diversity, inclusion and belonging (DIB)",{"href":674,"dataGaName":675,"dataGaLocation":466},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":330,"config":677},{"href":332,"dataGaName":333,"dataGaLocation":466},{"text":340,"config":679},{"href":342,"dataGaName":343,"dataGaLocation":466},{"text":345,"config":681},{"href":347,"dataGaName":348,"dataGaLocation":466},{"text":683,"config":684},"Modern Slavery Transparency Statement",{"href":685,"dataGaName":686,"dataGaLocation":466},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":688},[689,692,695],{"text":690,"config":691},"Terms",{"href":518,"dataGaName":519,"dataGaLocation":466},{"text":693,"config":694},"Cookies",{"dataGaName":528,"dataGaLocation":466,"id":529,"isOneTrustButton":28},{"text":696,"config":697},"Privacy",{"href":523,"dataGaName":524,"dataGaLocation":466},[699],{"id":700,"title":18,"body":8,"config":701,"content":703,"description":8,"extension":26,"meta":707,"navigation":28,"path":708,"seo":709,"stem":710,"__hash__":711},"blogAuthors/en-us/blog/authors/matt-smiley.yml",{"template":702},"BlogAuthor",{"name":18,"config":704},{"headshot":705,"ctfId":706},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749682529/Blog/Author%20Headshots/msmiley-headshot.jpg","msmiley",{},"/en-us/blog/authors/matt-smiley",{},"en-us/blog/authors/matt-smiley","GaLE5OWIxcwOcDqqFiyWy1UVdAIZeYkr1nYOINaNM8E",[713,727,740],{"content":714,"config":725},{"body":715,"title":716,"description":717,"authors":718,"heroImage":720,"date":721,"category":9,"tags":722},"Most CI/CD tools can run a build and ship a deployment. Where they diverge is what happens when your delivery needs get real: a monorepo with a dozen services, microservices spread across multiple repositories, deployments to dozens of environments, or a platform team trying to enforce standards without becoming a bottleneck.\n  \nGitLab's pipeline execution model was designed for that complexity. Parent-child pipelines, DAG execution, dynamic pipeline generation, multi-project triggers, merge request pipelines with merged results, and CI/CD Components each solve a distinct class of problems. Because they compose, understanding the full model unlocks something more than a faster pipeline. In this article, you'll learn about the five patterns where that model stands out, each mapped to a real engineering scenario with the configuration to match.\n  \nThe configs below are illustrative. The scripts use echo commands to keep the signal-to-noise ratio low. Swap them out for your actual build, test, and deploy steps and they are ready to use.\n\n\n## 1. Monorepos: Parent-child pipelines + DAG execution\n\n\nThe problem: Your monorepo has a frontend, a backend, and a docs site. Every commit triggers a full rebuild of everything, even when only a README changed.\n\n\nGitLab solves this with two complementary features: [parent-child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#parent-child-pipelines) (which let a top-level pipeline spawn isolated sub-pipelines) and [DAG execution via `needs`](https://docs.gitlab.com/ci/yaml/#needs) (which breaks rigid stage-by-stage ordering and lets jobs start the moment their dependencies finish).\n\n\nA parent pipeline detects what changed and triggers only the relevant child pipelines:\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - trigger\n\ntrigger-services:\n  stage: trigger\n  trigger:\n    include:\n      - local: '.gitlab/ci/api-service.yml'\n      - local: '.gitlab/ci/web-service.yml'\n      - local: '.gitlab/ci/worker-service.yml'\n    strategy: depend\n```\n\n\nEach child pipeline is a fully independent pipeline with its own stages, jobs, and artifacts. The parent waits for all of them via [strategy: depend](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#wait-for-downstream-pipeline-to-complete) so you get a single green/red signal at the top level, with full drill-down into each service's pipeline. This organizational separation is the bigger win for large teams: each service owns its pipeline config, changes in one cannot break another, and the complexity stays manageable as the repo grows.\n\n\nOne thing worth knowing: when you pass [multiple files to a single `trigger: include:`](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#combine-multiple-child-pipeline-configuration-files), GitLab merges them into a single child pipeline configuration. This means jobs defined across those files share the same pipeline context and can reference each other with `needs:`, which is what makes the DAG optimization possible. If you split them into separate trigger jobs instead, each would be its own isolated pipeline and cross-file `needs:` references would not work.\n\n\nCombine this with `needs:` inside each child pipeline and you get DAG execution. Your integration tests can start the moment the build finishes, without waiting for other jobs in the same stage.\n\n```yaml\n# .gitlab/ci/api-service.yml\nstages:\n  - build\n  - test\n\nbuild-api:\n  stage: build\n  script:\n    - echo \"Building API service\"\n\ntest-api:\n  stage: test\n  needs: [build-api]\n  script:\n    - echo \"Running API tests\"\n```\n\n\nWhy it matters: Teams with large monorepos typically report significant reductions in pipeline runtime after switching to DAG execution, since jobs no longer wait on unrelated work in the same stage. Parent-child pipelines add the organizational layer that keeps the configuration maintainable as the repo and team grow.\n\n![Local downstream pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738759/Blog/Imported/hackathon-fake-blog-post-s/image3_vwj3rz.png \"Local downstream pipelines\")\n\n## 2. Microservices: Cross-repo, multi-project pipelines\n\n\nThe problem: Your frontend lives in one repo, your backend in another. When the frontend team ships a change, they have no visibility into whether it broke the backend integration and vice versa.\n\n\nGitLab's [multi-project pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#multi-project-pipelines) let one project trigger a pipeline in a completely separate project and wait for the result. The triggering project gets a linked downstream pipeline right in its own pipeline view.\n\n\nThe frontend pipeline builds an API contract artifact and publishes it, then triggers the backend pipeline. The backend fetches that artifact directly using the [Jobs API](https://docs.gitlab.com/ee/api/jobs.html#download-a-single-artifact-file-from-specific-tag-or-branch) and validates it before allowing anything to proceed. If a breaking change is detected, the backend pipeline fails and the frontend pipeline fails with it.\n\n```yaml\n# frontend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n  - trigger-backend\n\nbuild-frontend:\n  stage: build\n  script:\n    - echo \"Building frontend and generating API contract...\"\n    - mkdir -p dist\n    - |\n      echo '{\n        \"api_version\": \"v2\",\n        \"breaking_changes\": false\n      }' > dist/api-contract.json\n    - cat dist/api-contract.json\n  artifacts:\n    paths:\n      - dist/api-contract.json\n    expire_in: 1 hour\n\ntest-frontend:\n  stage: test\n  script:\n    - echo \"All frontend tests passed!\"\n\ntrigger-backend-pipeline:\n  stage: trigger-backend\n  trigger:\n    project: my-org/backend-service\n    branch: main\n    strategy: depend\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n```\n\n```yaml\n# backend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n\nbuild-backend:\n  stage: build\n  script:\n    - echo \"All backend tests passed!\"\n\nintegration-test:\n  stage: test\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"pipeline\"\n  script:\n    - echo \"Fetching API contract from frontend...\"\n    - |\n      curl --silent --fail \\\n        --header \"JOB-TOKEN: $CI_JOB_TOKEN\" \\\n        --output api-contract.json \\\n        \"${CI_API_V4_URL}/projects/${FRONTEND_PROJECT_ID}/jobs/artifacts/main/raw/dist/api-contract.json?job=build-frontend\"\n    - cat api-contract.json\n    - |\n      if grep -q '\"breaking_changes\": true' api-contract.json; then\n        echo \"FAIL: Breaking API changes detected - backend integration blocked!\"\n        exit 1\n      fi\n      echo \"PASS: API contract is compatible!\"\n```\n\n\nA few things worth noting in this config. The `integration-test` job uses `$CI_PIPELINE_SOURCE == \"pipeline\"` to ensure it only runs when triggered by an upstream pipeline, not on a standalone push to the backend repo. The frontend project ID is referenced via `$FRONTEND_PROJECT_ID`, which should be set as a [CI/CD variable](https://docs.gitlab.com/ci/variables/) in the backend project settings to avoid hardcoding it.\n\n\nWhy it matters: Cross-service breakage that previously surfaced in production gets caught in the pipeline instead. The dependency between services stops being invisible and becomes something teams can see, track, and act on.\n\n\n![Cross-project pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738762/Blog/Imported/hackathon-fake-blog-post-s/image4_h6mfsb.png \"Cross-project pipelines\")\n\n\n## 3. Multi-tenant / matrix deployments: Dynamic child pipelines\n\n\nThe problem: You deploy the same application to 15 customer environments, or three cloud regions, or dev/staging/prod. Updating a deploy stage across all of them one by one is the kind of work that leads to configuration drift. Writing a separate pipeline for each environment is unmaintainable from day one.\n\n\nGitLab's [dynamic child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#dynamic-child-pipelines) let you generate a pipeline at runtime. A job runs a script that produces a YAML file, and that YAML becomes the pipeline for the next stage. The pipeline structure itself becomes data.\n\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - generate\n  - trigger-environments\n\ngenerate-config:\n  stage: generate\n  script:\n    - |\n      # ENVIRONMENTS can be passed as a CI variable or read from a config file.\n      # Default to dev, staging, prod if not set.\n      ENVIRONMENTS=${ENVIRONMENTS:-\"dev staging prod\"}\n      for ENV in $ENVIRONMENTS; do\n        cat > ${ENV}-pipeline.yml \u003C\u003C EOF\n      stages:\n        - deploy\n        - verify\n      deploy-${ENV}:\n        stage: deploy\n        script:\n          - echo \"Deploying to ${ENV} environment\"\n      verify-${ENV}:\n        stage: verify\n        script:\n          - echo \"Running smoke tests on ${ENV}\"\n      EOF\n      done\n  artifacts:\n    paths:\n      - \"*.yml\"\n    exclude:\n      - \".gitlab-ci.yml\"\n\n.trigger-template:\n  stage: trigger-environments\n  trigger:\n    strategy: depend\n\ntrigger-dev:\n  extends: .trigger-template\n  trigger:\n    include:\n      - artifact: dev-pipeline.yml\n        job: generate-config\n\ntrigger-staging:\n  extends: .trigger-template\n  needs: [trigger-dev]\n  trigger:\n    include:\n      - artifact: staging-pipeline.yml\n        job: generate-config\n\ntrigger-prod:\n  extends: .trigger-template\n  needs: [trigger-staging]\n  trigger:\n    include:\n      - artifact: prod-pipeline.yml\n        job: generate-config\n  when: manual\n```\n\n\nThe generation script loops over an `ENVIRONMENTS` variable rather than hardcoding each environment separately. Pass in a different list via a CI variable or read it from a config file and the pipeline adapts without touching the YAML. The trigger jobs use [extends:](https://docs.gitlab.com/ci/yaml/#extends) to inherit shared configuration from `.trigger-template`, so `strategy: depend` is defined once rather than repeated on every trigger job. Add a new environment by updating the variable, not by duplicating pipeline config. Add [when: manual](https://docs.gitlab.com/ci/yaml/#when) to the production trigger and you get a promotion gate baked right into the pipeline graph.\n\n\nWhy it matters: SaaS companies and platform teams use this pattern to manage dozens of environments without duplicating pipeline logic. The pipeline structure itself stays lean as the deployment matrix grows.\n\n\n![Dynamic pipeline](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738765/Blog/Imported/hackathon-fake-blog-post-s/image7_wr0kx2.png \"Dynamic pipeline\")\n\n\n## 4. MR-first delivery: Merge request pipelines, merged results, and workflow routing\n\n\nThe problem: Your pipeline runs on every push to every branch. Expensive tests run on feature branches that will never merge. Meanwhile, you have no guarantee that what you tested is actually what will land on `main` after a merge.\n\n\nGitLab has three interlocking features that solve this together:\n\n\n*   [Merge request pipelines](https://docs.gitlab.com/ci/pipelines/merge_request_pipelines/) run only when a merge request exists, not on every branch push. This alone eliminates a significant amount of wasted compute.\n\n*   [Merged results pipelines](https://docs.gitlab.com/ci/pipelines/merged_results_pipelines/) go further. GitLab creates a temporary merge commit (your branch plus the current target branch) and runs the pipeline against that. You are testing what will actually exist after the merge, not just your branch in isolation.\n\n*   [Workflow rules](https://docs.gitlab.com/ci/yaml/workflow/) let you define exactly which pipeline type runs under which conditions and suppress everything else. The `$CI_OPEN_MERGE_REQUESTS` guard below prevents duplicate pipelines firing for both a branch and its open MR simultaneously.\n\n\nWith those three working together, here is what a tiered pipeline looks like:\n\n```yaml\n# .gitlab-ci.yml\nworkflow:\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS\n      when: never\n    - if: $CI_COMMIT_BRANCH\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\nstages:\n  - fast-checks\n  - expensive-tests\n  - deploy\n\nlint-code:\n  stage: fast-checks\n  script:\n    - echo \"Running linter\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nunit-tests:\n  stage: fast-checks\n  script:\n    - echo \"Running unit tests\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nintegration-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running integration tests (15 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\ne2e-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running E2E tests (30 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nnightly-comprehensive-scan:\n  stage: expensive-tests\n  script:\n    - echo \"Running full nightly suite (2 hours)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\ndeploy-production:\n  stage: deploy\n  script:\n    - echo \"Deploying to production\"\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n      when: manual\n```\n\nWith this setup, the pipeline behaves differently depending on context. A push to a feature branch with no open MR runs lint and unit tests only. Once an MR is opened, the workflow rules switch from a branch pipeline to an MR pipeline, and the full integration and E2E suite runs against the merged result. Merging to `main` queues a manual production deployment. A nightly schedule runs the comprehensive scan once, not on every commit.\n\n\nWhy it matters: Teams routinely cut CI costs significantly with this pattern, not by running fewer tests, but by running the right tests at the right time. Merged results pipelines catch the class of bugs that only appear after a merge, before they ever reach `main`.\n\n\n![Conditional pipelines (within a branch with no MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738768/Blog/Imported/hackathon-fake-blog-post-s/image6_dnfcny.png \"Conditional pipelines (within a branch with no MR)\")\n\n\n\n![Conditional pipelines (within an MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738772/Blog/Imported/hackathon-fake-blog-post-s/image1_wyiafu.png \"Conditional pipelines (within an MR)\")\n\n\n\n![Conditional pipelines (on the main branch)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738774/Blog/Imported/hackathon-fake-blog-post-s/image5_r6lkfd.png \"Conditional pipelines (on the main branch)\")\n\n## 5. Governed pipelines: CI/CD Components\n\n\nThe problem: Your platform team has defined the right way to build, test, and deploy. But every team has their own `.gitlab-ci.yml` with subtle variations. Security scanning gets skipped. Deployment standards drift. Audits are painful.\n\n\nGitLab [CI/CD Components](https://docs.gitlab.com/ci/components/) let platform teams publish versioned, reusable pipeline building blocks. Application teams consume them with a single `include:` line and optional inputs — no copy-paste, no drift. Components are discoverable through the [CI/CD Catalog](https://docs.gitlab.com/ci/components/#cicd-catalog), which means teams can find and adopt approved building blocks without needing to go through the platform team directly.\n\n\nHere is a component definition from a shared library:\n\n```yaml\n# templates/deploy.yml\nspec:\n  inputs:\n    stage:\n      default: deploy\n    environment:\n      default: production\n---\ndeploy-job:\n  stage: $[[ inputs.stage ]]\n  script:\n    - echo \"Deploying $APP_NAME to $[[ inputs.environment ]]\"\n    - echo \"Deploy URL: $DEPLOY_URL\"\n  environment:\n    name: $[[ inputs.environment ]]\n```\nAnd here is how an application team consumes it:\n\n```yaml\n# Application repo: .gitlab-ci.yml\nvariables:\n  APP_NAME: \"my-awesome-app\"\n  DEPLOY_URL: \"https://api.example.com\"\n\ninclude:\n  - component: gitlab.com/my-org/component-library/build@v1.0.6\n  - component: gitlab.com/my-org/component-library/test@v1.0.6\n  - component: gitlab.com/my-org/component-library/deploy@v1.0.6\n    inputs:\n      environment: staging\n\nstages:\n  - build\n  - test\n  - deploy\n```\n\nThree lines of `include:` replace hundreds of lines of duplicated YAML. The platform team can push a security fix to `v1.0.7` and teams opt in on their own schedule — or the platform team can pin everyone to a minimum version. Either way, one change propagates everywhere instead of needing to be applied repo by repo.\n\n\nPair this with [resource groups](https://docs.gitlab.com/ci/resource_groups/) to prevent concurrent deployments to the same environment, and [protected environments](https://docs.gitlab.com/ci/environments/protected_environments/) to enforce approval gates - and you have a governed delivery platform where compliance is the default, not the exception.\n\n\nWhy it matters: This is the pattern that makes GitLab CI/CD scale across hundreds of teams. Platform engineering teams enforce compliance without becoming a bottleneck. Application teams get a fast path to a working pipeline without reinventing the wheel.\n\n\n![Component pipeline (imported jobs)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738776/Blog/Imported/hackathon-fake-blog-post-s/image2_pizuxd.png \"Component pipeline (imported jobs)\")\n\n## Putting it all together\n\nNone of these features exist in isolation. The reason GitLab's pipeline model is worth understanding deeply is that these primitives compose:\n\n*   A monorepo uses parent-child pipelines, and each child uses DAG execution\n\n*   A microservices platform uses multi-project pipelines, and each project uses MR pipelines with merged results\n\n*   A governed platform uses CI/CD components to standardize the patterns above across every team\n\n\nMost teams discover one of these features when they hit a specific pain point. The ones who invest in understanding the full model end up with a delivery system that actually reflects how their engineering organization works, not a pipeline that fights it.\n\n## Other patterns worth exploring\n\n\nThe five patterns above cover the most common structural pain points, but GitLab's pipeline model goes further. A few others worth looking into as your needs grow:\n\n\n*   [Review apps with dynamic environments](https://docs.gitlab.com/ci/environments/) let you spin up a live preview for every feature branch and tear it down automatically when the MR closes. Useful for teams doing frontend work or API changes that need stakeholder sign-off before merging.\n\n*   [Caching and artifact strategies](https://docs.gitlab.com/ci/caching/) are often the fastest way to cut pipeline runtime after the structural work is done. Structuring `cache:` keys around dependency lockfiles and being deliberate about what gets passed between jobs with [artifacts:](https://docs.gitlab.com/ci/yaml/#artifacts) can make a significant difference without changing your pipeline shape at all.\n\n*   [Scheduled and API-triggered pipelines](https://docs.gitlab.com/ci/pipelines/schedules/) are worth knowing about because not everything should run on a code push. Nightly security scans, compliance reports, and release automation are better modeled as scheduled or [API-triggered](https://docs.gitlab.com/ci/triggers/) pipelines with `$CI_PIPELINE_SOURCE` routing the right jobs for each context.\n\n## How to get started\n\nModern software delivery is complex. Teams are managing monorepos with dozens of services, coordinating across multiple repositories, deploying to many environments at once, and trying to keep standards consistent as organizations grow. GitLab's pipeline model was built with all of that in mind.\n\nWhat makes it worth investing time in is how well the pieces fit together. Parent-child pipelines bring structure to large codebases. Multi-project pipelines make cross-team dependencies visible and testable. Dynamic pipelines turn environment management into something that scales gracefully. MR-first delivery with merged results ensures confidence at every step of the review process. And CI/CD Components give platform teams a way to share best practices across an entire organization without becoming a bottleneck.\n\nEach of these features is powerful on its own, and even more so when combined. GitLab gives you the building blocks to design a delivery system that fits how your team actually works, and grows with you as your needs evolve.\n\n> [Start a free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/) to use pipeline logic today.\n\n## Read more\n\n*   [Variable and artifact sharing in GitLab parent-child pipelines](https://about.gitlab.com/blog/variable-and-artifact-sharing-in-gitlab-parent-child-pipelines/)\n*   [CI/CD inputs: Secure and preferred method to pass parameters to a pipeline](https://about.gitlab.com/blog/ci-cd-inputs-secure-and-preferred-method-to-pass-parameters-to-a-pipeline/)\n*   [Tutorial: How to set up your first GitLab CI/CD component](https://about.gitlab.com/blog/tutorial-how-to-set-up-your-first-gitlab-ci-cd-component/)\n*   [How to include file references in your CI/CD components](https://about.gitlab.com/blog/how-to-include-file-references-in-your-ci-cd-components/)\n*   [FAQ: GitLab CI/CD Catalog](https://about.gitlab.com/blog/faq-gitlab-ci-cd-catalog/)\n*   [Building a GitLab CI/CD pipeline for a monorepo the easy way](https://about.gitlab.com/blog/building-a-gitlab-ci-cd-pipeline-for-a-monorepo-the-easy-way/)\n*   [A CI/CD component builder's journey](https://about.gitlab.com/blog/a-ci-component-builders-journey/)\n*   [CI/CD Catalog goes GA: No more building pipelines from scratch](https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/)","5 ways GitLab pipeline logic solves real engineering problems","Learn how to scale CI/CD with composable patterns for monorepos, microservices, environments, and governance.",[719],"Omid Khan","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772721753/frfsm1qfscwrmsyzj1qn.png","2026-04-09",[107,723,24,724],"DevOps platform","features",{"featured":28,"template":13,"slug":726},"5-ways-gitlab-pipeline-logic-solves-real-engineering-problems",{"content":728,"config":738},{"title":729,"description":730,"authors":731,"heroImage":733,"date":734,"body":735,"category":9,"tags":736},"How to use GitLab Container Virtual Registry with Docker Hardened Images","Learn how to simplify container image management with this step-by-step guide.",[732],"Tim Rizzi","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772111172/mwhgbjawn62kymfwrhle.png","2026-03-12","If you're a platform engineer, you've probably had this conversation:\n  \n*\"Security says we need to use hardened base images.\"*\n\n*\"Great, where do I configure credentials for yet another registry?\"*\n\n*\"Also, how do we make sure everyone actually uses them?\"*\n\nOr this one:\n\n*\"Why are our builds so slow?\"*\n\n*\"We're pulling the same 500MB image from Docker Hub in every single job.\"*\n\n*\"Can't we just cache these somewhere?\"*\n\nI've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries — Docker Hub, dhi.io (Docker Hardened Images), MCR, and Quay — and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. Your developers don't need to know or care which upstream a particular image came from.\n\nThis article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that's a combination that makes a lot of sense for teams concerned about security and not making their developers' lives harder.\n\n## What problem are we actually solving?\n\nThe Platform teams I usually talk to manage container images across three to five registries:\n\n* **Docker Hub** for most base images\n* **dhi.io** for Docker Hardened Images (security-conscious workloads)\n* **MCR** for .NET and Azure tooling\n* **Quay.io** for Red Hat ecosystem stuff\n* **Internal registries** for proprietary images\n\nEach one has its own:\n\n* Authentication mechanism\n* Network latency characteristics\n* Way of organizing image paths\n\nYour CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.\n\nContainer Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.\n\n## How it works\n\nThe model is straightforward:\n\n```text\nYour pipeline pulls:\n  gitlab.com/virtual_registries/container/1000016/python:3.13\n\nVirtual registry checks:\n  1. Do I have this cached? → Return it\n  2. No? → Fetch from upstream, cache it, return it\n\n```\n\nYou configure upstreams in priority order. When a pull request comes in, the virtual registry checks each upstream until it finds the image. The result gets cached for a configurable period (default 24 hours).\n\n```text\n┌─────────────────────────────────────────────────────────┐\n│                    CI/CD Pipeline                       │\n│                          │                              │\n│                          ▼                              │\n│   gitlab.com/virtual_registries/container/\u003Cid>/image   │\n└─────────────────────────────────────────────────────────┘\n                           │\n                           ▼\n┌─────────────────────────────────────────────────────────┐\n│            Container Virtual Registry                   │\n│                                                         │\n│  Upstream 1: Docker Hub ────────────────┐               │\n│  Upstream 2: dhi.io (Hardened) ────────┐│               │\n│  Upstream 3: MCR ─────────────────────┐││               │\n│  Upstream 4: Quay.io ────────────────┐│││               │\n│                                      ││││               │\n│                    ┌─────────────────┴┴┴┴──┐            │\n│                    │        Cache          │            │\n│                    │  (manifests + layers) │            │\n│                    └───────────────────────┘            │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficult to tell if teams are actually using hardened images vs. regular ones.\n\nVirtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No CI/CD changes per-team**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. Useful for compliance, useful for understanding what your fleet actually depends on.\n\n## Setting it up\n\nHere's a real setup using the Python client from this demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for Red Hat stuff\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\n  \nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n    \n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n    \n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n    \n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\n\n\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\nHere are some considerations to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently-updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI—no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. It works, people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n## Related resources\n- [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n- [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n- [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[24,737,724],"product",{"featured":12,"template":13,"slug":739},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"content":741,"config":751},{"title":742,"description":743,"authors":744,"heroImage":746,"date":747,"category":9,"tags":748,"body":750},"How IIT Bombay students are coding the future with GitLab","At GitLab, we often talk about how software accelerates innovation. But sometimes, you have to step away from the Zoom calls and stand in a crowded university hall to remember why we do this.",[745],"Nick Veenhof","https://res.cloudinary.com/about-gitlab-com/image/upload/v1750099013/Blog/Hero%20Images/Blog/Hero%20Images/blog-image-template-1800x945%20%2814%29_6VTUA8mUhOZNDaRVNPeKwl_1750099012960.png","2026-01-08",[260,620,749],"open source","The GitLab team recently had the privilege of judging the **iHack Hackathon** at **IIT Bombay's E-Summit**. The energy was electric, the coffee was flowing, and the talent was undeniable. But what struck us most wasn't just the code — it was the sheer determination of students to solve real-world problems, often overcoming significant logistical and financial hurdles to simply be in the room.\n\n\nThrough our [GitLab for Education program](https://about.gitlab.com/solutions/education/), we aim to empower the next generation of developers with tools and opportunity. Here is a look at what the students built, and how they used GitLab to bridge the gap between idea and reality.\n\n## The challenge: Build faster, build securely\n\nThe premise for the GitLab track of the hackathon was simple: Don't just show us a product; show us how you built it. We wanted to see how students utilized GitLab's platform — from Issue Boards to CI/CD pipelines — to accelerate the development lifecycle.\n\nThe results were inspiring.\n\n## The winners\n\n### 1st place: Team Decode — Democratizing Scientific Research\n\n**Project:** FIRE (Fast Integrated Research Environment)\n\nTeam Decode took home the top prize with a solution that warms a developer's heart: a local-first, blazing-fast data processing tool built with [Rust](https://about.gitlab.com/blog/secure-rust-development-with-gitlab/) and Tauri. They identified a massive pain point for data science students: existing tools are fragmented, slow, and expensive.\n\nTheir solution, FIRE, allows researchers to visualize complex formats (like NetCDF) instantly. What impressed the judges most was their \"hacker\" ethos. They didn't just build a tool; they built it to be open and accessible.\n\n**How they used GitLab:** Since the team lived far apart, asynchronous communication was key. They utilized **GitLab Issue Boards** and **Milestones** to track progress and integrated their repo with Telegram to get real-time push notifications. As one team member noted, \"Coordinating all these technologies was really difficult, and what helped us was GitLab... the Issue Board really helped us track who was doing what.\"\n\n![Team Decode](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/epqazj1jc5c7zkgqun9h.jpg)\n\n### 2nd place: Team BichdeHueDost — Reuniting to Solve Payments\n\n**Project:** SemiPay (RFID Cashless Payment for Schools)\n\nThe team name, BichdeHueDost, translates to \"Friends who have been set apart.\" It's a fitting name for a group of friends who went to different colleges but reunited to build this project. They tackled a unique problem: handling cash in schools for young children. Their solution used RFID cards backed by a blockchain ledger to ensure secure, cashless transactions for students.\n\n**How they used GitLab:** They utilized [GitLab CI/CD](https://about.gitlab.com/topics/ci-cd/) to automate the build process for their Flutter application (APK), ensuring that every commit resulted in a testable artifact. This allowed them to iterate quickly despite the \"flaky\" nature of cross-platform mobile development.\n\n![Team BichdeHueDost](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/pkukrjgx2miukb6nrj5g.jpg)\n\n### 3rd place: Team ZenYukti — Agentic Repository Intelligence\n\n**Project:** RepoInsight AI (AI-powered, GitLab-native intelligence platform)\n\nTeam ZenYukti impressed us with a solution that tackles a universal developer pain point: understanding unfamiliar codebases. What stood out to the judges was the tool's practical approach to onboarding and code comprehension: RepoInsight-AI automatically generates documentation, visualizes repository structure, and even helps identify bugs, all while maintaining context about the entire codebase.\n\n**How they used GitLab:** The team built a comprehensive CI/CD pipeline that showcased GitLab's security and DevOps capabilities. They integrated [GitLab's Security Templates](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/templates/Security) (SAST, Dependency Scanning, and Secret Detection), and utilized [GitLab Container Registry](https://docs.gitlab.com/user/packages/container_registry/) to manage their Docker images for backend and frontend components. They created an AI auto-review bot that runs on merge requests, demonstrating an \"agentic workflow\" where AI assists in the development process itself.\n\n![Team ZenYukti](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/ymlzqoruv5al1secatba.jpg)\n\n## Beyond the code: A lesson in inclusion\n\nWhile the code was impressive, the most powerful moment of the event happened away from the keyboard.\n\nDuring the feedback session, we learned about the journey Team ZenYukti took to get to Mumbai. They traveled over 24 hours, covering nearly 1,800 kilometers. Because flights were too expensive and trains were booked, they traveled in the \"General Coach,\" a non-reserved, severely overcrowded carriage.\n\nAs one student described it:\n\n*\"You cannot even imagine something like this... there are no seats... people sit on the top of the train. This is what we have endured.\"*\n\nThis hit home. [Diversity, Inclusion, and Belonging](https://handbook.gitlab.com/handbook/company/culture/inclusion/) are core values at GitLab. We realized that for these students, the barrier to entry wasn't intellect or skill, it was access.\n\nIn that moment, we decided to break that barrier. We committed to reimbursing the travel expenses for the participants who struggled to get there. It's a small step, but it underlines a massive truth: **talent is distributed equally, but opportunity is not.**\n\n![hackathon class together](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380252/o5aqmboquz8ehusxvgom.jpg)\n\n### The future is bright (and automated)\n\nWe also saw incredible potential in teams like Prometheus, who attempted to build an autonomous patch remediation tool (DevGuardian), and Team Arrakis, who built a voice-first job portal for blue-collar workers using [GitLab Duo](https://about.gitlab.com/gitlab-duo-agent-platform/) to troubleshoot their pipelines.\n\nTo all the students who participated: You are the future. Through [GitLab for Education](https://about.gitlab.com/solutions/education/), we are committed to providing you with the top-tier tools (like GitLab Ultimate) you need to learn, collaborate, and change the world — whether you are coding from a dorm room, a lab, or a train carriage. **Keep shipping.**\n\n> :bulb: Learn more about the [GitLab for Education program](https://about.gitlab.com/solutions/education/).\n",{"slug":752,"featured":12,"template":13},"how-iit-bombay-students-code-future-with-gitlab",{"promotions":754},[755,769,780,792],{"id":756,"categories":757,"header":759,"text":760,"button":761,"image":766},"ai-modernization",[758],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":762,"config":763},"Get your AI maturity score",{"href":764,"dataGaName":765,"dataGaLocation":242},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":767},{"src":768},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":770,"categories":771,"header":772,"text":760,"button":773,"image":777},"devops-modernization",[737,566],"Are you just managing tools or shipping innovation?",{"text":774,"config":775},"Get your DevOps maturity score",{"href":776,"dataGaName":765,"dataGaLocation":242},"/assessments/devops-modernization-assessment/",{"config":778},{"src":779},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":781,"categories":782,"header":784,"text":760,"button":785,"image":789},"security-modernization",[783],"security","Are you trading speed for security?",{"text":786,"config":787},"Get your security maturity score",{"href":788,"dataGaName":765,"dataGaLocation":242},"/assessments/security-modernization-assessment/",{"config":790},{"src":791},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":793,"paths":794,"header":797,"text":798,"button":799,"image":804},"github-azure-migration",[795,796],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":800,"config":801},"See how GitLab compares to GitHub",{"href":802,"dataGaName":803,"dataGaLocation":242},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":805},{"src":779},{"header":807,"blurb":808,"button":809,"secondaryButton":814},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":810,"config":811},"Get your free trial",{"href":812,"dataGaName":49,"dataGaLocation":813},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":504,"config":815},{"href":53,"dataGaName":54,"dataGaLocation":813},1776449945139]