[{"data":1,"prerenderedAt":816},["ShallowReactive",2],{"/en-us/blog/the-consul-outage-that-never-happened":3,"navigation-en-us":35,"banner-en-us":445,"footer-en-us":455,"blog-post-authors-en-us-Devin Sylva":697,"blog-related-posts-en-us-the-consul-outage-that-never-happened":711,"blog-promotions-en-us":753,"next-steps-en-us":806},{"id":4,"title":5,"authorSlugs":6,"body":8,"categorySlug":9,"config":10,"content":14,"description":8,"extension":24,"isFeatured":12,"meta":25,"navigation":26,"path":27,"publishedDate":20,"seo":28,"stem":32,"tagSlugs":33,"__hash__":34},"blogPosts/en-us/blog/the-consul-outage-that-never-happened.yml","The Consul Outage That Never Happened",[7],"devin-sylva",null,"engineering",{"slug":11,"featured":12,"template":13},"the-consul-outage-that-never-happened",false,"BlogPost",{"title":15,"description":16,"authors":17,"heroImage":19,"date":20,"body":21,"category":9,"tags":22},"The Consul outage that never happened","Sometimes a good plan is the best tool for the job.",[18],"Devin Sylva","https://res.cloudinary.com/about-gitlab-com/image/upload/v1749679092/Blog/Hero%20Images/consul-outage-image.jpg","2019-11-08","When things go wrong on a large website, it can be fun to read the dramatic stories of high pressure incidents where nothing goes as planned. It makes for good reading. Every once in a while though, we get a success story. Every once in a while, things go exactly as planned.\n\n[GitLab.com](http://GitLab.com) is a large, high availability instance of GitLab. It is maintained by the [Infrastructure group](/company/team/?department=infrastructure-department), which currently consists of 20 to 24 engineers (depending on how you count), four managers, and a director, distributed all around the world. Distributed, in this case, does not mean across a few different offices. 
There are three or four major cities which have more than one engineer but with the exception of coworking days nobody is working from the same building.\n\nIn order to handle the load generated by about four million users working on around 12 million projects, GitLab.com breaks out the individual components of the GitLab product and currently spreads them out over 271 production servers.\n\nThe site is slowly migrating to using Hashicorp's [Consul](https://www.consul.io) for service location. Consul can be thought of like DNS, in that it associates a well-known name with the actual physical location of that service. It also provides other useful functions such as storing dynamic configuration for services, as well as locking for clusters. All of the Consul client and server components talk to each other over encrypted connections. These connections require a certificate at each end to validate the identity of the client and server and to provide the encryption key. The main component of GitLab.com which currently relies on this service is the database and its high availability system [Patroni](https://patroni.readthedocs.io/en/latest/). Like any website that provides functionality and not just information, the database is the central service that everything else depends on. Without the database, the website, API, CI pipelines, and git services will all deny requests and return errors.\n\n## Troubleshooting\n\nThe [issue](https://gitlab.com/gitlab-com/gl-infra/production/issues/1037) came to our attention when a database engineer noticed that one of our database servers in the staging environment could not reconnect to the staging Consul server after the database node was restarted.\n\nIt turns out that the TLS certificate was expired. This is normally a simple fix. Someone would go to the Certificate Authority (CA) and request a renewal – or if that fails, generate a new certificate to be signed by the same CA. 
That certificate would replace the expired copy and the service would be restarted. All of the connections should reestablish using the new certificate and just like with any other rolling configuration change, it should be transparent to all users.\n\nAfter looking everywhere, and asking everyone on the team, we got the definitive answer that the CA key we created a year ago for this self-signed certificate had been lost.\n\nThese test certificates were generated for the original proof-of-concept installation for this service and were never intended to be transitioned into production. However, since everything was working perfectly, the expired test certificate had not been calling attention to itself. A few things should have been done, including: Rebuilding the service with production in mind; conducting a production readiness review; and monitoring. But a year ago, our production team was in a very different place. We were small with just four engineers, and three new team members: A manager, director, and engineer, all of whom were still onboarding. We were less focused on the gaps that led to this oversight a year ago and more focused on fixing the urgent problem today.\n\n### Validating the problem\n\nFirst, we needed to validate the problem using the information we'd gathered. Since we couldn't update the existing certificates, we turned validation off on the client that couldn't connect. Turning validation off didn't change anything since the encrypted connections validate both the cluster side and client side. Next, we changed the setting on one server node in the cluster and so the restarted client could then connect to the server node. The problem now was that the server could no longer connect to any other cluster node and could not rejoin the cluster. The server we changed was not validating connections, meaning it was ignoring the expired certificate of its peers in the cluster but the peers were not returning the favor. 
They were shunning it, putting the whole cluster in a degraded state.\n\nWe realized that no matter what we did, some servers and some clients would not be able to connect to each other until after the change had been made everywhere and after every service was restarted. Unfortunately, we were talking about 255 of our 271 servers. Our tool set is designed for gradual rollouts, not simultaneous actions.\n\nWe were unsure why the site was even still online because if the clients and services could not connect it was unclear why anything was still working. We ran a small test, confirming the site was only working because the connections were already established when the certificates expired. Any interruption of these long-running connections would cause them to revalidate the new connections, resulting in them rejecting all new connections across the fleet.\n\n> Effectively, we were in the middle of an outage that had already started, but hadn't yet gotten to the point of taking down the site.\n\n### Testing in staging\n\nWe declared an incident and began testing every angle we could think of in the staging environment, including:\n\n* Reloading the configuration of the running service, which worked fine and did not drop connections, but the [certificate settings](https://github.com/hashicorp/consul/pull/4204) are [not included in the reloadable settings](https://www.consul.io/docs/agent/options.html#reloadable-configuration) for our version of Consul.\n* Simultaneous restarts of various services, which worked, but our tools wouldn't allow us to do that with ALL of the nodes at once.\n\nEverything we tried indicated that we had to break those existing connections in order to activate any change, and that we could only avoid downtime if that happened on **ALL nodes at precisely the same time**.\n\nEvery problem uncovered other problems and as we were troubleshooting one of our production Consul servers became unresponsive, disconnected all SSH sessions, and would not 
allow anyone to reconnect. The server did not log any errors. It was still sending monitoring data and was still participating in the Consul cluster. If we restarted the server, then it would not have been able to reconnect to its peers and we would have an even number of nodes. Not having quorum in the cluster would have been dangerous when we went to restart all of the nodes, so we left it in that state for the moment.\n\n## Planning\n\nOnce the troubleshooting was finished [it was time to start planning](https://gitlab.com/gitlab-com/gl-infra/production/issues/1042).\n\nThere were a few ways to solve the problem. We could:\n\n* Replace the CA and the certificates with new self-signed ones.\n* Change the CA setting to point to the system store, allowing us to use certificates signed by our standard certificate provider and then replace the certificates.\n* Disable the validation of the dates so that the expired certificate would not cause connections to fail.\n\nAll of these options would incur the same risks and involve the same risky restart of all services at once.\n\nWe picked the last option. Our reasoning was that disabling the validation would eliminate the immediate risk and give us time to slowly roll out a properly robust solution in the near future, without having to worry about disrupting the whole system. It was also the [smallest and most incremental change](https://handbook.gitlab.com/handbook/values/#iteration).\n\n### Working asynchronously to tackle the problem\n\nWhile there was some time pressure due to the [risk of network connections being interrupted](https://gitlab.com/gitlab-com/gl-infra/production/issues/1037#note_201745119), we had to consider the reality of working across timezones as we planned our solution.\n\n> We decided not to hand it off to the European shift, who were coming online soon. 
Being a [globally distributed](https://handbook.gitlab.com/handbook/company/culture/all-remote/) team, we had already handed things off from the end of the day in Mongolia, through Eastern and Western Europe and across the Americas, and were approaching the end of the day in Hawaii and New Zealand.\n\nAustralia still had a few more hours and Mongolia had started the day again, but the folks who had been troubleshooting it throughout the day had a pretty good handle on what needed to happen and what could go wrong. It made sense for them to be the ones to do the work. We decided to make a \"Break Glass\" plan instead. This was a merge request with all of the changes and information necessary for the European shift to get us back into a good state in case a full outage happened before anyone who had been working on it woke up. Everyone slept better knowing that we had a plan that would work even if it could not be executed without causing downtime. If we were already experiencing downtime, there would be no problem.\n\n### Designing our approach\n\nIn the morning (HST) everything was how we left it, so we started planning how to change the settings and restart all of the services without downtime. Our normal management tools were out because of the time it takes to roll out changes. Even sequential tools such as `knife ssh`, `mussh`, or `ansible` wouldn't work because the change had to be **precisely simultaneous**. Someone joked about setting it up in `cron`, which led us to the standard Linux `at` command (a relative of the more widely used `batch`). `cron` would require cleanup afterward, but an `at` command can be pushed out ahead of time with a sequential tool and will run a command at a precise time on all machines. Back in the days of hands-on, bare metal system administration, it was a useful trick for running one-time maintenance in the middle of the night or making it look like you were working when you weren't. 
Now `at` has become more obscure with the trend toward managing fleets of servers rather than big monolithic central machines. We chose to run the command `sudo systemctl restart consul.service`. We tested this in staging to verify that our Ubuntu distribution made environment variables like `$PATH` available, and that `sudo` did not ask for a password. On some distributions (older CentOS especially) this is not always the case.\n\nWith those successful tests, we still needed to change the config files. Luckily, there is nothing that prevents changing these ahead of time since the changes aren't picked up until the service restarts. We didn't want to do this step at the same time as the service restart so we could validate the changes and keep the `at` command as small as possible. We decided not to use Chef to push out the change because we needed complete and immediate transparency. Any nodes that did not get the change would fail after the restart. `mussh` was the tool that offered the most control and visibility while still being able to change all hosts with one command.\n\nWe also had to disable the Chef client so that it didn't overwrite the changes between when they were written and when the service restarted.\n\nBefore running anything we also needed to address the one Consul server that we couldn't access. It likely just needed a reboot, after which it would come up but be unable to reconnect to the cluster. The best option was to do this manually just before starting the rest of the procedure.\n\nOnce we had mapped out the plan we practiced it in the disaster recovery environment. We used the disaster recovery environment instead of the staging environment because all of the nodes in the staging environment had already been restarted, so there were no long-running connections to test. The disaster recovery environment was the next best option. 
It did not go perfectly since the database in this environment was already in an unhealthy state but it gave us valuable information to adjust the plan.\n\n## Pre-execution\n\n### A moment of panic\n\nIt was almost time to fix the inaccessible Consul node. The team connected to one of the other nodes to monitor and watch logs. Suddenly, the second node started disconnecting people. It was behaving exactly like the inaccessible node had the previous day. 😱 Suspiciously, it didn't disconnect everyone. Those who were still logged in noticed that `sshguard` was blocking access to some of the bastion servers that all of our SSH traffic flows through when accessing the internal nodes: [Infrastructure#7484](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7484). We have three bastion servers, and two were blocked because so many of us connected so many sessions so quickly. Disabling `sshguard` allowed everyone back in and that information was the hint we needed to manually find the one bastion which hadn't yet been blocked. It got us back into the original problem server. Disabling `sshguard` there left us with a fully functional node and with the ability to accept the `at` command to restart the Consul service at exactly the same time as the others.\n\nWe verified that we had an accurate and instantaneous way to monitor the state of the services. Watching the output of the `consul operator raft list-peers` command every second gave us a view that looked like this:\n\n```text\nNode                Address          State     Voter  RaftProtocol\nconsul-01-inf-gprd  10.218.1.4:8300  follower  true   3\nconsul-03-inf-gprd  10.218.1.2:8300  leader    true   3\nconsul-05-inf-gprd  10.218.1.6:8300  follower  true   3\nconsul-04-inf-gprd  10.218.1.5:8300  follower  true   3\nconsul-02-inf-gprd  10.218.1.3:8300  follower  true   3\n```\n\n### More nodes, more problems\n\nEven the most thorough plans always miss something. 
At this point we realized that one of the three `pgbouncer` nodes which direct traffic to the correct database instance was not showing as healthy in the load balancer. One is normally in this state as a warm spare, but one of the side effects of disconnecting the `pgbouncer` nodes from Consul is that they would all fail their load balancer health checks. If all health checks are failing, GCP load balancers send requests to ALL nodes as a safety feature. This would lead to too many connections to our database servers, causing unintended consequences. We worked around this by removing the unhealthy node from the load balancer pool for the remainder of this activity.\n\n* We checked that the lag on the database replicas was zero, and that they weren't trying to replicate any large and time-consuming transactions.\n* We generated a text list of all of the nodes that run the Consul client or server.\n* We verified the time zone (UTC) and time synchronization on all of those servers to ensure that when the `at` command executed the restart, an unsynchronized clock wouldn't cause unintended behavior.\n* We also verified the `at` scheduler was running on all of those nodes, and that `sudo` would not ask for a password.\n* We verified the script that would edit the config files, and tested it against the staging environment.\n* We also made sure `sshguard` was disabled and wasn't going to lock out the scripted process for behaving like a scripted process.\n\nThis might seem like a lot of steps but without any of these prerequisites the whole process would fail. Once all of that was done, everything was ready to go.\n\n## Execution\n\nIn the end, we scheduled a maintenance window and distilled all of the research and troubleshooting down to the [steps in this issue](https://gitlab.com/gitlab-com/gl-infra/production/issues/1042).\n\nEverything was staged and it was time to make the changes. This course of action included four key steps. 
First, we paused the Patroni database high availability subsystem. Pausing would freeze database failover and keep the high availability configuration static until we were done. It would have been bad if we had a database failure during this time so minimizing the amount of time in this state was important.\n\nNext, we ran a script on every machine that stopped the Chef client service and then changed the verify lines in the config files from true to false. It wouldn't help to have Chef trying to reconfigure anything as we made changes. We did this using `mussh` in batches of 20 servers at a time. Any more in parallel and our SSH agent and Yubikeys may not have been able to keep up. We were not expecting any change in the state of anything from this step. The config files on disk should have the new values but the running services wouldn't change, and more importantly, no TCP connections would disconnect. That was what we got, so it was time for some verification.\n\nOur third step was to check all of the servers and a random sampling of client nodes to make sure config files had been modified appropriately. It was also a good time to double-check that the Chef client was disabled. This check turned out to be a good thing to do, because there were a few nodes that still had the Chef client active. It turned out that those nodes were in the middle of a run when we disabled the service, and it reenabled the service for us when the run completed. Chef can be _so_ helpful. We disabled it manually on the few machines that were affected. This delayed our maintenance window by a few minutes, so we were very glad we didn't schedule the `at` commands first.\n\nFinally, we needed to remove the inactive `pgbouncer` node from the load balancer, so when the load balancer went into its safety mode, it would only send traffic to the two that were in a known state. 
You might think that removing it from the load balancer would be enough, but since it also participates in a cluster via Consul the whole service needed to be shut down along with the health check, which the load balancer uses to determine whether to send it traffic. We made a note of the full command line from the process table, shut it down, and removed it from the pool.\n\n### The anxiety builds\n\nNow was the moment of truth. It was 02:10 UTC. We pushed the following command to every server (20 at a time, using `mussh`): `echo 'sudo systemctl restart consul.service' | at 02:20` – it took about four minutes to complete. Then we waited. We monitored the Consul servers by running `watch -n 1 consul operator raft list-peers` on each of them in a separate terminal. We bit our nails. We watched the dashboards for signs of db connection errors from the frontend nodes. We all held our breath, and watched the database for signs of distress. Six minutes is a long time to think: \"It's 4am in Europe, so they won't notice\" and \"It's dinner time on the US west coast, maybe they won't notice\". Trust me, six minutes is a _really_ long time: \"Sorry APAC users for your day, which we are about to ruin by missing something\".\n\nWe counted down the last few seconds and watched. In the first second, the Consul servers all shut down, severing the connections that were keeping everything working. All 255 of the clients restarted at the same time. In the next second, we watched the servers return `Unexpected response code: 500`, which means \"connection refused\" in this case. The third second... still returning \"panic now\" or maybe it was \"connection refused\"... The fourth second all nodes returned `no leader found`, which meant that the connection was not being refused but the cluster was not healthy. The fifth second, no change. I'm thinking, just breathe, they were probably all discovering each other. In the sixth second, still no change: Maybe they're electing a leader? 
Second seven was the appropriate time for worry and panic. Then, the eighth second brought good news: `node 04 is the leader`. All other nodes were healthy and communicating properly. In the ninth second, we let out a collective (and globally distributed) exhale.\n\n### A quick assessment\n\nNow it was time to check what damage those painfully long eight seconds had done. We went through our checklist:\n\n* The database was still processing requests, no change.\n* The web and API nodes hadn't thrown any errors. They must have restarted fast enough that the cached database addresses were still being used.\n* The most important metric – the graph of 500 errors seen by customers: There was no change.\n\nWe expected to see a small spike in errors, or at least some identifiable change, but there was nothing but the noise floor. This was excellent news! 🎉\n\nThen we checked whether the database was communicating with the Consul servers. It was not. Everyone quickly turned their attention to the backend database servers. If they had been running normally and the high availability tool hadn't been paused, an unplanned failover would be the minimum outage we could have hoped for. It's likely that they would have gotten into a very bad state. We started to troubleshoot why it wasn't communicating with the Consul server, but about one minute into the change, the connection came up and everything synced. Apparently it just needed a little more time than the others. We verified everything, and when everyone was satisfied we turned the high availability back on.\n\n## Cleanup\n\nNow that everything in the critical path was working as expected, we released the tension from our shoulders. We re-enabled Chef and merged the MR pinning the Chef recipes to the newer version, and the MR's CI job pushed the newer version to our Chef server. After picking a few low-impact servers, we manually kicked off Chef runs after checking the `md5sum` of the Consul client config files. 
After Chef finished, there was no change to the file, and the Chef client service was running normally again. We followed the same process on the Consul servers with the same result, and manually implemented it on the database servers, just for good measure. Once those all looked good, we used `mussh` to kick off a Chef run on all of the servers using the same technique we used to turn them off.\n\nNow all that was left was to straighten everything out with `pgbouncer` and the database load balancer and then we could fully relax. Looking at the health checks, we noticed that the two previously healthy nodes were not returning healthy. The health checks are used to tell the load balancer which `pgbouncer` nodes have a Consul lock and therefore which nodes to send the traffic to. A little digging showed that after retrying to connect to the Consul service a few times, they gave up. This was not ideal, so we [opened an Infrastructure issue](https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7612) to fix it later and restarted the health checks manually. Everything showed normal so we added the inactive node back to the load balancer. The inactive node's health check told the load balancer not to select it, and since the load balancer was no longer in failsafe mode (due to the other nodes' health checks succeeding) the load balancer refrained from sending it traffic.\n\n## Conclusion\n\nSimultaneously restarting all of the Consul components with the new configuration put everything back into its original state, other than the validation setting which we set to false, and the TCP sessions which we restarted. After this change, the Consul clients will still be using TLS encryption but will ignore the fact that our cert is now expired. This is still not an ideal state but it gives us time to get there in a thoughtful way rather than as a rushed workaround.\n\nEvery once in a while we get into a situation that all of the fancy management tools just can't fix. 
There is no run book for situations such as the one we encountered. The question we were asked most frequently once people got up to speed was: \"Isn't there some instructional walkthrough published somewhere for this type of thing?\". For replacing a certificate from the same authority, yes definitely. For replacing a certificate on machines that can have downtime, there are plenty. But for keeping traffic flowing when hundreds of nodes need to change a setting and reconnect within a few seconds of each other... that's just not something that comes up very often. Even if someone wrote up the procedure it wouldn't work in our environment with all of the peripheral moving parts that required our attention.\n\nIn these types of situations there is no shortcut around thinking things through methodically. In this case, there were no tools or technologies that could solve the problem. Even in this new world of infrastructure as code, site reliability engineering, and cloud automation, there is still room for old fashioned system administrator tricks. There is just no substitute for understanding how everything works. 
We can try to abstract it away to make our day-to-day responsibilities easier, but when it comes down to it there will always be times when the best tool for the job is a solid plan.\n\nCover image by [Thomas Jensen](https://unsplash.com/@thomasjsn?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com)\n",[23],"production","yml",{},true,"/en-us/blog/the-consul-outage-that-never-happened",{"title":15,"description":16,"ogTitle":15,"ogDescription":16,"noIndex":12,"ogImage":19,"ogUrl":29,"ogSiteName":30,"ogType":31,"canonicalUrls":29},"https://about.gitlab.com/blog/the-consul-outage-that-never-happened","https://about.gitlab.com","article","en-us/blog/the-consul-outage-that-never-happened",[23],"vDIljsw1JtjVwIPpKncUZQBieNytkQrR_q7Gn96kZ-g",{"data":36},{"logo":37,"freeTrial":42,"sales":47,"login":52,"items":57,"search":365,"minimal":396,"duo":415,"switchNav":424,"pricingDeployment":435},{"config":38},{"href":39,"dataGaName":40,"dataGaLocation":41},"/","gitlab logo","header",{"text":43,"config":44},"Get free trial",{"href":45,"dataGaName":46,"dataGaLocation":41},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":48,"config":49},"Talk to sales",{"href":50,"dataGaName":51,"dataGaLocation":41},"/sales/","sales",{"text":53,"config":54},"Sign in",{"href":55,"dataGaName":56,"dataGaLocation":41},"https://gitlab.com/users/sign_in/","sign in",[58,85,180,185,286,346],{"text":59,"config":60,"cards":62},"Platform",{"dataNavLevelOne":61},"platform",[63,69,77],{"title":59,"description":64,"link":65},"The intelligent orchestration platform for DevSecOps",{"text":66,"config":67},"Explore our Platform",{"href":68,"dataGaName":61,"dataGaLocation":41},"/platform/",{"title":70,"description":71,"link":72},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":73,"config":74},"Meet GitLab 
Duo",{"href":75,"dataGaName":76,"dataGaLocation":41},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":78,"description":79,"link":80},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":81,"config":82},"Learn more",{"href":83,"dataGaName":84,"dataGaLocation":41},"/why-gitlab/","why gitlab",{"text":86,"left":26,"config":87,"link":89,"lists":93,"footer":162},"Product",{"dataNavLevelOne":88},"solutions",{"text":90,"config":91},"View all Solutions",{"href":92,"dataGaName":88,"dataGaLocation":41},"/solutions/",[94,118,141],{"title":95,"description":96,"link":97,"items":102},"Automation","CI/CD and automation to accelerate deployment",{"config":98},{"icon":99,"href":100,"dataGaName":101,"dataGaLocation":41},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[103,107,110,114],{"text":104,"config":105},"CI/CD",{"href":106,"dataGaLocation":41,"dataGaName":104},"/solutions/continuous-integration/",{"text":70,"config":108},{"href":75,"dataGaLocation":41,"dataGaName":109},"gitlab duo agent platform - product menu",{"text":111,"config":112},"Source Code Management",{"href":113,"dataGaLocation":41,"dataGaName":111},"/solutions/source-code-management/",{"text":115,"config":116},"Automated Software Delivery",{"href":100,"dataGaLocation":41,"dataGaName":117},"Automated software delivery",{"title":119,"description":120,"link":121,"items":126},"Security","Deliver code faster without compromising security",{"config":122},{"href":123,"dataGaName":124,"dataGaLocation":41,"icon":125},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[127,131,136],{"text":128,"config":129},"Application Security Testing",{"href":123,"dataGaName":130,"dataGaLocation":41},"Application security testing",{"text":132,"config":133},"Software Supply Chain Security",{"href":134,"dataGaLocation":41,"dataGaName":135},"/solutions/supply-chain/","Software supply chain security",{"text":137,"config":138},"Software 
Compliance",{"href":139,"dataGaName":140,"dataGaLocation":41},"/solutions/software-compliance/","software compliance",{"title":142,"link":143,"items":148},"Measurement",{"config":144},{"icon":145,"href":146,"dataGaName":147,"dataGaLocation":41},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[149,153,157],{"text":150,"config":151},"Visibility & Measurement",{"href":146,"dataGaLocation":41,"dataGaName":152},"Visibility and Measurement",{"text":154,"config":155},"Value Stream Management",{"href":156,"dataGaLocation":41,"dataGaName":154},"/solutions/value-stream-management/",{"text":158,"config":159},"Analytics & Insights",{"href":160,"dataGaLocation":41,"dataGaName":161},"/solutions/analytics-and-insights/","Analytics and insights",{"title":163,"items":164},"GitLab for",[165,170,175],{"text":166,"config":167},"Enterprise",{"href":168,"dataGaLocation":41,"dataGaName":169},"/enterprise/","enterprise",{"text":171,"config":172},"Small Business",{"href":173,"dataGaLocation":41,"dataGaName":174},"/small-business/","small business",{"text":176,"config":177},"Public Sector",{"href":178,"dataGaLocation":41,"dataGaName":179},"/solutions/public-sector/","public sector",{"text":181,"config":182},"Pricing",{"href":183,"dataGaName":184,"dataGaLocation":41,"dataNavLevelOne":184},"/pricing/","pricing",{"text":186,"config":187,"link":189,"lists":193,"feature":273},"Resources",{"dataNavLevelOne":188},"resources",{"text":190,"config":191},"View all resources",{"href":192,"dataGaName":188,"dataGaLocation":41},"/resources/",[194,227,245],{"title":195,"items":196},"Getting started",[197,202,207,212,217,222],{"text":198,"config":199},"Install",{"href":200,"dataGaName":201,"dataGaLocation":41},"/install/","install",{"text":203,"config":204},"Quick start guides",{"href":205,"dataGaName":206,"dataGaLocation":41},"/get-started/","quick setup 
checklists",{"text":208,"config":209},"Learn",{"href":210,"dataGaLocation":41,"dataGaName":211},"https://university.gitlab.com/","learn",{"text":213,"config":214},"Product documentation",{"href":215,"dataGaName":216,"dataGaLocation":41},"https://docs.gitlab.com/","product documentation",{"text":218,"config":219},"Best practice videos",{"href":220,"dataGaName":221,"dataGaLocation":41},"/getting-started-videos/","best practice videos",{"text":223,"config":224},"Integrations",{"href":225,"dataGaName":226,"dataGaLocation":41},"/integrations/","integrations",{"title":228,"items":229},"Discover",[230,235,240],{"text":231,"config":232},"Customer success stories",{"href":233,"dataGaName":234,"dataGaLocation":41},"/customers/","customer success stories",{"text":236,"config":237},"Blog",{"href":238,"dataGaName":239,"dataGaLocation":41},"/blog/","blog",{"text":241,"config":242},"Remote",{"href":243,"dataGaName":244,"dataGaLocation":41},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":246,"items":247},"Connect",[248,253,258,263,268],{"text":249,"config":250},"GitLab Services",{"href":251,"dataGaName":252,"dataGaLocation":41},"/services/","services",{"text":254,"config":255},"Community",{"href":256,"dataGaName":257,"dataGaLocation":41},"/community/","community",{"text":259,"config":260},"Forum",{"href":261,"dataGaName":262,"dataGaLocation":41},"https://forum.gitlab.com/","forum",{"text":264,"config":265},"Events",{"href":266,"dataGaName":267,"dataGaLocation":41},"/events/","events",{"text":269,"config":270},"Partners",{"href":271,"dataGaName":272,"dataGaLocation":41},"/partners/","partners",{"backgroundColor":274,"textColor":275,"text":276,"image":277,"link":281},"#2f2a6b","#fff","Insights for the future of software development",{"altText":278,"config":279},"the source promo card",{"src":280},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758208064/dzl0dbift9xdizyelkk4.svg",{"text":282,"config":283},"Read the 
latest",{"href":284,"dataGaName":285,"dataGaLocation":41},"/the-source/","the source",{"text":287,"config":288,"lists":290},"Company",{"dataNavLevelOne":289},"company",[291],{"items":292},[293,298,304,306,311,316,321,326,331,336,341],{"text":294,"config":295},"About",{"href":296,"dataGaName":297,"dataGaLocation":41},"/company/","about",{"text":299,"config":300,"footerGa":303},"Jobs",{"href":301,"dataGaName":302,"dataGaLocation":41},"/jobs/","jobs",{"dataGaName":302},{"text":264,"config":305},{"href":266,"dataGaName":267,"dataGaLocation":41},{"text":307,"config":308},"Leadership",{"href":309,"dataGaName":310,"dataGaLocation":41},"/company/team/e-group/","leadership",{"text":312,"config":313},"Team",{"href":314,"dataGaName":315,"dataGaLocation":41},"/company/team/","team",{"text":317,"config":318},"Handbook",{"href":319,"dataGaName":320,"dataGaLocation":41},"https://handbook.gitlab.com/","handbook",{"text":322,"config":323},"Investor relations",{"href":324,"dataGaName":325,"dataGaLocation":41},"https://ir.gitlab.com/","investor relations",{"text":327,"config":328},"Trust Center",{"href":329,"dataGaName":330,"dataGaLocation":41},"/security/","trust center",{"text":332,"config":333},"AI Transparency Center",{"href":334,"dataGaName":335,"dataGaLocation":41},"/ai-transparency-center/","ai transparency center",{"text":337,"config":338},"Newsletter",{"href":339,"dataGaName":340,"dataGaLocation":41},"/company/contact/#contact-forms","newsletter",{"text":342,"config":343},"Press",{"href":344,"dataGaName":345,"dataGaLocation":41},"/press/","press",{"text":347,"config":348,"lists":349},"Contact us",{"dataNavLevelOne":289},[350],{"items":351},[352,355,360],{"text":48,"config":353},{"href":50,"dataGaName":354,"dataGaLocation":41},"talk to sales",{"text":356,"config":357},"Support portal",{"href":358,"dataGaName":359,"dataGaLocation":41},"https://support.gitlab.com","support portal",{"text":361,"config":362},"Customer 
portal",{"href":363,"dataGaName":364,"dataGaLocation":41},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":366,"login":367,"suggestions":374},"Close",{"text":368,"link":369},"To search repositories and projects, login to",{"text":370,"config":371},"gitlab.com",{"href":55,"dataGaName":372,"dataGaLocation":373},"search login","search",{"text":375,"default":376},"Suggestions",[377,379,383,385,389,393],{"text":70,"config":378},{"href":75,"dataGaName":70,"dataGaLocation":373},{"text":380,"config":381},"Code Suggestions (AI)",{"href":382,"dataGaName":380,"dataGaLocation":373},"/solutions/code-suggestions/",{"text":104,"config":384},{"href":106,"dataGaName":104,"dataGaLocation":373},{"text":386,"config":387},"GitLab on AWS",{"href":388,"dataGaName":386,"dataGaLocation":373},"/partners/technology-partners/aws/",{"text":390,"config":391},"GitLab on Google Cloud",{"href":392,"dataGaName":390,"dataGaLocation":373},"/partners/technology-partners/google-cloud-platform/",{"text":394,"config":395},"Why GitLab?",{"href":83,"dataGaName":394,"dataGaLocation":373},{"freeTrial":397,"mobileIcon":402,"desktopIcon":407,"secondaryButton":410},{"text":398,"config":399},"Start free trial",{"href":400,"dataGaName":46,"dataGaLocation":401},"https://gitlab.com/-/trials/new/","nav",{"altText":403,"config":404},"Gitlab Icon",{"src":405,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":403,"config":408},{"src":409,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":411,"config":412},"Get Started",{"href":413,"dataGaName":414,"dataGaLocation":401},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":416,"mobileIcon":420,"desktopIcon":422},{"text":417,"config":418},"Learn more about GitLab 
Duo",{"href":75,"dataGaName":419,"dataGaLocation":401},"gitlab duo",{"altText":403,"config":421},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":423},{"src":409,"dataGaName":406,"dataGaLocation":401},{"button":425,"mobileIcon":430,"desktopIcon":432},{"text":426,"config":427},"/switch",{"href":428,"dataGaName":429,"dataGaLocation":401},"#contact","switch",{"altText":403,"config":431},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":433},{"src":434,"dataGaName":406,"dataGaLocation":401},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":436,"mobileIcon":441,"desktopIcon":443},{"text":437,"config":438},"Back to pricing",{"href":183,"dataGaName":439,"dataGaLocation":401,"icon":440},"back to pricing","GoBack",{"altText":403,"config":442},{"src":405,"dataGaName":406,"dataGaLocation":401},{"altText":403,"config":444},{"src":409,"dataGaName":406,"dataGaLocation":401},{"title":446,"button":447,"config":452},"See how agentic AI transforms software delivery",{"text":448,"config":449},"Watch GitLab Transcend now",{"href":450,"dataGaName":451,"dataGaLocation":41},"/events/transcend/virtual/","transcend event",{"layout":453,"icon":454,"disabled":26},"release","AiStar",{"data":456},{"text":457,"source":458,"edit":464,"contribute":469,"config":474,"items":479,"minimal":686},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":459,"config":460},"View page source",{"href":461,"dataGaName":462,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":465,"config":466},"Edit this page",{"href":467,"dataGaName":468,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":470,"config":471},"Please 
contribute",{"href":472,"dataGaName":473,"dataGaLocation":463},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":475,"facebook":476,"youtube":477,"linkedin":478},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[480,527,581,625,652],{"title":181,"links":481,"subMenu":496},[482,486,491],{"text":483,"config":484},"View plans",{"href":183,"dataGaName":485,"dataGaLocation":463},"view plans",{"text":487,"config":488},"Why Premium?",{"href":489,"dataGaName":490,"dataGaLocation":463},"/pricing/premium/","why premium",{"text":492,"config":493},"Why Ultimate?",{"href":494,"dataGaName":495,"dataGaLocation":463},"/pricing/ultimate/","why ultimate",[497],{"title":498,"links":499},"Contact Us",[500,503,505,507,512,517,522],{"text":501,"config":502},"Contact sales",{"href":50,"dataGaName":51,"dataGaLocation":463},{"text":356,"config":504},{"href":358,"dataGaName":359,"dataGaLocation":463},{"text":361,"config":506},{"href":363,"dataGaName":364,"dataGaLocation":463},{"text":508,"config":509},"Status",{"href":510,"dataGaName":511,"dataGaLocation":463},"https://status.gitlab.com/","status",{"text":513,"config":514},"Terms of use",{"href":515,"dataGaName":516,"dataGaLocation":463},"/terms/","terms of use",{"text":518,"config":519},"Privacy statement",{"href":520,"dataGaName":521,"dataGaLocation":463},"/privacy/","privacy statement",{"text":523,"config":524},"Cookie preferences",{"dataGaName":525,"dataGaLocation":463,"id":526,"isOneTrustButton":26},"cookie preferences","ot-sdk-btn",{"title":86,"links":528,"subMenu":537},[529,533],{"text":530,"config":531},"DevSecOps platform",{"href":68,"dataGaName":532,"dataGaLocation":463},"devsecops platform",{"text":534,"config":535},"AI-Assisted Development",{"href":75,"dataGaName":536,"dataGaLocation":463},"ai-assisted 
development",[538],{"title":539,"links":540},"Topics",[541,546,551,556,561,566,571,576],{"text":542,"config":543},"CICD",{"href":544,"dataGaName":545,"dataGaLocation":463},"/topics/ci-cd/","cicd",{"text":547,"config":548},"GitOps",{"href":549,"dataGaName":550,"dataGaLocation":463},"/topics/gitops/","gitops",{"text":552,"config":553},"DevOps",{"href":554,"dataGaName":555,"dataGaLocation":463},"/topics/devops/","devops",{"text":557,"config":558},"Version Control",{"href":559,"dataGaName":560,"dataGaLocation":463},"/topics/version-control/","version control",{"text":562,"config":563},"DevSecOps",{"href":564,"dataGaName":565,"dataGaLocation":463},"/topics/devsecops/","devsecops",{"text":567,"config":568},"Cloud Native",{"href":569,"dataGaName":570,"dataGaLocation":463},"/topics/cloud-native/","cloud native",{"text":572,"config":573},"AI for Coding",{"href":574,"dataGaName":575,"dataGaLocation":463},"/topics/devops/ai-for-coding/","ai for coding",{"text":577,"config":578},"Agentic AI",{"href":579,"dataGaName":580,"dataGaLocation":463},"/topics/agentic-ai/","agentic ai",{"title":582,"links":583},"Solutions",[584,586,588,593,597,600,604,607,609,612,615,620],{"text":128,"config":585},{"href":123,"dataGaName":128,"dataGaLocation":463},{"text":117,"config":587},{"href":100,"dataGaName":101,"dataGaLocation":463},{"text":589,"config":590},"Agile development",{"href":591,"dataGaName":592,"dataGaLocation":463},"/solutions/agile-delivery/","agile delivery",{"text":594,"config":595},"SCM",{"href":113,"dataGaName":596,"dataGaLocation":463},"source code management",{"text":542,"config":598},{"href":106,"dataGaName":599,"dataGaLocation":463},"continuous integration & delivery",{"text":601,"config":602},"Value stream management",{"href":156,"dataGaName":603,"dataGaLocation":463},"value stream 
management",{"text":547,"config":605},{"href":606,"dataGaName":550,"dataGaLocation":463},"/solutions/gitops/",{"text":166,"config":608},{"href":168,"dataGaName":169,"dataGaLocation":463},{"text":610,"config":611},"Small business",{"href":173,"dataGaName":174,"dataGaLocation":463},{"text":613,"config":614},"Public sector",{"href":178,"dataGaName":179,"dataGaLocation":463},{"text":616,"config":617},"Education",{"href":618,"dataGaName":619,"dataGaLocation":463},"/solutions/education/","education",{"text":621,"config":622},"Financial services",{"href":623,"dataGaName":624,"dataGaLocation":463},"/solutions/finance/","financial services",{"title":186,"links":626},[627,629,631,633,636,638,640,642,644,646,648,650],{"text":198,"config":628},{"href":200,"dataGaName":201,"dataGaLocation":463},{"text":203,"config":630},{"href":205,"dataGaName":206,"dataGaLocation":463},{"text":208,"config":632},{"href":210,"dataGaName":211,"dataGaLocation":463},{"text":213,"config":634},{"href":215,"dataGaName":635,"dataGaLocation":463},"docs",{"text":236,"config":637},{"href":238,"dataGaName":239,"dataGaLocation":463},{"text":231,"config":639},{"href":233,"dataGaName":234,"dataGaLocation":463},{"text":241,"config":641},{"href":243,"dataGaName":244,"dataGaLocation":463},{"text":249,"config":643},{"href":251,"dataGaName":252,"dataGaLocation":463},{"text":254,"config":645},{"href":256,"dataGaName":257,"dataGaLocation":463},{"text":259,"config":647},{"href":261,"dataGaName":262,"dataGaLocation":463},{"text":264,"config":649},{"href":266,"dataGaName":267,"dataGaLocation":463},{"text":269,"config":651},{"href":271,"dataGaName":272,"dataGaLocation":463},{"title":287,"links":653},[654,656,658,660,662,664,666,670,675,677,679,681],{"text":294,"config":655},{"href":296,"dataGaName":289,"dataGaLocation":463},{"text":299,"config":657},{"href":301,"dataGaName":302,"dataGaLocation":463},{"text":307,"config":659},{"href":309,"dataGaName":310,"dataGaLocation":463},{"text":312,"config":661},{"href":314,"dataGaN
ame":315,"dataGaLocation":463},{"text":317,"config":663},{"href":319,"dataGaName":320,"dataGaLocation":463},{"text":322,"config":665},{"href":324,"dataGaName":325,"dataGaLocation":463},{"text":667,"config":668},"Sustainability",{"href":669,"dataGaName":667,"dataGaLocation":463},"/sustainability/",{"text":671,"config":672},"Diversity, inclusion and belonging (DIB)",{"href":673,"dataGaName":674,"dataGaLocation":463},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":327,"config":676},{"href":329,"dataGaName":330,"dataGaLocation":463},{"text":337,"config":678},{"href":339,"dataGaName":340,"dataGaLocation":463},{"text":342,"config":680},{"href":344,"dataGaName":345,"dataGaLocation":463},{"text":682,"config":683},"Modern Slavery Transparency Statement",{"href":684,"dataGaName":685,"dataGaLocation":463},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":687},[688,691,694],{"text":689,"config":690},"Terms",{"href":515,"dataGaName":516,"dataGaLocation":463},{"text":692,"config":693},"Cookies",{"dataGaName":525,"dataGaLocation":463,"id":526,"isOneTrustButton":26},{"text":695,"config":696},"Privacy",{"href":520,"dataGaName":521,"dataGaLocation":463},[698],{"id":699,"title":18,"body":8,"config":700,"content":702,"description":8,"extension":24,"meta":706,"navigation":26,"path":707,"seo":708,"stem":709,"__hash__":710},"blogAuthors/en-us/blog/authors/devin-sylva.yml",{"template":701},"BlogAuthor",{"name":18,"config":703},{"headshot":704,"ctfId":705},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749679087/Blog/Author%20Headshots/devin-headshot.jpg","devin",{},"/en-us/blog/authors/devin-sylva",{},"en-us/blog/authors/devin-sylva","l3KIHXrXrDogTi0AWpKv8FmF_A-YcDv3OvZal3qLfCQ",[712,727,740],{"content":713,"config":725},{"body":714,"title":715,"description":716,"authors":717,"heroImage":719,"date":720,"category":9,"tags":721},"Most CI/CD tools can run a build 
and ship a deployment. Where they diverge is what happens when your delivery needs get real: a monorepo with a dozen services, microservices spread across multiple repositories, deployments to dozens of environments, or a platform team trying to enforce standards without becoming a bottleneck.\n  \nGitLab's pipeline execution model was designed for that complexity. Parent-child pipelines, DAG execution, dynamic pipeline generation, multi-project triggers, merge request pipelines with merged results, and CI/CD Components each solve a distinct class of problems. Because they compose, understanding the full model unlocks something more than a faster pipeline. In this article, you'll learn about the five patterns where that model stands out, each mapped to a real engineering scenario with the configuration to match.\n  \nThe configs below are illustrative. The scripts use echo commands to keep the signal-to-noise ratio low. Swap them out for your actual build, test, and deploy steps and they are ready to use.\n\n\n## 1. Monorepos: Parent-child pipelines + DAG execution\n\n\nThe problem: Your monorepo has a frontend, a backend, and a docs site. 
Every commit triggers a full rebuild of everything, even when only a README changed.\n\n\nGitLab solves this with two complementary features: [parent-child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#parent-child-pipelines) (which let a top-level pipeline spawn isolated sub-pipelines) and [DAG execution via `needs`](https://docs.gitlab.com/ci/yaml/#needs) (which breaks rigid stage-by-stage ordering and lets jobs start the moment their dependencies finish).\n\n\nA parent pipeline detects what changed and triggers only the relevant child pipelines:\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - trigger\n\ntrigger-services:\n  stage: trigger\n  trigger:\n    include:\n      - local: '.gitlab/ci/api-service.yml'\n      - local: '.gitlab/ci/web-service.yml'\n      - local: '.gitlab/ci/worker-service.yml'\n    strategy: depend\n```\n\n\nEach child pipeline is a fully independent pipeline with its own stages, jobs, and artifacts. The parent waits for all of them via [strategy: depend](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#wait-for-downstream-pipeline-to-complete) so you get a single green/red signal at the top level, with full drill-down into each service's pipeline. This organizational separation is the bigger win for large teams: each service owns its pipeline config, changes in one cannot break another, and the complexity stays manageable as the repo grows.\n\n\nOne thing worth knowing: when you pass [multiple files to a single `trigger: include:`](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#combine-multiple-child-pipeline-configuration-files), GitLab merges them into a single child pipeline configuration. This means jobs defined across those files share the same pipeline context and can reference each other with `needs:`, which is what makes the DAG optimization possible. 
If you split them into separate trigger jobs instead, each would be its own isolated pipeline and cross-file `needs:` references would not work.\n\n\nCombine this with `needs:` inside each child pipeline and you get DAG execution. Your integration tests can start the moment the build finishes, without waiting for other jobs in the same stage.\n\n```yaml\n# .gitlab/ci/api-service.yml\nstages:\n  - build\n  - test\n\nbuild-api:\n  stage: build\n  script:\n    - echo \"Building API service\"\n\ntest-api:\n  stage: test\n  needs: [build-api]\n  script:\n    - echo \"Running API tests\"\n```\n\n\nWhy it matters: Teams with large monorepos typically report significant reductions in pipeline runtime after switching to DAG execution, since jobs no longer wait on unrelated work in the same stage. Parent-child pipelines add the organizational layer that keeps the configuration maintainable as the repo and team grow.\n\n![Local downstream pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738759/Blog/Imported/hackathon-fake-blog-post-s/image3_vwj3rz.png \"Local downstream pipelines\")\n\n## 2. Microservices: Cross-repo, multi-project pipelines\n\n\nThe problem: Your frontend lives in one repo, your backend in another. When the frontend team ships a change, they have no visibility into whether it broke the backend integration and vice versa.\n\n\nGitLab's [multi-project pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#multi-project-pipelines) let one project trigger a pipeline in a completely separate project and wait for the result. The triggering project gets a linked downstream pipeline right in its own pipeline view.\n\n\nThe frontend pipeline builds an API contract artifact and publishes it, then triggers the backend pipeline. 
The backend fetches that artifact directly using the [Jobs API](https://docs.gitlab.com/ee/api/jobs.html#download-a-single-artifact-file-from-specific-tag-or-branch) and validates it before allowing anything to proceed. If a breaking change is detected, the backend pipeline fails and the frontend pipeline fails with it.\n\n```yaml\n# frontend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n  - trigger-backend\n\nbuild-frontend:\n  stage: build\n  script:\n    - echo \"Building frontend and generating API contract...\"\n    - mkdir -p dist\n    - |\n      echo '{\n        \"api_version\": \"v2\",\n        \"breaking_changes\": false\n      }' > dist/api-contract.json\n    - cat dist/api-contract.json\n  artifacts:\n    paths:\n      - dist/api-contract.json\n    expire_in: 1 hour\n\ntest-frontend:\n  stage: test\n  script:\n    - echo \"All frontend tests passed!\"\n\ntrigger-backend-pipeline:\n  stage: trigger-backend\n  trigger:\n    project: my-org/backend-service\n    branch: main\n    strategy: depend\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n```\n\n```yaml\n# backend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n\nbuild-backend:\n  stage: build\n  script:\n    - echo \"All backend tests passed!\"\n\nintegration-test:\n  stage: test\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"pipeline\"\n  script:\n    - echo \"Fetching API contract from frontend...\"\n    - |\n      curl --silent --fail \\\n        --header \"JOB-TOKEN: $CI_JOB_TOKEN\" \\\n        --output api-contract.json \\\n        \"${CI_API_V4_URL}/projects/${FRONTEND_PROJECT_ID}/jobs/artifacts/main/raw/dist/api-contract.json?job=build-frontend\"\n    - cat api-contract.json\n    - |\n      if grep -q '\"breaking_changes\": true' api-contract.json; then\n        echo \"FAIL: Breaking API changes detected - backend integration blocked!\"\n        exit 1\n      fi\n      echo \"PASS: API contract is compatible!\"\n```\n\n\nA few things worth noting in this config. 
The `integration-test` job uses `$CI_PIPELINE_SOURCE == \"pipeline\"` to ensure it only runs when triggered by an upstream pipeline, not on a standalone push to the backend repo. The frontend project ID is referenced via `$FRONTEND_PROJECT_ID`, which should be set as a [CI/CD variable](https://docs.gitlab.com/ci/variables/) in the backend project settings to avoid hardcoding it.\n\n\nWhy it matters: Cross-service breakage that previously surfaced in production gets caught in the pipeline instead. The dependency between services stops being invisible and becomes something teams can see, track, and act on.\n\n\n![Cross-project pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738762/Blog/Imported/hackathon-fake-blog-post-s/image4_h6mfsb.png \"Cross-project pipelines\")\n\n\n## 3. Multi-tenant / matrix deployments: Dynamic child pipelines\n\n\nThe problem: You deploy the same application to 15 customer environments, or three cloud regions, or dev/staging/prod. Updating a deploy stage across all of them one by one is the kind of work that leads to configuration drift. Writing a separate pipeline for each environment is unmaintainable from day one.\n\n\nGitLab's [dynamic child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#dynamic-child-pipelines) let you generate a pipeline at runtime. A job runs a script that produces a YAML file, and that YAML becomes the pipeline for the next stage. 
The pipeline structure itself becomes data.\n\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - generate\n  - trigger-environments\n\ngenerate-config:\n  stage: generate\n  script:\n    - |\n      # ENVIRONMENTS can be passed as a CI variable or read from a config file.\n      # Default to dev, staging, prod if not set.\n      ENVIRONMENTS=${ENVIRONMENTS:-\"dev staging prod\"}\n      for ENV in $ENVIRONMENTS; do\n        cat > ${ENV}-pipeline.yml \u003C\u003C EOF\n      stages:\n        - deploy\n        - verify\n      deploy-${ENV}:\n        stage: deploy\n        script:\n          - echo \"Deploying to ${ENV} environment\"\n      verify-${ENV}:\n        stage: verify\n        script:\n          - echo \"Running smoke tests on ${ENV}\"\n      EOF\n      done\n  artifacts:\n    paths:\n      - \"*.yml\"\n    exclude:\n      - \".gitlab-ci.yml\"\n\n.trigger-template:\n  stage: trigger-environments\n  trigger:\n    strategy: depend\n\ntrigger-dev:\n  extends: .trigger-template\n  trigger:\n    include:\n      - artifact: dev-pipeline.yml\n        job: generate-config\n\ntrigger-staging:\n  extends: .trigger-template\n  needs: [trigger-dev]\n  trigger:\n    include:\n      - artifact: staging-pipeline.yml\n        job: generate-config\n\ntrigger-prod:\n  extends: .trigger-template\n  needs: [trigger-staging]\n  trigger:\n    include:\n      - artifact: prod-pipeline.yml\n        job: generate-config\n  when: manual\n```\n\n\nThe generation script loops over an `ENVIRONMENTS` variable rather than hardcoding each environment separately. Pass in a different list via a CI variable or read it from a config file and the pipeline adapts without touching the YAML. The trigger jobs use [extends:](https://docs.gitlab.com/ci/yaml/#extends) to inherit shared configuration from `.trigger-template`, so `strategy: depend` is defined once rather than repeated on every trigger job. Add a new environment by updating the variable, not by duplicating pipeline config. 
Add [when: manual](https://docs.gitlab.com/ci/yaml/#when) to the production trigger and you get a promotion gate baked right into the pipeline graph.\n\n\nWhy it matters: SaaS companies and platform teams use this pattern to manage dozens of environments without duplicating pipeline logic. The pipeline structure itself stays lean as the deployment matrix grows.\n\n\n![Dynamic pipeline](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738765/Blog/Imported/hackathon-fake-blog-post-s/image7_wr0kx2.png \"Dynamic pipeline\")\n\n\n## 4. MR-first delivery: Merge request pipelines, merged results, and workflow routing\n\n\nThe problem: Your pipeline runs on every push to every branch. Expensive tests run on feature branches that will never merge. Meanwhile, you have no guarantee that what you tested is actually what will land on `main` after a merge.\n\n\nGitLab has three interlocking features that solve this together:\n\n\n*   [Merge request pipelines](https://docs.gitlab.com/ci/pipelines/merge_request_pipelines/) run only when a merge request exists, not on every branch push. This alone eliminates a significant amount of wasted compute.\n\n*   [Merged results pipelines](https://docs.gitlab.com/ci/pipelines/merged_results_pipelines/) go further. GitLab creates a temporary merge commit (your branch plus the current target branch) and runs the pipeline against that. You are testing what will actually exist after the merge, not just your branch in isolation.\n\n*   [Workflow rules](https://docs.gitlab.com/ci/yaml/workflow/) let you define exactly which pipeline type runs under which conditions and suppress everything else. 
The `$CI_OPEN_MERGE_REQUESTS` guard below prevents duplicate pipelines firing for both a branch and its open MR simultaneously.\n\n\nWith those three working together, here is what a tiered pipeline looks like:\n\n```yaml\n# .gitlab-ci.yml\nworkflow:\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS\n      when: never\n    - if: $CI_COMMIT_BRANCH\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\nstages:\n  - fast-checks\n  - expensive-tests\n  - deploy\n\nlint-code:\n  stage: fast-checks\n  script:\n    - echo \"Running linter\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nunit-tests:\n  stage: fast-checks\n  script:\n    - echo \"Running unit tests\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nintegration-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running integration tests (15 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\ne2e-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running E2E tests (30 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nnightly-comprehensive-scan:\n  stage: expensive-tests\n  script:\n    - echo \"Running full nightly suite (2 hours)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\ndeploy-production:\n  stage: deploy\n  script:\n    - echo \"Deploying to production\"\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n      when: manual\n```\n\nWith this setup, the pipeline behaves differently depending on context. A push to a feature branch with no open MR runs lint and unit tests only. 
Once an MR is opened, the workflow rules switch from a branch pipeline to an MR pipeline, and the full integration and E2E suite runs against the merged result. Merging to `main` queues a manual production deployment. A nightly schedule runs the comprehensive scan once, not on every commit.\n\n\nWhy it matters: Teams routinely cut CI costs significantly with this pattern, not by running fewer tests, but by running the right tests at the right time. Merged results pipelines catch the class of bugs that only appear after a merge, before they ever reach `main`.\n\n\n![Conditional pipelines (within a branch with no MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738768/Blog/Imported/hackathon-fake-blog-post-s/image6_dnfcny.png \"Conditional pipelines (within a branch with no MR)\")\n\n\n\n![Conditional pipelines (within an MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738772/Blog/Imported/hackathon-fake-blog-post-s/image1_wyiafu.png \"Conditional pipelines (within an MR)\")\n\n\n\n![Conditional pipelines (on the main branch)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738774/Blog/Imported/hackathon-fake-blog-post-s/image5_r6lkfd.png \"Conditional pipelines (on the main branch)\")\n\n## 5. Governed pipelines: CI/CD Components\n\n\nThe problem: Your platform team has defined the right way to build, test, and deploy. But every team has their own `.gitlab-ci.yml` with subtle variations. Security scanning gets skipped. Deployment standards drift. Audits are painful.\n\n\nGitLab [CI/CD Components](https://docs.gitlab.com/ci/components/) let platform teams publish versioned, reusable pipeline building blocks. Application teams consume them with a single `include:` line and optional inputs — no copy-paste, no drift. 
Components are discoverable through the [CI/CD Catalog](https://docs.gitlab.com/ci/components/#cicd-catalog), which means teams can find and adopt approved building blocks without needing to go through the platform team directly.\n\n\nHere is a component definition from a shared library:\n\n```yaml\n# templates/deploy.yml\nspec:\n  inputs:\n    stage:\n      default: deploy\n    environment:\n      default: production\n---\ndeploy-job:\n  stage: $[[ inputs.stage ]]\n  script:\n    - echo \"Deploying $APP_NAME to $[[ inputs.environment ]]\"\n    - echo \"Deploy URL: $DEPLOY_URL\"\n  environment:\n    name: $[[ inputs.environment ]]\n```\nAnd here is how an application team consumes it:\n\n```yaml\n# Application repo: .gitlab-ci.yml\nvariables:\n  APP_NAME: \"my-awesome-app\"\n  DEPLOY_URL: \"https://api.example.com\"\n\ninclude:\n  - component: gitlab.com/my-org/component-library/build@v1.0.6\n  - component: gitlab.com/my-org/component-library/test@v1.0.6\n  - component: gitlab.com/my-org/component-library/deploy@v1.0.6\n    inputs:\n      environment: staging\n\nstages:\n  - build\n  - test\n  - deploy\n```\n\nThree lines of `include:` replace hundreds of lines of duplicated YAML. The platform team can push a security fix to `v1.0.7` and teams opt in on their own schedule — or the platform team can pin everyone to a minimum version. Either way, one change propagates everywhere instead of needing to be applied repo by repo.\n\n\nPair this with [resource groups](https://docs.gitlab.com/ci/resource_groups/) to prevent concurrent deployments to the same environment, and [protected environments](https://docs.gitlab.com/ci/environments/protected_environments/) to enforce approval gates - and you have a governed delivery platform where compliance is the default, not the exception.\n\n\nWhy it matters: This is the pattern that makes GitLab CI/CD scale across hundreds of teams. Platform engineering teams enforce compliance without becoming a bottleneck. 
Application teams get a fast path to a working pipeline without reinventing the wheel.\n\n\n![Component pipeline (imported jobs)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738776/Blog/Imported/hackathon-fake-blog-post-s/image2_pizuxd.png \"Component pipeline (imported jobs)\")\n\n## Putting it all together\n\nNone of these features exist in isolation. The reason GitLab's pipeline model is worth understanding deeply is that these primitives compose:\n\n*   A monorepo uses parent-child pipelines, and each child uses DAG execution\n\n*   A microservices platform uses multi-project pipelines, and each project uses MR pipelines with merged results\n\n*   A governed platform uses CI/CD components to standardize the patterns above across every team\n\n\nMost teams discover one of these features when they hit a specific pain point. The ones who invest in understanding the full model end up with a delivery system that actually reflects how their engineering organization works, not a pipeline that fights it.\n\n## Other patterns worth exploring\n\n\nThe five patterns above cover the most common structural pain points, but GitLab's pipeline model goes further. A few others worth looking into as your needs grow:\n\n\n*   [Review apps with dynamic environments](https://docs.gitlab.com/ci/environments/) let you spin up a live preview for every feature branch and tear it down automatically when the MR closes. Useful for teams doing frontend work or API changes that need stakeholder sign-off before merging.\n\n*   [Caching and artifact strategies](https://docs.gitlab.com/ci/caching/) are often the fastest way to cut pipeline runtime after the structural work is done. 
Structuring `cache:` keys around dependency lockfiles and being deliberate about what gets passed between jobs with [artifacts:](https://docs.gitlab.com/ci/yaml/#artifacts) can make a significant difference without changing your pipeline shape at all.\n\n*   [Scheduled and API-triggered pipelines](https://docs.gitlab.com/ci/pipelines/schedules/) are worth knowing about because not everything should run on a code push. Nightly security scans, compliance reports, and release automation are better modeled as scheduled or [API-triggered](https://docs.gitlab.com/ci/triggers/) pipelines with `$CI_PIPELINE_SOURCE` routing the right jobs for each context.\n\n## How to get started\n\nModern software delivery is complex. Teams are managing monorepos with dozens of services, coordinating across multiple repositories, deploying to many environments at once, and trying to keep standards consistent as organizations grow. GitLab's pipeline model was built with all of that in mind.\n\nWhat makes it worth investing time in is how well the pieces fit together. Parent-child pipelines bring structure to large codebases. Multi-project pipelines make cross-team dependencies visible and testable. Dynamic pipelines turn environment management into something that scales gracefully. MR-first delivery with merged results ensures confidence at every step of the review process. And CI/CD Components give platform teams a way to share best practices across an entire organization without becoming a bottleneck.\n\nEach of these features is powerful on its own, and even more so when combined. 
GitLab gives you the building blocks to design a delivery system that fits how your team actually works, and grows with you as your needs evolve.\n\n> [Start a free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/) to use pipeline logic today.\n\n## Read more\n\n*   [Variable and artifact sharing in GitLab parent-child pipelines](https://about.gitlab.com/blog/variable-and-artifact-sharing-in-gitlab-parent-child-pipelines/)\n*   [CI/CD inputs: Secure and preferred method to pass parameters to a pipeline](https://about.gitlab.com/blog/ci-cd-inputs-secure-and-preferred-method-to-pass-parameters-to-a-pipeline/)\n*   [Tutorial: How to set up your first GitLab CI/CD component](https://about.gitlab.com/blog/tutorial-how-to-set-up-your-first-gitlab-ci-cd-component/)\n*   [How to include file references in your CI/CD components](https://about.gitlab.com/blog/how-to-include-file-references-in-your-ci-cd-components/)\n*   [FAQ: GitLab CI/CD Catalog](https://about.gitlab.com/blog/faq-gitlab-ci-cd-catalog/)\n*   [Building a GitLab CI/CD pipeline for a monorepo the easy way](https://about.gitlab.com/blog/building-a-gitlab-ci-cd-pipeline-for-a-monorepo-the-easy-way/)\n*   [A CI/CD component builder's journey](https://about.gitlab.com/blog/a-ci-component-builders-journey/)\n*   [CI/CD Catalog goes GA: No more building pipelines from scratch](https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/)","5 ways GitLab pipeline logic solves real engineering problems","Learn how to scale CI/CD with composable patterns for monorepos, microservices, environments, and governance.",[718],"Omid Khan","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772721753/frfsm1qfscwrmsyzj1qn.png","2026-04-09",[104,722,723,724],"DevOps 
platform","tutorial","features",{"featured":26,"template":13,"slug":726},"5-ways-gitlab-pipeline-logic-solves-real-engineering-problems",{"content":728,"config":738},{"title":729,"description":730,"authors":731,"heroImage":733,"date":734,"body":735,"category":9,"tags":736},"How to use GitLab Container Virtual Registry with Docker Hardened Images","Learn how to simplify container image management with this step-by-step guide.",[732],"Tim Rizzi","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772111172/mwhgbjawn62kymfwrhle.png","2026-03-12","If you're a platform engineer, you've probably had this conversation:\n  \n*\"Security says we need to use hardened base images.\"*\n\n*\"Great, where do I configure credentials for yet another registry?\"*\n\n*\"Also, how do we make sure everyone actually uses them?\"*\n\nOr this one:\n\n*\"Why are our builds so slow?\"*\n\n*\"We're pulling the same 500MB image from Docker Hub in every single job.\"*\n\n*\"Can't we just cache these somewhere?\"*\n\nI've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries — Docker Hub, dhi.io (Docker Hardened Images), MCR, and Quay — and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. 
Your developers don't need to know or care which upstream a particular image came from.\n\nThis article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that's a combination that makes a lot of sense for teams that care about security but don't want to make their developers' lives harder.\n\n## What problem are we actually solving?\n\nThe platform teams I usually talk to manage container images across three to five registries:\n\n* **Docker Hub** for most base images\n* **dhi.io** for Docker Hardened Images (security-conscious workloads)\n* **MCR** for .NET and Azure tooling\n* **Quay.io** for Red Hat ecosystem stuff\n* **Internal registries** for proprietary images\n\nEach one has its own:\n\n* Authentication mechanism\n* Network latency characteristics\n* Way of organizing image paths\n\nYour CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.\n\nContainer Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.\n\n## How it works\n\nThe model is straightforward:\n\n```text\nYour pipeline pulls:\n  gitlab.com/virtual_registries/container/1000016/python:3.13\n\nVirtual registry checks:\n  1. Do I have this cached? → Return it\n  2. No? → Fetch from upstream, cache it, return it\n\n```\n\nYou configure upstreams in priority order. When an image pull comes in, the virtual registry checks each upstream until it finds the image. 
The result gets cached for a configurable period (default 24 hours).\n\n```text\n┌─────────────────────────────────────────────────────────┐\n│                    CI/CD Pipeline                       │\n│                          │                              │\n│                          ▼                              │\n│   gitlab.com/virtual_registries/container/\u003Cid>/image   │\n└─────────────────────────────────────────────────────────┘\n                           │\n                           ▼\n┌─────────────────────────────────────────────────────────┐\n│            Container Virtual Registry                   │\n│                                                         │\n│  Upstream 1: Docker Hub ────────────────┐               │\n│  Upstream 2: dhi.io (Hardened) ────────┐│               │\n│  Upstream 3: MCR ─────────────────────┐││               │\n│  Upstream 4: Quay.io ────────────────┐│││               │\n│                                      ││││               │\n│                    ┌─────────────────┴┴┴┴──┐            │\n│                    │        Cache          │            │\n│                    │  (manifests + layers) │            │\n│                    └───────────────────────┘            │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. 
If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficult to tell if teams are actually using hardened images vs. regular ones.\n\nVirtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No CI/CD changes per-team**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. 
Useful for compliance, useful for understanding what your fleet actually depends on.\n\n## Setting it up\n\nHere's a real setup using the Python client from this demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for Red Hat stuff\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\n  \nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate 
to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n    \n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n    \n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n    \n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\n\n\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. 
No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\n\nHere are some considerations to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI, with no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. 
It works, and people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n\n## Related resources\n- [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n- [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n- [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[723,737,724],"product",{"featured":12,"template":13,"slug":739},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"content":741,"config":751},{"title":742,"description":743,"authors":744,"heroImage":746,"date":747,"category":9,"tags":748,"body":750},"How IIT Bombay students are coding the future with GitLab","At GitLab, we often talk about how software accelerates innovation. But sometimes, you have to step away from the Zoom calls and stand in a crowded university hall to remember why we do this.",[745],"Nick Veenhof","https://res.cloudinary.com/about-gitlab-com/image/upload/v1750099013/Blog/Hero%20Images/Blog/Hero%20Images/blog-image-template-1800x945%20%2814%29_6VTUA8mUhOZNDaRVNPeKwl_1750099012960.png","2026-01-08",[257,619,749],"open source","The GitLab team recently had the privilege of judging the **iHack Hackathon** at **IIT Bombay's E-Summit**. The energy was electric, the coffee was flowing, and the talent was undeniable. 
But what struck us most wasn't just the code — it was the sheer determination of students to solve real-world problems, often overcoming significant logistical and financial hurdles to simply be in the room.\n\n\nThrough our [GitLab for Education program](https://about.gitlab.com/solutions/education/), we aim to empower the next generation of developers with tools and opportunity. Here is a look at what the students built, and how they used GitLab to bridge the gap between idea and reality.\n\n## The challenge: Build faster, build securely\n\nThe premise for the GitLab track of the hackathon was simple: Don't just show us a product; show us how you built it. We wanted to see how students utilized GitLab's platform — from Issue Boards to CI/CD pipelines — to accelerate the development lifecycle.\n\nThe results were inspiring.\n\n## The winners\n\n### 1st place: Team Decode — Democratizing Scientific Research\n\n**Project:** FIRE (Fast Integrated Research Environment)\n\nTeam Decode took home the top prize with a solution that warms a developer's heart: a local-first, blazing-fast data processing tool built with [Rust](https://about.gitlab.com/blog/secure-rust-development-with-gitlab/) and Tauri. They identified a massive pain point for data science students: existing tools are fragmented, slow, and expensive.\n\nTheir solution, FIRE, allows researchers to visualize complex formats (like NetCDF) instantly. What impressed the judges most was their \"hacker\" ethos. They didn't just build a tool; they built it to be open and accessible.\n\n**How they used GitLab:** Since the team lived far apart, asynchronous communication was key. They utilized **GitLab Issue Boards** and **Milestones** to track progress and integrated their repo with Telegram to get real-time push notifications. As one team member noted, \"Coordinating all these technologies was really difficult, and what helped us was GitLab... 
the Issue Board really helped us track who was doing what.\"\n\n![Team Decode](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/epqazj1jc5c7zkgqun9h.jpg)\n\n### 2nd place: Team BichdeHueDost — Reuniting to Solve Payments\n\n**Project:** SemiPay (RFID Cashless Payment for Schools)\n\nThe team name, BichdeHueDost, translates to \"Friends who have been set apart.\" It's a fitting name for a group of friends who went to different colleges but reunited to build this project. They tackled a unique problem: handling cash in schools for young children. Their solution used RFID cards backed by a blockchain ledger to ensure secure, cashless transactions for students.\n\n**How they used GitLab:** They utilized [GitLab CI/CD](https://about.gitlab.com/topics/ci-cd/) to automate the build process for their Flutter application (APK), ensuring that every commit resulted in a testable artifact. This allowed them to iterate quickly despite the \"flaky\" nature of cross-platform mobile development.\n\n![Team BichdeHueDost](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/pkukrjgx2miukb6nrj5g.jpg)\n\n### 3rd place: Team ZenYukti — Agentic Repository Intelligence\n\n**Project:** RepoInsight AI (AI-powered, GitLab-native intelligence platform)\n\nTeam ZenYukti impressed us with a solution that tackles a universal developer pain point: understanding unfamiliar codebases. What stood out to the judges was the tool's practical approach to onboarding and code comprehension: RepoInsight-AI automatically generates documentation, visualizes repository structure, and even helps identify bugs, all while maintaining context about the entire codebase.\n\n**How they used GitLab:** The team built a comprehensive CI/CD pipeline that showcased GitLab's security and DevOps capabilities. 
They integrated [GitLab's Security Templates](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/templates/Security) (SAST, Dependency Scanning, and Secret Detection), and utilized [GitLab Container Registry](https://docs.gitlab.com/user/packages/container_registry/) to manage their Docker images for backend and frontend components. They created an AI auto-review bot that runs on merge requests, demonstrating an \"agentic workflow\" where AI assists in the development process itself.\n\n![Team ZenYukti](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/ymlzqoruv5al1secatba.jpg)\n\n## Beyond the code: A lesson in inclusion\n\nWhile the code was impressive, the most powerful moment of the event happened away from the keyboard.\n\nDuring the feedback session, we learned about the journey Team ZenYukti took to get to Mumbai. They traveled over 24 hours, covering nearly 1,800 kilometers. Because flights were too expensive and trains were booked, they traveled in the \"General Coach,\" a non-reserved, severely overcrowded carriage.\n\nAs one student described it:\n\n*\"You cannot even imagine something like this... there are no seats... people sit on the top of the train. This is what we have endured.\"*\n\nThis hit home. [Diversity, Inclusion, and Belonging](https://handbook.gitlab.com/handbook/company/culture/inclusion/) are core values at GitLab. We realized that for these students, the barrier to entry wasn't intellect or skill, it was access.\n\nIn that moment, we decided to break that barrier. We committed to reimbursing the travel expenses for the participants who struggled to get there. 
It's a small step, but it underlines a massive truth: **talent is distributed equally, but opportunity is not.**\n\n![hackathon class together](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380252/o5aqmboquz8ehusxvgom.jpg)\n\n### The future is bright (and automated)\n\nWe also saw incredible potential in teams like Prometheus, who attempted to build an autonomous patch remediation tool (DevGuardian), and Team Arrakis, who built a voice-first job portal for blue-collar workers using [GitLab Duo](https://about.gitlab.com/gitlab-duo-agent-platform/) to troubleshoot their pipelines.\n\nTo all the students who participated: You are the future. Through [GitLab for Education](https://about.gitlab.com/solutions/education/), we are committed to providing you with the top-tier tools (like GitLab Ultimate) you need to learn, collaborate, and change the world — whether you are coding from a dorm room, a lab, or a train carriage. **Keep shipping.**\n\n> :bulb: Learn more about the [GitLab for Education program](https://about.gitlab.com/solutions/education/).\n",{"slug":752,"featured":12,"template":13},"how-iit-bombay-students-code-future-with-gitlab",{"promotions":754},[755,769,780,792],{"id":756,"categories":757,"header":759,"text":760,"button":761,"image":766},"ai-modernization",[758],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":762,"config":763},"Get your AI maturity score",{"href":764,"dataGaName":765,"dataGaLocation":239},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":767},{"src":768},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":770,"categories":771,"header":772,"text":760,"button":773,"image":777},"devops-modernization",[737,565],"Are you just managing tools or shipping innovation?",{"text":774,"config":775},"Get your DevOps maturity 
score",{"href":776,"dataGaName":765,"dataGaLocation":239},"/assessments/devops-modernization-assessment/",{"config":778},{"src":779},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":781,"categories":782,"header":784,"text":760,"button":785,"image":789},"security-modernization",[783],"security","Are you trading speed for security?",{"text":786,"config":787},"Get your security maturity score",{"href":788,"dataGaName":765,"dataGaLocation":239},"/assessments/security-modernization-assessment/",{"config":790},{"src":791},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":793,"paths":794,"header":797,"text":798,"button":799,"image":804},"github-azure-migration",[795,796],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":800,"config":801},"See how GitLab compares to GitHub",{"href":802,"dataGaName":803,"dataGaLocation":239},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":805},{"src":779},{"header":807,"blurb":808,"button":809,"secondaryButton":814},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":810,"config":811},"Get your free trial",{"href":812,"dataGaName":46,"dataGaLocation":813},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":501,"config":815},{"href":50,"dataGaName":51,"dataGaLocation":813},1776454406527]