[{"data":1,"prerenderedAt":832},["ShallowReactive",2],{"/en-us/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions":3,"navigation-en-us":39,"banner-en-us":449,"footer-en-us":459,"blog-post-authors-en-us-Grzegorz Bizon|Stan Hu":701,"blog-related-posts-en-us-why-we-spent-the-last-month-eliminating-postgresql-subtransactions":727,"blog-promotions-en-us":769,"next-steps-en-us":822},{"id":4,"title":5,"authorSlugs":6,"body":9,"categorySlug":10,"config":11,"content":15,"description":9,"extension":28,"isFeatured":13,"meta":29,"navigation":30,"path":31,"publishedDate":22,"seo":32,"stem":36,"tagSlugs":37,"__hash__":38},"blogPosts/en-us/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions.yml","Why We Spent The Last Month Eliminating Postgresql Subtransactions",[7,8],"grzegorz-bizon","stan-hu",null,"engineering",{"slug":12,"featured":13,"template":14},"why-we-spent-the-last-month-eliminating-postgresql-subtransactions",false,"BlogPost",{"title":16,"description":17,"authors":18,"heroImage":21,"date":22,"body":23,"category":10,"tags":24},"Why we spent the last month eliminating PostgreSQL subtransactions","How a mysterious stall in database queries uncovered a performance limitation with PostgreSQL.",[19,20],"Grzegorz Bizon","Stan Hu","https://res.cloudinary.com/about-gitlab-com/image/upload/v1749669470/Blog/Hero%20Images/nessie.jpg","2021-09-29","Since last June, we noticed the database on GitLab.com would\nmysteriously stall for minutes, which would lead to users seeing 500\nerrors during this time. Through a painstaking investigation over\nseveral weeks, we finally uncovered the cause of this: initiating a\nsubtransaction via the [`SAVEPOINT` SQL query](https://www.postgresql.org/docs/current/sql-savepoint.html) while\na long transaction is in progress can wreak havoc on database\nreplicas. Thus launched a race, which we recently completed, to\neliminate all `SAVEPOINT` queries from our code. Here's what happened,\nhow we discovered the problem, and what we did to fix it.\n\n### The symptoms begin\n\nOn June 24th, we noticed that our CI/CD runners service reported a high\nerror rate:\n\n![runners errors](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/ci-runners-errors.png)\n\nA quick investigation revealed that database queries used to retrieve\nCI/CD builds data were timing out and that the unprocessed builds\nbacklog grew at a high rate:\n\n![builds queue](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/builds-queue.png)\n\nOur monitoring also showed that some of the SQL queries were waiting for\nPostgreSQL lightweight locks (`LWLocks`):\n\n![aggregated lwlocks](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/aggregated-lwlocks.png)\n\nIn the following weeks we had experienced a few incidents like this. We were\nsurprised to see how sudden these performance degradations were, and how\nquickly things could go back to normal:\n\n![ci queries latency](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/ci-queries-latency.png)\n\n### Introducing Nessie: Stalled database queries\n\nIn order to learn more, we extended our observability tooling [to sample\nmore data from `pg_stat_activity`](https://gitlab.com/gitlab-cookbooks/gitlab-exporters/-/merge_requests/231). In PostgreSQL, the `pg_stat_activity`\nvirtual table contains the list of all database connections in the system as\nwell as what they are waiting for, such as a SQL query from the\nclient. We observed a consistent pattern: the queries were waiting on\n`SubtransControlLock`. Below shows a graph of the URLs or jobs that were\nstalled:\n\n![endpoints locked](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/endpoints-locked.png)\n\nThe purple line shows the sampled number of transactions locked by\n`SubtransControlLock` for the `POST /api/v4/jobs/request` endpoint that\nwe use for internal communication between GitLab and GitLab Runners\nprocessing CI/CD jobs.\n\nAlthough this endpoint was impacted the most, the whole database cluster\nappeared to be affected as many other, unrelated queries timed out.\n\nThis same pattern would rear its head on random days. A week would pass\nby without incident, and then it would show up for 15 minutes and\ndisappear for days. Were we chasing the Loch Ness Monster?\n\nLet's call these stalled queries Nessie for fun and profit.\n\n### What is a `SAVEPOINT`?\n\nTo understand `SubtransControlLock` ([PostgreSQL\n13](https://www.postgresql.org/docs/13/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW)\nrenamed this to `SubtransSLRU`), we first must understand how\nsubtransactions work in PostgreSQL. In PostgreSQL, a transaction can\nstart via a `BEGIN` statement, and a subtransaction can be started with\na subsequent `SAVEPOINT` query. PostgreSQL assigns each of these a\ntransaction ID (XID for short) [when a transaction or a subtransaction\nneeds one, usually before a client modifies data](https://gitlab.com/postgres/postgres/blob/a00c138b78521b9bc68b480490a8d601ecdeb816/src/backend/access/transam/README#L193-L198).\n\n#### Why would you use a `SAVEPOINT`?\n\nFor example, let's say you were running an online store and a customer\nplaced an order. Before the order is fullfilled, the system needs to\nensure a credit card account exists for that user. In Rails, a common\npattern is to start a transaction for the order and call\n[`find_or_create_by`](https://apidock.com/rails/v5.2.3/ActiveRecord/Relation/find_or_create_by). For\nexample:\n\n```ruby\nOrder.transaction do\n  begin\n    CreditAccount.transaction(requires_new: true) do\n      CreditAccount.find_or_create_by(customer_id: customer.id)\n  rescue ActiveRecord::RecordNotUnique\n    retry\n  end\n  # Fulfill the order\n  # ...\nend\n```\n\nIf two orders were placed around the same time, you wouldn't want the\ncreation of a duplicate account to fail one of the orders. Instead, you\nwould want the system to say, \"Oh, an account was just created; let me\nuse that.\"\n\nThat's where subtransactions come in handy: the `requires_new: true`\ntells Rails to start a new subtransaction if the application already is\nin a transaction. The code above translates into several SQL calls that\nlook something like:\n```sql\n--- Start a transaction\nBEGIN\nSAVEPOINT active_record_1\n--- Look up the account\nSELECT * FROM credit_accounts WHERE customer_id = 1\n--- Insert the account; this may fail due to a duplicate constraint\nINSERT INTO credit_accounts (customer_id) VALUES (1)\n--- Abort this by rolling back\nROLLBACK TO active_record_1\n--- Retry here: Start a new subtransaction\nSAVEPOINT active_record_2\n--- Find the newly-created account\nSELECT * FROM credit_accounts WHERE customer_id = 1\n--- Save the data\nRELEASE SAVEPOINT active_record_2\nCOMMIT\n```\n\nOn line 7 above, the `INSERT` might fail if the customer account was\nalready created, and the database unique constraint would prevent a\nduplicate entry. Without the first `SAVEPOINT` and `ROLLBACK` block, the\nwhole transaction would have failed. With that subtransaction, the\ntransaction can retry gracefully and look up the existing account.\n\n### What is `SubtransControlLock`?\n\nAs we mentioned earlier, Nessie returned at random times with queries\nwaiting for `SubtransControlLock`. `SubtransControlLock` indicates that\nthe query is waiting for PostgreSQL to load subtransaction data from\ndisk into shared memory.\n\nWhy is this data needed? When a client runs a `SELECT`, for example,\nPostgreSQL needs to decide whether each version of a row, known as a\ntuple, is actually visible within the current transaction. It's possible\nthat a tuple has been deleted or has yet to be committed by another\ntransaction. Since only a top-level transaction can actually commit\ndata, PostgreSQL needs to map a subtransaction ID (subXID) to its parent\nXID.\n\nThis mapping of subXID to parent XID is stored on disk in the\n`pg_subtrans` directory. Since reading from disk is slow, PostgreSQL\nadds a simple least-recently used (SLRU) cache in front for each\nbackend process. The lookup is fast if the desired page is already\ncached. However, as [Laurenz Albe discussed in his blog\npost](https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/),\nPostgreSQL may need to read from disk if the number of active\nsubtransactions exceeds 64 in a given transaction, a condition\nPostgreSQL terms `suboverflow`. Think of it as the feeling you might get\nif you ate too many Subway sandwiches.\n\nSuboverflowing (is that a word?) can bog down performance because as\nLaurenz said, \"Other transactions have to update `pg_subtrans` to\nregister subtransactions, and you can see in the perf output how they\nvie for lightweight locks with the readers.\"\n\n### Hunting for nested subtransactions\n\nLaurenz's blog post suggested that we might be using too many\nsubtransactions in one transaction. At first, we suspected we might be\ndoing this in some of our expensive background jobs, such as project\nexport or import. However, while we did see numerous `SAVEPOINT` calls\nin these jobs, we didn't see an unusual degree of nesting in local\ntesting.\n\nTo isolate the cause, we started by [adding Prometheus metrics to track\nsubtransactions as a Prometheus metric by model](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66477).\nThis led to nice graphs as the following:\n\n![subtransactions plot](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/subtransactions-plot.png)\n\nWhile this was helpful in seeing the rate of subtransactions over time,\nwe didn't see any obvious spikes that occurred around the time of the\ndatabase stalls. Still, it was possible that suboverflow was happening.\n\nTo see if that was happening, we [instrumented our application to track\nsubtransactions and log a message whenever we detected more than 32\n`SAVEPOINT` calls in a given transaction](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67918). Rails\nmakes it possible for the application to subscribe to all of its SQL\nqueries via `ActiveSupport` notifications. Our instrumentation looked\nsomething like this, simplified for the purposes of discussion:\n\n```ruby\nActiveSupport::Notifications.subscribe('sql.active_record') do |event|\n  sql = event.payload.dig(:sql).to_s\n  connection = event.payload[:connection]\n  manager = connection&.transaction_manager\n\n  context = manager.transaction_context\n  return if context.nil?\n\n  if sql.start_with?('BEGIN')\n    context.set_depth(0)\n  elsif cmd.start_with?('SAVEPOINT', 'EXCEPTION')\n    context.increment_savepoints\n  elsif cmd.start_with?('ROLLBACK TO SAVEPOINT')\n    context.increment_rollbacks\n  elsif cmd.start_with?('RELEASE SAVEPOINT')\n    context.increment_releases\n  elsif sql.start_with?('COMMIT', 'ROLLBACK')\n    context.finish_transaction\n  end\nend\n```\n\nThis code looks for the key SQL commands that initiate transactions and\nsubtransactions and increments counters when they occurred. After a\n`COMMIT,` we log a JSON message that contained the backtrace and the\nnumber of `SAVEPOINT` and `RELEASES` calls. For example:\n\n```json\n{\n  \"sql\": \"/*application:web,correlation_id:01FEBFH1YTMSFEEHS57FA8C6JX,endpoint_id:POST /api/:version/projects/:id/merge_requests/:merge_request_iid/approve*/ BEGIN\",\n  \"savepoints_count\": 1,\n  \"savepoint_backtraces\": [\n    [\n      \"app/models/application_record.rb:75:in `block in safe_find_or_create_by'\",\n      \"app/models/application_record.rb:75:in `safe_find_or_create_by'\",\n      \"app/models/merge_request.rb:1859:in `ensure_metrics'\",\n      \"ee/lib/analytics/merge_request_metrics_refresh.rb:11:in `block in execute'\",\n      \"ee/lib/analytics/merge_request_metrics_refresh.rb:10:in `each'\",\n      \"ee/lib/analytics/merge_request_metrics_refresh.rb:10:in `execute'\",\n      \"ee/app/services/ee/merge_requests/approval_service.rb:57:in `calculate_approvals_metrics'\",\n      \"ee/app/services/ee/merge_requests/approval_service.rb:45:in `block in create_event'\",\n      \"ee/app/services/ee/merge_requests/approval_service.rb:43:in `create_event'\",\n      \"app/services/merge_requests/approval_service.rb:13:in `execute'\",\n      \"ee/app/services/ee/merge_requests/approval_service.rb:14:in `execute'\",\n      \"lib/api/merge_request_approvals.rb:58:in `block (3 levels) in \u003Cclass:MergeRequestApprovals>'\",\n    ]\n  \"rollbacks_count\": 0,\n  \"releases_count\": 1\n}\n```\n\nThis log message contains not only the number of subtransactions via\n`savepoints_count`, but it also contains a handy backtrace that\nidentifies the exact source of the problem. The `sql` field also\ncontains [Marginalia comments](https://github.com/basecamp/marginalia)\nthat we tack onto every SQL query. These comments make it possible to\nidentify what HTTP request initiated the SQL query.\n\n### Taking a hard look at PostgreSQL\n\nThe new instrumentation showed that while the application regularly used\nsubtransactions, it never exceeded 10 nested `SAVEPOINT` calls.\n\nMeanwhile, [Nikolay Samokhvalov](https://gitlab.com/NikolayS), founder\nof [Postgres.ai](https://postgres.ai/), performed a battery of tests [trying to replicate the problem](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/20).\nHe replicated Laurenz's results when a single transaction exceeded 64\nsubtransactions, but that wasn't happening here.\n\nWhen the database stalls occurred, we observed a number of patterns:\n\n1. Only the replicas were affected; the primary remained unaffected.\n1. There was a long-running transaction, usually relating to\nPostgreSQL's autovacuuming, during the time. The stalls stopped quickly after the transaction ended.\n\nWhy would this matter? Analyzing the PostgreSQL source code, Senior\nSupport Engineer [Catalin Irimie](https://gitlab.com/cat) [posed an\nintriguing question that led to a breakthrough in our understanding](https://gitlab.com/gitlab-org/gitlab/-/issues/338410#note_652056284):\n\n> Does this mean that, having subtransactions spanning more than 32 cache pages, concurrently, would trigger the exclusive SubtransControlLock because we still end up reading them from the disk?\n\n### Reproducing the problem with replicas\n\nTo answer this, Nikolay immediately modified his test [to involve replicas and long-running transactions](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/21#note_653453774). Within a day, he reproduced the problem:\n\n![Nikolay experiment](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/nikolay-experiment.png)\n\nThe image above shows that transaction rates remain steady around\n360,000 transactions per second (TPS). Everything was proceeding fine\nuntil the long-running transaction started on the primary. Then suddenly\nthe transaction rates plummeted to 50,000 TPS on the replicas. Canceling\nthe long transaction immediately caused the transaction rate to return.\n\n### What is going on here?\n\nIn his blog post, Nikolay called the problem [Subtrans SLRU overflow](https://v2.postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful#problem-4-subtrans-slru-overflow).\nIn a busy database, it's possible for the size of the subtransaction log\nto grow so large that the working set no longer fits into memory. This\nresults in a lot of cache misses, which in turn causes a high amount of\ndisk I/O and CPU as PostgreSQL furiously tries to load data from disk to\nkeep up with all the lookups.\n\nAs mentioned earlier, the subtransaction cache holds a mapping of the\nsubXID to the parent XID. When PostgreSQL needs to look up the subXID,\nit calculates in which memory page this ID would live, and then does a\nlinear search to find in the memory page. If the page is not in the\ncache, it evicts one page and loads the desired one into memory. The\ndiagram below shows the memory layout of the subtransaction SLRU.\n\n![Subtrans SLRU](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/subtrans-slru.png)\n\nBy default, each SLRU page is an 8K buffer holding 4-byte parent\nXIDs. This means 8192/4 = 2048 transaction IDs can be stored in each\npage.\n\nNote that there may be gaps in each page. PostgreSQL will cache XIDs as\nneeded, so a single XID can occupy an entire page.\n\nThere are 32 (`NUM_SUBTRANS_BUFFERS`) pages, which means up to 65K\ntransaction IDs can be stored in memory. Nikolay demonstrated that in a\nbusy system, it took about 18 seconds to fill up all 65K entries. Then\nperformance dropped off a cliff, making the database replicas unusable.\n\nTo our surprise, our experiments also demonstrated that a single\n`SAVEPOINT` during a long-transaction [could initiate this problem if\nmany writes also occurred simultaneously](https://gitlab.com/gitlab-org/gitlab/-/issues/338865#note_655312474). That\nis, it wasn't enough just to reduce the frequency of `SAVEPOINT`; we had\nto eliminate them completely.\n\n#### Why does a single `SAVEPOINT` cause problems?\n\nTo answer this question, we need to understand what happens when a\n`SAVEPOINT` occurs in one query while a long-running transaction is\nrunning.\n\nWe mentioned earlier that PostgreSQL needs to decide whether a given row\nis visible to support a feature called [multi-version concurrency control](https://www.postgresql.org/docs/current/mvcc.html), or MVCC for\nshort. It does this by storing hidden columns, `xmin` and `xmax`, in\neach tuple.\n\n`xmin` holds the XID of when the tuple was created, and `xmax` holds the\nXID when it was marked as dead (0 if the row is still present). In\naddition, at the beginning of a transaction, PostgreSQL records metadata\nin a database snapshot. Among other items, this snapshot records the\noldest XID and the newest XID in its own `xmin` and `xmax` values.\n\nThis metadata helps [PostgreSQL determine whether a tuple is visible](https://www.interdb.jp/pg/pgsql05.html).\nFor example, a committed XID that started before `xmin` is definitely\nvisible, while anything after `xmax` is invisible.\n\n### What does this have to do with long transactions?\n\nLong transactions are bad in general because they can tie up\nconnections, but they can cause a subtly different problem on a\nreplica. On the replica, a single `SAVEPOINT` during a long transaction\ncauses a snapshot to suboverflow. Remember that dragged down performance\nin the case where we had more than 64 subtransactions.\n\nFundamentally, the problem happens because a replica behaves differently\nfrom a primary when creating snapshots and checking for tuple\nvisibility. The diagram below illustrates an example with some of the\ndata structures used in PostgreSQL:\n\n![Diagram of subtransaction handling in replicas](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/pg-replica-subtransaction-diagram.png)\n\nOn the top of this diagram, we can see the XIDs increase at the\nbeginning of a subtransaction: the `INSERT` after the `BEGIN` gets 1,\nand the subsequent `INSERT` in `SAVEPOINT` gets 2. Another client comes\nalong and performs a `INSERT` and `SELECT` at XID 3.\n\nOn the primary, PostgreSQL stores the transactions in progress in a\nshared memory segment. The process array (`procarray`) stores XID 1 with\nthe first connection, and the database also writes that information to\nthe `pg_xact` directory. XID 2 gets stored in the `pg_subtrans`\ndirectory, mapped to its parent, XID 1.\n\nIf a read happens on the primary, the snapshot generated contains `xmin`\nas 1, and `xmax` as 3. `txip` holds a list of transactions in progress,\nand `subxip` holds a list of subtransactions in progress.\n\nHowever, neither the `procarray` nor the snapshot are shared directly\nwith the replica. The replica receives all the data it needs from the\nwrite-ahead log (WAL).\n\nPlaying the WAL back one entry at time, the replica populates a shared data\nstructure called `KnownAssignedIds`. It contains all the transactions in\nprogress on the primary. Since this structure can only hold a limited number of\nIDs, a busy database with a lot of active subtransactions could easily fill\nthis buffer. PostgreSQL made a design choice to kick out all subXIDs from this\nlist and store them in the `pg_subtrans` directory.\n\nWhen a snapshot is generated on the replica, notice how `txip` is\nblank. A PostgreSQL replica treats **all** XIDs as though they are\nsubtransactions and throws them into the `subxip` bucket. That works\nbecause if a XID has a parent XID, then it's a subtransaction. Otherwise, it's a normal transaction. [The code comments\nexplain the rationale](https://gitlab.com/postgres/postgres/blob/9f540f840665936132dd30bd8e58e9a67e648f22/src/backend/storage/ipc/procarray.c#L1665-L1681).\n\nHowever, this means the snapshot is missing subXIDs, and that could be\nbad for MVCC. To deal with that, the [replica also updates `lastOverflowedXID`](https://gitlab.com/postgres/postgres/blob/9f540f840665936132dd30bd8e58e9a67e648f22/src/backend/storage/ipc/procarray.c#L3176-L3182):\n\n```c\n\n * When we throw away subXIDs from KnownAssignedXids, we need to keep track of\n * that, similarly to tracking overflow of a PGPROC's subxids array.  We do\n * that by remembering the lastOverflowedXID, ie the last thrown-away subXID.\n * As long as that is within the range of interesting XIDs, we have to assume\n * that subXIDs are missing from snapshots.  (Note that subXID overflow occurs\n * on primary when 65th subXID arrives, whereas on standby it occurs when 64th\n * subXID arrives - that is not an error.)\n\n```\n\nWhat is this \"range of interesting XIDs\"? We can see this in [the code below](https://gitlab.com/postgres/postgres/blob/4bf0bce161097869be5a56706b31388ba15e0113/src/backend/storage/ipc/procarray.c#L1702-L1703):\n\n```c\nif (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))\n    suboverflowed = true;\n\n```\n\nIf `lastOverflowedXid` is smaller than our snapshot's `xmin`, it means\nthat all subtransactions have completed, so we don't need to check for\nsubtransactions. However, in our example:\n\n1. `xmin` is 1 because of the transaction.\n2. `lastOverflowXid` is 2 because of the `SAVEPOINT`.\n\nThis means `suboverflowed` is set to `true` here, which tells PostgreSQL\nthat whenever a XID needs to be checked, check to see if it has a parent\nXID. Remember that this causes PostgreSQL to:\n\n1. Look up the subXID for the parent XID in the SLRU cache.\n1. If this doesn't exist in the cache, fetch the data from `pg_trans`.\n\nIn a busy system, the requested XIDs could span an ever-growing range of\nvalues, which could easily exhaust the 64K entries in the SLRU\ncache. This range will continue to grow as long as the transaction runs;\nthe rate of increase depends on how many updates are happening on the\nprmary. As soon as the transaction terminates, the `suboverflowed` state\ngets set to `false`.\n\nIn other words, we've replicated the same conditions as we saw with 64\nsubtransactions, only with a single `SAVEPOINT` and a long transaction.\n\n### What can we do about getting rid of Nessie?\n\nThere are three options:\n\n1. Eliminate `SAVEPOINT` calls completely.\n1. Eliminate all long-running transactions.\n1. Apply [Andrey Borodin's patches to PostgreSQL and increase the subtransaction cache](https://www.postgresql.org/message-id/flat/494C5E7F-E410-48FA-A93E-F7723D859561%40yandex-team.ru#18c79477bf7fc44a3ac3d1ce55e4c169).\n\nWe chose the first option because most uses of subtransaction could be\nremoved fairly easily. There were a [number of approaches](https://gitlab.com/groups/gitlab-org/-/epics/6540) we took:\n\n1. Perform updates outside of a subtransaction. Examples: [1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68471), [2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68690)\n1. Rewrite a query to use a `INSERT` or an `UPDATE` with an `ON CONFLICT` clause to deal with duplicate constraint violations. Examples: [1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68433), [2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69240), [3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68509)\n1. Live with a non-atomic `find_or_create_by`. We used this approach sparingly. Example: [1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68649)\n\nIn addition, we added [an alert whenever the application used a a single `SAVEPOINT`](https://gitlab.com/gitlab-com/runbooks/-/merge_requests/3881):\n\n![subtransaction alert](https://about.gitlab.com/images/blogimages/postgresql-subtransactions/subtransactions-alert-example.png)\n\nThis had the side benefit of flagging a [minor bug](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70889).\n\n#### Why not eliminate all long-running transactions?\n\nIn our database, it wasn't practical to eliminate all long-running\ntransactions because we think many of them happened via [database\nautovacuuming](https://www.postgresql.org/docs/current/runtime-config-autovacuum.html),\nbut [we're not able to reproduce this yet](https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/21#note_669698320).\nWe are working on partitioning the tables and sharding the database, but this is a much more time-consuming problem\nthan removing all subtransactions.\n\n#### What about the PostgreSQL patches?\n\nAlthough we tested Andrey's PostgreSQL patches, we did not feel comfortable\ndeviating from the official PostgreSQL releases. Plus, maintaining a\ncustom patched release over upgrades would add a significant maintenance\nburden for our infrastructure team. Our self-managed customers would\nalso not benefit unless they used a patched database.\n\nAndrey's patches do two main things:\n\n1. Allow administrators to change the SLRU size to any value.\n1. Adds an [associative cache](https://www.youtube.com/watch?v=A0vR-ks3hsQ).\nto make it performant to use a large cache value.\n\nRemember that the SLRU cache does a linear search for the desired\npage. That works fine when there are only 32 pages to search, but if you\nincrease the cache size to 100 MB the search becomes much more\nexpensive. The associative cache makes the lookup fast by indexing pages\nwith a bitmask and looking up the entry with offsets from the remaining\nbits. This mitigates the problem because a transaction would need to be\nseveral magnitudes longer to cause a problem.\n\nNikolay demonstrated that the `SAVEPOINT` problem disappeared as soon as\nwe increased the SLRU size to 100 MB with those patches. With a 100 MB\ncache, PostgreSQL can cache 26.2 million IDs (104857600/4), far more\nthan the measely 65K.\n\nThese [patches are currently awaiting review](https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful#ideas-for-postgresql-development),\nbut in our opinion they should be given high priority for PostgreSQL 15.\n\n### Conclusion\n\nSince removing all `SAVEPOINT` queries, we have not seen Nessie rear her\nhead again. If you are running PostgreSQL with read replicas, we\nstrongly recommend that you also remove *all* subtransactions until\nfurther notice.\n\nPostgreSQL is a fantastic database, and its well-commented code makes it\npossible to understand its limitations under different configurations.\n\nWe would like to thank the GitLab community for bearing with us while we\niron out this production issue.\n\nWe are also grateful for the support from [Nikolay\nSamokhvalov](https://gitlab.com/NikolayS) and [Catalin\nIrimie](https://gitlab.com/cat), who contributed to understanding where our\nLoch Ness Monster was hiding.\n\nCover image by [Khadi Ganiev](https://www.istockphoto.com/portfolio/Ganiev?mediatype=photography) on [iStock](https://istock.com), licensed under [standard license](https://www.istockphoto.com/legal/license-agreement)\n",[25,26,27],"performance","contributors","frontend","yml",{},true,"/en-us/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions",{"title":16,"description":17,"ogTitle":16,"ogDescription":17,"noIndex":13,"ogImage":21,"ogUrl":33,"ogSiteName":34,"ogType":35,"canonicalUrls":33},"https://about.gitlab.com/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions","https://about.gitlab.com","article","en-us/blog/why-we-spent-the-last-month-eliminating-postgresql-subtransactions",[25,26,27],"ZpJy2XtSnifKzsb9iCoDxZzbR0brJQ2PhftyxdA_4Fg",{"data":40},{"logo":41,"freeTrial":46,"sales":51,"login":56,"items":61,"search":369,"minimal":400,"duo":419,"switchNav":428,"pricingDeployment":439},{"config":42},{"href":43,"dataGaName":44,"dataGaLocation":45},"/","gitlab logo","header",{"text":47,"config":48},"Get free trial",{"href":49,"dataGaName":50,"dataGaLocation":45},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":52,"config":53},"Talk to sales",{"href":54,"dataGaName":55,"dataGaLocation":45},"/sales/","sales",{"text":57,"config":58},"Sign in",{"href":59,"dataGaName":60,"dataGaLocation":45},"https://gitlab.com/users/sign_in/","sign in",[62,89,184,189,290,350],{"text":63,"config":64,"cards":66},"Platform",{"dataNavLevelOne":65},"platform",[67,73,81],{"title":63,"description":68,"link":69},"The intelligent orchestration platform for DevSecOps",{"text":70,"config":71},"Explore our Platform",{"href":72,"dataGaName":65,"dataGaLocation":45},"/platform/",{"title":74,"description":75,"link":76},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":77,"config":78},"Meet GitLab Duo",{"href":79,"dataGaName":80,"dataGaLocation":45},"/gitlab-duo-agent-platform/","gitlab duo agent platform",{"title":82,"description":83,"link":84},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":85,"config":86},"Learn more",{"href":87,"dataGaName":88,"dataGaLocation":45},"/why-gitlab/","why gitlab",{"text":90,"left":30,"config":91,"link":93,"lists":97,"footer":166},"Product",{"dataNavLevelOne":92},"solutions",{"text":94,"config":95},"View all Solutions",{"href":96,"dataGaName":92,"dataGaLocation":45},"/solutions/",[98,122,145],{"title":99,"description":100,"link":101,"items":106},"Automation","CI/CD and automation to accelerate deployment",{"config":102},{"icon":103,"href":104,"dataGaName":105,"dataGaLocation":45},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[107,111,114,118],{"text":108,"config":109},"CI/CD",{"href":110,"dataGaLocation":45,"dataGaName":108},"/solutions/continuous-integration/",{"text":74,"config":112},{"href":79,"dataGaLocation":45,"dataGaName":113},"gitlab duo agent platform - product menu",{"text":115,"config":116},"Source Code Management",{"href":117,"dataGaLocation":45,"dataGaName":115},"/solutions/source-code-management/",{"text":119,"config":120},"Automated Software Delivery",{"href":104,"dataGaLocation":45,"dataGaName":121},"Automated software delivery",{"title":123,"description":124,"link":125,"items":130},"Security","Deliver code faster without compromising security",{"config":126},{"href":127,"dataGaName":128,"dataGaLocation":45,"icon":129},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[131,135,140],{"text":132,"config":133},"Application Security Testing",{"href":127,"dataGaName":134,"dataGaLocation":45},"Application security testing",{"text":136,"config":137},"Software Supply Chain Security",{"href":138,"dataGaLocation":45,"dataGaName":139},"/solutions/supply-chain/","Software supply chain security",{"text":141,"config":142},"Software Compliance",{"href":143,"dataGaName":144,"dataGaLocation":45},"/solutions/software-compliance/","software compliance",{"title":146,"link":147,"items":152},"Measurement",{"config":148},{"icon":149,"href":150,"dataGaName":151,"dataGaLocation":45},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[153,157,161],{"text":154,"config":155},"Visibility & Measurement",{"href":150,"dataGaLocation":45,"dataGaName":156},"Visibility and Measurement",{"text":158,"config":159},"Value Stream Management",{"href":160,"dataGaLocation":45,"dataGaName":158},"/solutions/value-stream-management/",{"text":162,"config":163},"Analytics & Insights",{"href":164,"dataGaLocation":45,"dataGaName":165},"/solutions/analytics-and-insights/","Analytics and insights",{"title":167,"items":168},"GitLab for",[169,174,179],{"text":170,"config":171},"Enterprise",{"href":172,"dataGaLocation":45,"dataGaName":173},"/enterprise/","enterprise",{"text":175,"config":176},"Small Business",{"href":177,"dataGaLocation":45,"dataGaName":178},"/small-business/","small business",{"text":180,"config":181},"Public Sector",{"href":182,"dataGaLocation":45,"dataGaName":183},"/solutions/public-sector/","public sector",{"text":185,"config":186},"Pricing",{"href":187,"dataGaName":188,"dataGaLocation":45,"dataNavLevelOne":188},"/pricing/","pricing",{"text":190,"config":191,"link":193,"lists":197,"feature":277},"Resources",{"dataNavLevelOne":192},"resources",{"text":194,"config":195},"View all resources",{"href":196,"dataGaName":192,"dataGaLocation":45},"/resources/",[198,231,249],{"title":199,"items":200},"Getting started",[201,206,211,216,221,226],{"text":202,"config":203},"Install",{"href":204,"dataGaName":205,"dataGaLocation":45},"/install/","install",{"text":207,"config":208},"Quick start guides",{"href":209,"dataGaName":210,"dataGaLocation":45},"/get-started/","quick setup checklists",{"text":212,"config":213},"Learn",{"href":214,"dataGaLocation":45,"dataGaName":215},"https://university.gitlab.com/","learn",{"text":217,"config":218},"Product documentation",{"href":219,"dataGaName":220,"dataGaLocation":45},"https://docs.gitlab.com/","product documentation",{"text":222,"config":223},"Best practice videos",{"href":224,"dataGaName":225,"dataGaLocation":45},"/getting-started-videos/","best practice videos",{"text":227,"config":228},"Integrations",{"href":229,"dataGaName":230,"dataGaLocation":45},"/integrations/","integrations",{"title":232,"items":233},"Discover",[234,239,244],{"text":235,"config":236},"Customer success stories",{"href":237,"dataGaName":238,"dataGaLocation":45},"/customers/","customer success stories",{"text":240,"config":241},"Blog",{"href":242,"dataGaName":243,"dataGaLocation":45},"/blog/","blog",{"text":245,"config":246},"Remote",{"href":247,"dataGaName":248,"dataGaLocation":45},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":250,"items":251},"Connect",[252,257,262,267,272],{"text":253,"config":254},"GitLab Services",{"href":255,"dataGaName":256,"dataGaLocation":45},"/services/","services",{"text":258,"config":259},"Community",{"href":260,"dataGaName":261,"dataGaLocation":45},"/community/","community",{"text":263,"config":264},"Forum",{"href":265,"dataGaName":266,"dataGaLocation":45},"https://forum.gitlab.com/","forum",{"text":268,"config":269},"Events",{"href":270,"dataGaName":271,"dataGaLocation":45},"/events/","events",{"text":273,"config":274},"Partners",{"href":275,"dataGaName":276,"dataGaLocation":45},"/partners/","partners",{"backgroundColor":278,"textColor":279,"text":280,"image":281,"link":285},"#2f2a6b","#fff","Insights for the future of software development",{"altText":282,"config":283},"the source promo card",{"src":284},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758208064/dzl0dbift9xdizyelkk4.svg",{"text":286,"config":287},"Read the latest",{"href":288,"dataGaName":289,"dataGaLocation":45},"/the-source/","the source",{"text":291,"config":292,"lists":294},"Company",{"dataNavLevelOne":293},"company",[295],{"items":296},[297,302,308,310,315,320,325,330,335,340,345],{"text":298,"config":299},"About",{"href":300,"dataGaName":301,"dataGaLocation":45},"/company/","about",{"text":303,"config":304,"footerGa":307},"Jobs",{"href":305,"dataGaName":306,"dataGaLocation":45},"/jobs/","jobs",{"dataGaName":306},{"text":268,"config":309},{"href":270,"dataGaName":271,"dataGaLocation":45},{"text":311,"config":312},"Leadership",{"href":313,"dataGaName":314,"dataGaLocation":45},"/company/team/e-group/","leadership",{"text":316,"config":317},"Team",{"href":318,"dataGaName":319,"dataGaLocation":45},"/company/team/","team",{"text":321,"config":322},"Handbook",{"href":323,"dataGaName":324,"dataGaLocation":45},"https://handbook.gitlab.com/","handbook",{"text":326,"config":327},"Investor relations",{"href":328,"dataGaName":329,"dataGaLocation":45},"https://ir.gitlab.com/","investor relations",{"text":331,"config":332},"Trust Center",{"href":333,"dataGaName":334,"dataGaLocation":45},"/security/","trust center",{"text":336,"config":337},"AI Transparency Center",{"href":338,"dataGaName":339,"dataGaLocation":45},"/ai-transparency-center/","ai transparency center",{"text":341,"config":342},"Newsletter",{"href":343,"dataGaName":344,"dataGaLocation":45},"/company/contact/#contact-forms","newsletter",{"text":346,"config":347},"Press",{"href":348,"dataGaName":349,"dataGaLocation":45},"/press/","press",{"text":351,"config":352,"lists":353},"Contact us",{"dataNavLevelOne":293},[354],{"items":355},[356,359,364],{"text":52,"config":357},{"href":54,"dataGaName":358,"dataGaLocation":45},"talk to sales",{"text":360,"config":361},"Support portal",{"href":362,"dataGaName":363,"dataGaLocation":45},"https://support.gitlab.com","support portal",{"text":365,"config":366},"Customer portal",{"href":367,"dataGaName":368,"dataGaLocation":45},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":370,"login":371,"suggestions":378},"Close",{"text":372,"link":373},"To search repositories and projects, login to",{"text":374,"config":375},"gitlab.com",{"href":59,"dataGaName":376,"dataGaLocation":377},"search login","search",{"text":379,"default":380},"Suggestions",[381,383,387,389,393,397],{"text":74,"config":382},{"href":79,"dataGaName":74,"dataGaLocation":377},{"text":384,"config":385},"Code Suggestions (AI)",{"href":386,"dataGaName":384,"dataGaLocation":377},"/solutions/code-suggestions/",{"text":108,"config":388},{"href":110,"dataGaName":108,"dataGaLocation":377},{"text":390,"config":391},"GitLab on AWS",{"href":392,"dataGaName":390,"dataGaLocation":377},"/partners/technology-partners/aws/",{"text":394,"config":395},"GitLab on Google Cloud",{"href":396,"dataGaName":394,"dataGaLocation":377},"/partners/technology-partners/google-cloud-platform/",{"text":398,"config":399},"Why GitLab?",{"href":87,"dataGaName":398,"dataGaLocation":377},{"freeTrial":401,"mobileIcon":406,"desktopIcon":411,"secondaryButton":414},{"text":402,"config":403},"Start free trial",{"href":404,"dataGaName":50,"dataGaLocation":405},"https://gitlab.com/-/trials/new/","nav",{"altText":407,"config":408},"Gitlab Icon",{"src":409,"dataGaName":410,"dataGaLocation":405},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":407,"config":412},{"src":413,"dataGaName":410,"dataGaLocation":405},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":415,"config":416},"Get Started",{"href":417,"dataGaName":418,"dataGaLocation":405},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":420,"mobileIcon":424,"desktopIcon":426},{"text":421,"config":422},"Learn more about GitLab Duo",{"href":79,"dataGaName":423,"dataGaLocation":405},"gitlab duo",{"altText":407,"config":425},{"src":409,"dataGaName":410,"dataGaLocation":405},{"altText":407,"config":427},{"src":413,"dataGaName":410,"dataGaLocation":405},{"button":429,"mobileIcon":434,"desktopIcon":436},{"text":430,"config":431},"/switch",{"href":432,"dataGaName":433,"dataGaLocation":405},"#contact","switch",{"altText":407,"config":435},{"src":409,"dataGaName":410,"dataGaLocation":405},{"altText":407,"config":437},{"src":438,"dataGaName":410,"dataGaLocation":405},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":440,"mobileIcon":445,"desktopIcon":447},{"text":441,"config":442},"Back to pricing",{"href":187,"dataGaName":443,"dataGaLocation":405,"icon":444},"back to pricing","GoBack",{"altText":407,"config":446},{"src":409,"dataGaName":410,"dataGaLocation":405},{"altText":407,"config":448},{"src":413,"dataGaName":410,"dataGaLocation":405},{"title":450,"button":451,"config":456},"See how agentic AI transforms software delivery",{"text":452,"config":453},"Watch GitLab Transcend now",{"href":454,"dataGaName":455,"dataGaLocation":45},"/events/transcend/virtual/","transcend event",{"layout":457,"icon":458,"disabled":30},"release","AiStar",{"data":460},{"text":461,"source":462,"edit":468,"contribute":473,"config":478,"items":483,"minimal":690},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":463,"config":464},"View page source",{"href":465,"dataGaName":466,"dataGaLocation":467},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":469,"config":470},"Edit this page",{"href":471,"dataGaName":472,"dataGaLocation":467},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":474,"config":475},"Please contribute",{"href":476,"dataGaName":477,"dataGaLocation":467},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":479,"facebook":480,"youtube":481,"linkedin":482},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[484,531,585,629,656],{"title":185,"links":485,"subMenu":500},[486,490,495],{"text":487,"config":488},"View plans",{"href":187,"dataGaName":489,"dataGaLocation":467},"view plans",{"text":491,"config":492},"Why Premium?",{"href":493,"dataGaName":494,"dataGaLocation":467},"/pricing/premium/","why premium",{"text":496,"config":497},"Why Ultimate?",{"href":498,"dataGaName":499,"dataGaLocation":467},"/pricing/ultimate/","why ultimate",[501],{"title":502,"links":503},"Contact Us",[504,507,509,511,516,521,526],{"text":505,"config":506},"Contact sales",{"href":54,"dataGaName":55,"dataGaLocation":467},{"text":360,"config":508},{"href":362,"dataGaName":363,"dataGaLocation":467},{"text":365,"config":510},{"href":367,"dataGaName":368,"dataGaLocation":467},{"text":512,"config":513},"Status",{"href":514,"dataGaName":515,"dataGaLocation":467},"https://status.gitlab.com/","status",{"text":517,"config":518},"Terms of use",{"href":519,"dataGaName":520,"dataGaLocation":467},"/terms/","terms of use",{"text":522,"config":523},"Privacy statement",{"href":524,"dataGaName":525,"dataGaLocation":467},"/privacy/","privacy statement",{"text":527,"config":528},"Cookie preferences",{"dataGaName":529,"dataGaLocation":467,"id":530,"isOneTrustButton":30},"cookie preferences","ot-sdk-btn",{"title":90,"links":532,"subMenu":541},[533,537],{"text":534,"config":535},"DevSecOps platform",{"href":72,"dataGaName":536,"dataGaLocation":467},"devsecops platform",{"text":538,"config":539},"AI-Assisted Development",{"href":79,"dataGaName":540,"dataGaLocation":467},"ai-assisted development",[542],{"title":543,"links":544},"Topics",[545,550,555,560,565,570,575,580],{"text":546,"config":547},"CICD",{"href":548,"dataGaName":549,"dataGaLocation":467},"/topics/ci-cd/","cicd",{"text":551,"config":552},"GitOps",{"href":553,"dataGaName":554,"dataGaLocation":467},"/topics/gitops/","gitops",{"text":556,"config":557},"DevOps",{"href":558,"dataGaName":559,"dataGaLocation":467},"/topics/devops/","devops",{"text":561,"config":562},"Version Control",{"href":563,"dataGaName":564,"dataGaLocation":467},"/topics/version-control/","version control",{"text":566,"config":567},"DevSecOps",{"href":568,"dataGaName":569,"dataGaLocation":467},"/topics/devsecops/","devsecops",{"text":571,"config":572},"Cloud Native",{"href":573,"dataGaName":574,"dataGaLocation":467},"/topics/cloud-native/","cloud native",{"text":576,"config":577},"AI for Coding",{"href":578,"dataGaName":579,"dataGaLocation":467},"/topics/devops/ai-for-coding/","ai for coding",{"text":581,"config":582},"Agentic AI",{"href":583,"dataGaName":584,"dataGaLocation":467},"/topics/agentic-ai/","agentic ai",{"title":586,"links":587},"Solutions",[588,590,592,597,601,604,608,611,613,616,619,624],{"text":132,"config":589},{"href":127,"dataGaName":132,"dataGaLocation":467},{"text":121,"config":591},{"href":104,"dataGaName":105,"dataGaLocation":467},{"text":593,"config":594},"Agile development",{"href":595,"dataGaName":596,"dataGaLocation":467},"/solutions/agile-delivery/","agile delivery",{"text":598,"config":599},"SCM",{"href":117,"dataGaName":600,"dataGaLocation":467},"source code management",{"text":546,"config":602},{"href":110,"dataGaName":603,"dataGaLocation":467},"continuous integration & delivery",{"text":605,"config":606},"Value stream management",{"href":160,"dataGaName":607,"dataGaLocation":467},"value stream management",{"text":551,"config":609},{"href":610,"dataGaName":554,"dataGaLocation":467},"/solutions/gitops/",{"text":170,"config":612},{"href":172,"dataGaName":173,"dataGaLocation":467},{"text":614,"config":615},"Small business",{"href":177,"dataGaName":178,"dataGaLocation":467},{"text":617,"config":618},"Public sector",{"href":182,"dataGaName":183,"dataGaLocation":467},{"text":620,"config":621},"Education",{"href":622,"dataGaName":623,"dataGaLocation":467},"/solutions/education/","education",{"text":625,"config":626},"Financial services",{"href":627,"dataGaName":628,"dataGaLocation":467},"/solutions/finance/","financial services",{"title":190,"links":630},[631,633,635,637,640,642,644,646,648,650,652,654],{"text":202,"config":632},{"href":204,"dataGaName":205,"dataGaLocation":467},{"text":207,"config":634},{"href":209,"dataGaName":210,"dataGaLocation":467},{"text":212,"config":636},{"href":214,"dataGaName":215,"dataGaLocation":467},{"text":217,"config":638},{"href":219,"dataGaName":639,"dataGaLocation":467},"docs",{"text":240,"config":641},{"href":242,"dataGaName":243,"dataGaLocation":467},{"text":235,"config":643},{"href":237,"dataGaName":238,"dataGaLocation":467},{"text":245,"config":645},{"href":247,"dataGaName":248,"dataGaLocation":467},{"text":253,"config":647},{"href":255,"dataGaName":256,"dataGaLocation":467},{"text":258,"config":649},{"href":260,"dataGaName":261,"dataGaLocation":467},{"text":263,"config":651},{"href":265,"dataGaName":266,"dataGaLocation":467},{"text":268,"config":653},{"href":270,"dataGaName":271,"dataGaLocation":467},{"text":273,"config":655},{"href":275,"dataGaName":276,"dataGaLocation":467},{"title":291,"links":657},[658,660,662,664,666,668,670,674,679,681,683,685],{"text":298,"config":659},{"href":300,"dataGaName":293,"dataGaLocation":467},{"text":303,"config":661},{"href":305,"dataGaName":306,"dataGaLocation":467},{"text":311,"config":663},{"href":313,"dataGaName":314,"dataGaLocation":467},{"text":316,"config":665},{"href":318,"dataGaName":319,"dataGaLocation":467},{"text":321,"config":667},{"href":323,"dataGaName":324,"dataGaLocation":467},{"text":326,"config":669},{"href":328,"dataGaName":329,"dataGaLocation":467},{"text":671,"config":672},"Sustainability",{"href":673,"dataGaName":671,"dataGaLocation":467},"/sustainability/",{"text":675,"config":676},"Diversity, inclusion and belonging (DIB)",{"href":677,"dataGaName":678,"dataGaLocation":467},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":331,"config":680},{"href":333,"dataGaName":334,"dataGaLocation":467},{"text":341,"config":682},{"href":343,"dataGaName":344,"dataGaLocation":467},{"text":346,"config":684},{"href":348,"dataGaName":349,"dataGaLocation":467},{"text":686,"config":687},"Modern Slavery Transparency Statement",{"href":688,"dataGaName":689,"dataGaLocation":467},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":691},[692,695,698],{"text":693,"config":694},"Terms",{"href":519,"dataGaName":520,"dataGaLocation":467},{"text":696,"config":697},"Cookies",{"dataGaName":529,"dataGaLocation":467,"id":530,"isOneTrustButton":30},{"text":699,"config":700},"Privacy",{"href":524,"dataGaName":525,"dataGaLocation":467},[702,715],{"id":703,"title":19,"body":9,"config":704,"content":706,"description":9,"extension":28,"meta":710,"navigation":30,"path":711,"seo":712,"stem":713,"__hash__":714},"blogAuthors/en-us/blog/authors/grzegorz-bizon.yml",{"template":705},"BlogAuthor",{"name":19,"config":707},{"headshot":708,"ctfId":709},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749659488/Blog/Author%20Headshots/gitlab-logo-extra-whitespace.png","Grzegorz-Bizon",{},"/en-us/blog/authors/grzegorz-bizon",{},"en-us/blog/authors/grzegorz-bizon","Z1aKSWxubaVpieVMM-H8afTapWLnKt0xl0AyT32DmJE",{"id":716,"title":20,"body":9,"config":717,"content":718,"description":9,"extension":28,"meta":722,"navigation":30,"path":723,"seo":724,"stem":725,"__hash__":726},"blogAuthors/en-us/blog/authors/stan-hu.yml",{"template":705},{"name":20,"config":719},{"headshot":720,"ctfId":721},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1749659504/Blog/Author%20Headshots/stanhu-headshot.jpg","stanhu",{},"/en-us/blog/authors/stan-hu",{},"en-us/blog/authors/stan-hu","KmQVCb_7YcWghHApaS2EI3J2bQ0dRustgOz4wYyOnVk",[728,743,756],{"content":729,"config":741},{"body":730,"title":731,"description":732,"authors":733,"heroImage":735,"date":736,"category":10,"tags":737},"Most CI/CD tools can run a build and ship a deployment. Where they diverge is what happens when your delivery needs get real: a monorepo with a dozen services, microservices spread across multiple repositories, deployments to dozens of environments, or a platform team trying to enforce standards without becoming a bottleneck.\n  \nGitLab's pipeline execution model was designed for that complexity. Parent-child pipelines, DAG execution, dynamic pipeline generation, multi-project triggers, merge request pipelines with merged results, and CI/CD Components each solve a distinct class of problems. Because they compose, understanding the full model unlocks something more than a faster pipeline. In this article, you'll learn about the five patterns where that model stands out, each mapped to a real engineering scenario with the configuration to match.\n  \nThe configs below are illustrative. The scripts use echo commands to keep the signal-to-noise ratio low. Swap them out for your actual build, test, and deploy steps and they are ready to use.\n\n\n## 1. Monorepos: Parent-child pipelines + DAG execution\n\n\nThe problem: Your monorepo has a frontend, a backend, and a docs site. Every commit triggers a full rebuild of everything, even when only a README changed.\n\n\nGitLab solves this with two complementary features: [parent-child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#parent-child-pipelines) (which let a top-level pipeline spawn isolated sub-pipelines) and [DAG execution via `needs`](https://docs.gitlab.com/ci/yaml/#needs) (which breaks rigid stage-by-stage ordering and lets jobs start the moment their dependencies finish).\n\n\nA parent pipeline detects what changed and triggers only the relevant child pipelines:\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - trigger\n\ntrigger-services:\n  stage: trigger\n  trigger:\n    include:\n      - local: '.gitlab/ci/api-service.yml'\n      - local: '.gitlab/ci/web-service.yml'\n      - local: '.gitlab/ci/worker-service.yml'\n    strategy: depend\n```\n\n\nEach child pipeline is a fully independent pipeline with its own stages, jobs, and artifacts. The parent waits for all of them via [strategy: depend](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#wait-for-downstream-pipeline-to-complete) so you get a single green/red signal at the top level, with full drill-down into each service's pipeline. This organizational separation is the bigger win for large teams: each service owns its pipeline config, changes in one cannot break another, and the complexity stays manageable as the repo grows.\n\n\nOne thing worth knowing: when you pass [multiple files to a single `trigger: include:`](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#combine-multiple-child-pipeline-configuration-files), GitLab merges them into a single child pipeline configuration. This means jobs defined across those files share the same pipeline context and can reference each other with `needs:`, which is what makes the DAG optimization possible. If you split them into separate trigger jobs instead, each would be its own isolated pipeline and cross-file `needs:` references would not work.\n\n\nCombine this with `needs:` inside each child pipeline and you get DAG execution. Your integration tests can start the moment the build finishes, without waiting for other jobs in the same stage.\n\n```yaml\n# .gitlab/ci/api-service.yml\nstages:\n  - build\n  - test\n\nbuild-api:\n  stage: build\n  script:\n    - echo \"Building API service\"\n\ntest-api:\n  stage: test\n  needs: [build-api]\n  script:\n    - echo \"Running API tests\"\n```\n\n\nWhy it matters: Teams with large monorepos typically report significant reductions in pipeline runtime after switching to DAG execution, since jobs no longer wait on unrelated work in the same stage. Parent-child pipelines add the organizational layer that keeps the configuration maintainable as the repo and team grow.\n\n![Local downstream pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738759/Blog/Imported/hackathon-fake-blog-post-s/image3_vwj3rz.png \"Local downstream pipelines\")\n\n## 2. Microservices: Cross-repo, multi-project pipelines\n\n\nThe problem: Your frontend lives in one repo, your backend in another. When the frontend team ships a change, they have no visibility into whether it broke the backend integration and vice versa.\n\n\nGitLab's [multi-project pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#multi-project-pipelines) let one project trigger a pipeline in a completely separate project and wait for the result. The triggering project gets a linked downstream pipeline right in its own pipeline view.\n\n\nThe frontend pipeline builds an API contract artifact and publishes it, then triggers the backend pipeline. The backend fetches that artifact directly using the [Jobs API](https://docs.gitlab.com/ee/api/jobs.html#download-a-single-artifact-file-from-specific-tag-or-branch) and validates it before allowing anything to proceed. If a breaking change is detected, the backend pipeline fails and the frontend pipeline fails with it.\n\n```yaml\n# frontend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n  - trigger-backend\n\nbuild-frontend:\n  stage: build\n  script:\n    - echo \"Building frontend and generating API contract...\"\n    - mkdir -p dist\n    - |\n      echo '{\n        \"api_version\": \"v2\",\n        \"breaking_changes\": false\n      }' > dist/api-contract.json\n    - cat dist/api-contract.json\n  artifacts:\n    paths:\n      - dist/api-contract.json\n    expire_in: 1 hour\n\ntest-frontend:\n  stage: test\n  script:\n    - echo \"All frontend tests passed!\"\n\ntrigger-backend-pipeline:\n  stage: trigger-backend\n  trigger:\n    project: my-org/backend-service\n    branch: main\n    strategy: depend\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n```\n\n```yaml\n# backend repo: .gitlab-ci.yml\nstages:\n  - build\n  - test\n\nbuild-backend:\n  stage: build\n  script:\n    - echo \"All backend tests passed!\"\n\nintegration-test:\n  stage: test\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"pipeline\"\n  script:\n    - echo \"Fetching API contract from frontend...\"\n    - |\n      curl --silent --fail \\\n        --header \"JOB-TOKEN: $CI_JOB_TOKEN\" \\\n        --output api-contract.json \\\n        \"${CI_API_V4_URL}/projects/${FRONTEND_PROJECT_ID}/jobs/artifacts/main/raw/dist/api-contract.json?job=build-frontend\"\n    - cat api-contract.json\n    - |\n      if grep -q '\"breaking_changes\": true' api-contract.json; then\n        echo \"FAIL: Breaking API changes detected - backend integration blocked!\"\n        exit 1\n      fi\n      echo \"PASS: API contract is compatible!\"\n```\n\n\nA few things worth noting in this config. The `integration-test` job uses `$CI_PIPELINE_SOURCE == \"pipeline\"` to ensure it only runs when triggered by an upstream pipeline, not on a standalone push to the backend repo. The frontend project ID is referenced via `$FRONTEND_PROJECT_ID`, which should be set as a [CI/CD variable](https://docs.gitlab.com/ci/variables/) in the backend project settings to avoid hardcoding it.\n\n\nWhy it matters: Cross-service breakage that previously surfaced in production gets caught in the pipeline instead. The dependency between services stops being invisible and becomes something teams can see, track, and act on.\n\n\n![Cross-project pipelines](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738762/Blog/Imported/hackathon-fake-blog-post-s/image4_h6mfsb.png \"Cross-project pipelines\")\n\n\n## 3. Multi-tenant / matrix deployments: Dynamic child pipelines\n\n\nThe problem: You deploy the same application to 15 customer environments, or three cloud regions, or dev/staging/prod. Updating a deploy stage across all of them one by one is the kind of work that leads to configuration drift. Writing a separate pipeline for each environment is unmaintainable from day one.\n\n\nGitLab's [dynamic child pipelines](https://docs.gitlab.com/ci/pipelines/downstream_pipelines/#dynamic-child-pipelines) let you generate a pipeline at runtime. A job runs a script that produces a YAML file, and that YAML becomes the pipeline for the next stage. The pipeline structure itself becomes data.\n\n\n```yaml\n# .gitlab-ci.yml\nstages:\n  - generate\n  - trigger-environments\n\ngenerate-config:\n  stage: generate\n  script:\n    - |\n      # ENVIRONMENTS can be passed as a CI variable or read from a config file.\n      # Default to dev, staging, prod if not set.\n      ENVIRONMENTS=${ENVIRONMENTS:-\"dev staging prod\"}\n      for ENV in $ENVIRONMENTS; do\n        cat > ${ENV}-pipeline.yml \u003C\u003C EOF\n      stages:\n        - deploy\n        - verify\n      deploy-${ENV}:\n        stage: deploy\n        script:\n          - echo \"Deploying to ${ENV} environment\"\n      verify-${ENV}:\n        stage: verify\n        script:\n          - echo \"Running smoke tests on ${ENV}\"\n      EOF\n      done\n  artifacts:\n    paths:\n      - \"*.yml\"\n    exclude:\n      - \".gitlab-ci.yml\"\n\n.trigger-template:\n  stage: trigger-environments\n  trigger:\n    strategy: depend\n\ntrigger-dev:\n  extends: .trigger-template\n  trigger:\n    include:\n      - artifact: dev-pipeline.yml\n        job: generate-config\n\ntrigger-staging:\n  extends: .trigger-template\n  needs: [trigger-dev]\n  trigger:\n    include:\n      - artifact: staging-pipeline.yml\n        job: generate-config\n\ntrigger-prod:\n  extends: .trigger-template\n  needs: [trigger-staging]\n  trigger:\n    include:\n      - artifact: prod-pipeline.yml\n        job: generate-config\n  when: manual\n```\n\n\nThe generation script loops over an `ENVIRONMENTS` variable rather than hardcoding each environment separately. Pass in a different list via a CI variable or read it from a config file and the pipeline adapts without touching the YAML. The trigger jobs use [extends:](https://docs.gitlab.com/ci/yaml/#extends) to inherit shared configuration from `.trigger-template`, so `strategy: depend` is defined once rather than repeated on every trigger job. Add a new environment by updating the variable, not by duplicating pipeline config. Add [when: manual](https://docs.gitlab.com/ci/yaml/#when) to the production trigger and you get a promotion gate baked right into the pipeline graph.\n\n\nWhy it matters: SaaS companies and platform teams use this pattern to manage dozens of environments without duplicating pipeline logic. The pipeline structure itself stays lean as the deployment matrix grows.\n\n\n![Dynamic pipeline](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738765/Blog/Imported/hackathon-fake-blog-post-s/image7_wr0kx2.png \"Dynamic pipeline\")\n\n\n## 4. MR-first delivery: Merge request pipelines, merged results, and workflow routing\n\n\nThe problem: Your pipeline runs on every push to every branch. Expensive tests run on feature branches that will never merge. Meanwhile, you have no guarantee that what you tested is actually what will land on `main` after a merge.\n\n\nGitLab has three interlocking features that solve this together:\n\n\n*   [Merge request pipelines](https://docs.gitlab.com/ci/pipelines/merge_request_pipelines/) run only when a merge request exists, not on every branch push. This alone eliminates a significant amount of wasted compute.\n\n*   [Merged results pipelines](https://docs.gitlab.com/ci/pipelines/merged_results_pipelines/) go further. GitLab creates a temporary merge commit (your branch plus the current target branch) and runs the pipeline against that. You are testing what will actually exist after the merge, not just your branch in isolation.\n\n*   [Workflow rules](https://docs.gitlab.com/ci/yaml/workflow/) let you define exactly which pipeline type runs under which conditions and suppress everything else. The `$CI_OPEN_MERGE_REQUESTS` guard below prevents duplicate pipelines firing for both a branch and its open MR simultaneously.\n\n\nWith those three working together, here is what a tiered pipeline looks like:\n\n```yaml\n# .gitlab-ci.yml\nworkflow:\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS\n      when: never\n    - if: $CI_COMMIT_BRANCH\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\nstages:\n  - fast-checks\n  - expensive-tests\n  - deploy\n\nlint-code:\n  stage: fast-checks\n  script:\n    - echo \"Running linter\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nunit-tests:\n  stage: fast-checks\n  script:\n    - echo \"Running unit tests\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"push\"\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nintegration-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running integration tests (15 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\ne2e-tests:\n  stage: expensive-tests\n  script:\n    - echo \"Running E2E tests (30 min)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"merge_request_event\"\n    - if: $CI_COMMIT_BRANCH == \"main\"\n\nnightly-comprehensive-scan:\n  stage: expensive-tests\n  script:\n    - echo \"Running full nightly suite (2 hours)\"\n  rules:\n    - if: $CI_PIPELINE_SOURCE == \"schedule\"\n\ndeploy-production:\n  stage: deploy\n  script:\n    - echo \"Deploying to production\"\n  rules:\n    - if: $CI_COMMIT_BRANCH == \"main\"\n      when: manual\n```\n\nWith this setup, the pipeline behaves differently depending on context. A push to a feature branch with no open MR runs lint and unit tests only. Once an MR is opened, the workflow rules switch from a branch pipeline to an MR pipeline, and the full integration and E2E suite runs against the merged result. Merging to `main` queues a manual production deployment. A nightly schedule runs the comprehensive scan once, not on every commit.\n\n\nWhy it matters: Teams routinely cut CI costs significantly with this pattern, not by running fewer tests, but by running the right tests at the right time. Merged results pipelines catch the class of bugs that only appear after a merge, before they ever reach `main`.\n\n\n![Conditional pipelines (within a branch with no MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738768/Blog/Imported/hackathon-fake-blog-post-s/image6_dnfcny.png \"Conditional pipelines (within a branch with no MR)\")\n\n\n\n![Conditional pipelines (within an MR)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738772/Blog/Imported/hackathon-fake-blog-post-s/image1_wyiafu.png \"Conditional pipelines (within an MR)\")\n\n\n\n![Conditional pipelines (on the main branch)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738774/Blog/Imported/hackathon-fake-blog-post-s/image5_r6lkfd.png \"Conditional pipelines (on the main branch)\")\n\n## 5. Governed pipelines: CI/CD Components\n\n\nThe problem: Your platform team has defined the right way to build, test, and deploy. But every team has their own `.gitlab-ci.yml` with subtle variations. Security scanning gets skipped. Deployment standards drift. Audits are painful.\n\n\nGitLab [CI/CD Components](https://docs.gitlab.com/ci/components/) let platform teams publish versioned, reusable pipeline building blocks. Application teams consume them with a single `include:` line and optional inputs — no copy-paste, no drift. Components are discoverable through the [CI/CD Catalog](https://docs.gitlab.com/ci/components/#cicd-catalog), which means teams can find and adopt approved building blocks without needing to go through the platform team directly.\n\n\nHere is a component definition from a shared library:\n\n```yaml\n# templates/deploy.yml\nspec:\n  inputs:\n    stage:\n      default: deploy\n    environment:\n      default: production\n---\ndeploy-job:\n  stage: $[[ inputs.stage ]]\n  script:\n    - echo \"Deploying $APP_NAME to $[[ inputs.environment ]]\"\n    - echo \"Deploy URL: $DEPLOY_URL\"\n  environment:\n    name: $[[ inputs.environment ]]\n```\nAnd here is how an application team consumes it:\n\n```yaml\n# Application repo: .gitlab-ci.yml\nvariables:\n  APP_NAME: \"my-awesome-app\"\n  DEPLOY_URL: \"https://api.example.com\"\n\ninclude:\n  - component: gitlab.com/my-org/component-library/build@v1.0.6\n  - component: gitlab.com/my-org/component-library/test@v1.0.6\n  - component: gitlab.com/my-org/component-library/deploy@v1.0.6\n    inputs:\n      environment: staging\n\nstages:\n  - build\n  - test\n  - deploy\n```\n\nThree lines of `include:` replace hundreds of lines of duplicated YAML. The platform team can push a security fix to `v1.0.7` and teams opt in on their own schedule — or the platform team can pin everyone to a minimum version. Either way, one change propagates everywhere instead of needing to be applied repo by repo.\n\n\nPair this with [resource groups](https://docs.gitlab.com/ci/resource_groups/) to prevent concurrent deployments to the same environment, and [protected environments](https://docs.gitlab.com/ci/environments/protected_environments/) to enforce approval gates - and you have a governed delivery platform where compliance is the default, not the exception.\n\n\nWhy it matters: This is the pattern that makes GitLab CI/CD scale across hundreds of teams. Platform engineering teams enforce compliance without becoming a bottleneck. Application teams get a fast path to a working pipeline without reinventing the wheel.\n\n\n![Component pipeline (imported jobs)](https://res.cloudinary.com/about-gitlab-com/image/upload/v1775738776/Blog/Imported/hackathon-fake-blog-post-s/image2_pizuxd.png \"Component pipeline (imported jobs)\")\n\n## Putting it all together\n\nNone of these features exist in isolation. The reason GitLab's pipeline model is worth understanding deeply is that these primitives compose:\n\n*   A monorepo uses parent-child pipelines, and each child uses DAG execution\n\n*   A microservices platform uses multi-project pipelines, and each project uses MR pipelines with merged results\n\n*   A governed platform uses CI/CD components to standardize the patterns above across every team\n\n\nMost teams discover one of these features when they hit a specific pain point. The ones who invest in understanding the full model end up with a delivery system that actually reflects how their engineering organization works, not a pipeline that fights it.\n\n## Other patterns worth exploring\n\n\nThe five patterns above cover the most common structural pain points, but GitLab's pipeline model goes further. A few others worth looking into as your needs grow:\n\n\n*   [Review apps with dynamic environments](https://docs.gitlab.com/ci/environments/) let you spin up a live preview for every feature branch and tear it down automatically when the MR closes. Useful for teams doing frontend work or API changes that need stakeholder sign-off before merging.\n\n*   [Caching and artifact strategies](https://docs.gitlab.com/ci/caching/) are often the fastest way to cut pipeline runtime after the structural work is done. Structuring `cache:` keys around dependency lockfiles and being deliberate about what gets passed between jobs with [artifacts:](https://docs.gitlab.com/ci/yaml/#artifacts) can make a significant difference without changing your pipeline shape at all.\n\n*   [Scheduled and API-triggered pipelines](https://docs.gitlab.com/ci/pipelines/schedules/) are worth knowing about because not everything should run on a code push. Nightly security scans, compliance reports, and release automation are better modeled as scheduled or [API-triggered](https://docs.gitlab.com/ci/triggers/) pipelines with `$CI_PIPELINE_SOURCE` routing the right jobs for each context.\n\n## How to get started\n\nModern software delivery is complex. Teams are managing monorepos with dozens of services, coordinating across multiple repositories, deploying to many environments at once, and trying to keep standards consistent as organizations grow. GitLab's pipeline model was built with all of that in mind.\n\nWhat makes it worth investing time in is how well the pieces fit together. Parent-child pipelines bring structure to large codebases. Multi-project pipelines make cross-team dependencies visible and testable. Dynamic pipelines turn environment management into something that scales gracefully. MR-first delivery with merged results ensures confidence at every step of the review process. And CI/CD Components give platform teams a way to share best practices across an entire organization without becoming a bottleneck.\n\nEach of these features is powerful on its own, and even more so when combined. GitLab gives you the building blocks to design a delivery system that fits how your team actually works, and grows with you as your needs evolve.\n\n> [Start a free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/) to use pipeline logic today.\n\n## Read more\n\n*   [Variable and artifact sharing in GitLab parent-child pipelines](https://about.gitlab.com/blog/variable-and-artifact-sharing-in-gitlab-parent-child-pipelines/)\n*   [CI/CD inputs: Secure and preferred method to pass parameters to a pipeline](https://about.gitlab.com/blog/ci-cd-inputs-secure-and-preferred-method-to-pass-parameters-to-a-pipeline/)\n*   [Tutorial: How to set up your first GitLab CI/CD component](https://about.gitlab.com/blog/tutorial-how-to-set-up-your-first-gitlab-ci-cd-component/)\n*   [How to include file references in your CI/CD components](https://about.gitlab.com/blog/how-to-include-file-references-in-your-ci-cd-components/)\n*   [FAQ: GitLab CI/CD Catalog](https://about.gitlab.com/blog/faq-gitlab-ci-cd-catalog/)\n*   [Building a GitLab CI/CD pipeline for a monorepo the easy way](https://about.gitlab.com/blog/building-a-gitlab-ci-cd-pipeline-for-a-monorepo-the-easy-way/)\n*   [A CI/CD component builder's journey](https://about.gitlab.com/blog/a-ci-component-builders-journey/)\n*   [CI/CD Catalog goes GA: No more building pipelines from scratch](https://about.gitlab.com/blog/ci-cd-catalog-goes-ga-no-more-building-pipelines-from-scratch/)","5 ways GitLab pipeline logic solves real engineering problems","Learn how to scale CI/CD with composable patterns for monorepos, microservices, environments, and governance.",[734],"Omid Khan","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772721753/frfsm1qfscwrmsyzj1qn.png","2026-04-09",[108,738,739,740],"DevOps platform","tutorial","features",{"featured":30,"template":14,"slug":742},"5-ways-gitlab-pipeline-logic-solves-real-engineering-problems",{"content":744,"config":754},{"title":745,"description":746,"authors":747,"heroImage":749,"date":750,"body":751,"category":10,"tags":752},"How to use GitLab Container Virtual Registry with Docker Hardened Images","Learn how to simplify container image management with this step-by-step guide.",[748],"Tim Rizzi","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772111172/mwhgbjawn62kymfwrhle.png","2026-03-12","If you're a platform engineer, you've probably had this conversation:\n  \n*\"Security says we need to use hardened base images.\"*\n\n*\"Great, where do I configure credentials for yet another registry?\"*\n\n*\"Also, how do we make sure everyone actually uses them?\"*\n\nOr this one:\n\n*\"Why are our builds so slow?\"*\n\n*\"We're pulling the same 500MB image from Docker Hub in every single job.\"*\n\n*\"Can't we just cache these somewhere?\"*\n\nI've been working on [Container Virtual Registry](https://docs.gitlab.com/user/packages/virtual_registry/container/) at GitLab specifically to solve these problems. It's a pull-through cache that sits in front of your upstream registries — Docker Hub, dhi.io (Docker Hardened Images), MCR, and Quay — and gives your teams a single endpoint to pull from. Images get cached on the first pull. Subsequent pulls come from the cache. Your developers don't need to know or care which upstream a particular image came from.\n\nThis article shows you how to set up Container Virtual Registry, specifically with Docker Hardened Images in mind, since that's a combination that makes a lot of sense for teams concerned about security and not making their developers' lives harder.\n\n## What problem are we actually solving?\n\nThe Platform teams I usually talk to manage container images across three to five registries:\n\n* **Docker Hub** for most base images\n* **dhi.io** for Docker Hardened Images (security-conscious workloads)\n* **MCR** for .NET and Azure tooling\n* **Quay.io** for Red Hat ecosystem stuff\n* **Internal registries** for proprietary images\n\nEach one has its own:\n\n* Authentication mechanism\n* Network latency characteristics\n* Way of organizing image paths\n\nYour CI/CD configs end up littered with registry-specific logic. Credential management becomes a project unto itself. And every pipeline job pulls the same base images over the network, even though they haven't changed in weeks.\n\nContainer Virtual Registry consolidates this. One registry URL. One authentication flow (GitLab's). Cached images are served from GitLab's infrastructure rather than traversing the internet each time.\n\n## How it works\n\nThe model is straightforward:\n\n```text\nYour pipeline pulls:\n  gitlab.com/virtual_registries/container/1000016/python:3.13\n\nVirtual registry checks:\n  1. Do I have this cached? → Return it\n  2. No? → Fetch from upstream, cache it, return it\n\n```\n\nYou configure upstreams in priority order. When a pull request comes in, the virtual registry checks each upstream until it finds the image. The result gets cached for a configurable period (default 24 hours).\n\n```text\n┌─────────────────────────────────────────────────────────┐\n│                    CI/CD Pipeline                       │\n│                          │                              │\n│                          ▼                              │\n│   gitlab.com/virtual_registries/container/\u003Cid>/image   │\n└─────────────────────────────────────────────────────────┘\n                           │\n                           ▼\n┌─────────────────────────────────────────────────────────┐\n│            Container Virtual Registry                   │\n│                                                         │\n│  Upstream 1: Docker Hub ────────────────┐               │\n│  Upstream 2: dhi.io (Hardened) ────────┐│               │\n│  Upstream 3: MCR ─────────────────────┐││               │\n│  Upstream 4: Quay.io ────────────────┐│││               │\n│                                      ││││               │\n│                    ┌─────────────────┴┴┴┴──┐            │\n│                    │        Cache          │            │\n│                    │  (manifests + layers) │            │\n│                    └───────────────────────┘            │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Why this matters for Docker Hardened Images\n\n[Docker Hardened Images](https://docs.docker.com/dhi/) are great because of the minimal attack surface, near-zero CVEs, proper software bills of materials (SBOMs), and SLSA provenance. If you're evaluating base images for security-sensitive workloads, they should be on your list.\n\nBut adopting them creates the same operational friction as any new registry:\n\n* **Credential distribution**: You need to get Docker credentials to every system that pulls images from dhi.io.\n* **CI/CD changes**: Every pipeline needs to be updated to authenticate with dhi.io.\n* **Developer friction**: People need to remember to use the hardened variants.\n* **Visibility gap**: It's difficult to tell if teams are actually using hardened images vs. regular ones.\n\nVirtual registry addresses each of these:\n\n**Single credential**: Teams authenticate to GitLab. The virtual registry handles upstream authentication. You configure Docker credentials once, at the registry level, and they apply to all pulls.\n\n**No CI/CD changes per-team**: Point pipelines at your virtual registry. Done. The upstream configuration is centralized.\n\n**Gradual adoption**: Since images get cached with their full path, you can see in the cache what's being pulled. If someone's pulling `library/python:3.11` instead of the hardened variant, you'll know.\n\n**Audit trail**: The cache shows you exactly which images are in active use. Useful for compliance, useful for understanding what your fleet actually depends on.\n\n## Setting it up\n\nHere's a real setup using the Python client from this demo project.\n\n### Create the virtual registry\n\n```python\nfrom virtual_registry_client import VirtualRegistryClient\n\nclient = VirtualRegistryClient()\n\nregistry = client.create_virtual_registry(\n    group_id=\"785414\",  # Your top-level group ID\n    name=\"platform-images\",\n    description=\"Cached container images for platform teams\"\n)\n\nprint(f\"Registry ID: {registry['id']}\")\n# You'll need this ID for the pull URL\n```\n\n### Add Docker Hub as an upstream\n\nFor official images like Alpine, Python, etc.:\n\n```python\ndocker_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://registry-1.docker.io\",\n    name=\"Docker Hub\",\n    cache_validity_hours=24\n)\n```\n\n### Add Docker Hardened Images (dhi.io)\n\nDocker Hardened Images are hosted on `dhi.io`, a separate registry that requires authentication:\n\n```python\ndhi_upstream = client.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-docker-username\",\n    password=\"your-docker-access-token\",\n    cache_validity_hours=24\n)\n```\n\n### Add other upstreams\n\n```python\n# MCR for .NET teams\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://mcr.microsoft.com\",\n    name=\"Microsoft Container Registry\",\n    cache_validity_hours=48\n)\n\n# Quay for Red Hat stuff\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://quay.io\",\n    name=\"Quay.io\",\n    cache_validity_hours=24\n)\n```\n\n### Update your CI/CD\n\nHere's a `.gitlab-ci.yml` that pulls through the virtual registry:\n\n```yaml\nvariables:\n  VIRTUAL_REGISTRY_ID: \u003Cyour_virtual_registry_ID>\n\n  \nbuild:\n  image: docker:24\n  services:\n    - docker:24-dind\n  before_script:\n    # Authenticate to GitLab (which handles upstream auth for you)\n    - echo \"${CI_JOB_TOKEN}\" | docker login -u gitlab-ci-token --password-stdin gitlab.com\n  script:\n    # All of these go through your single virtual registry\n    \n    # Official Docker Hub images (use library/ prefix)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/library/alpine:latest\n    \n    # Docker Hardened Images from dhi.io (no prefix needed)\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/python:3.13\n    \n    # .NET from MCR\n    - docker pull gitlab.com/virtual_registries/container/${VIRTUAL_REGISTRY_ID}/dotnet/sdk:8.0\n```\n\n### Image path formats\n\nDifferent registries use different path conventions:\n\n| Registry | Pull URL Example |\n|----------|------------------|\n| Docker Hub (official) | `.../library/python:3.11-slim` |\n| Docker Hardened Images (dhi.io) | `.../python:3.13` |\n| MCR | `.../dotnet/sdk:8.0` |\n| Quay.io | `.../prometheus/prometheus:latest` |\n\n### Verify it's working\n\nAfter some pulls, check your cache:\n\n```python\nupstreams = client.list_registry_upstreams(registry['id'])\nfor upstream in upstreams:\n    entries = client.list_cache_entries(upstream['id'])\n    print(f\"{upstream['name']}: {len(entries)} cached entries\")\n\n```\n\n## What the numbers look like\n\nI ran tests pulling images through the virtual registry:\n\n| Metric | Without Cache | With Warm Cache |\n|--------|---------------|-----------------|\n| Pull time (Alpine) | 10.3s | 4.2s |\n| Pull time (Python 3.13 DHI) | 11.6s | ~4s |\n| Network roundtrips to upstream | Every pull | Cache misses only |\n\n\n\n\nThe first pull is the same speed (it has to fetch from upstream). Every pull after that, for the cache validity period, comes straight from GitLab's storage. No network hop to Docker Hub, dhi.io, MCR, or wherever the image lives.\n\nFor a team running hundreds of pipeline jobs per day, that's hours of cumulative build time saved.\n\n## Practical considerations\nHere are some considerations to keep in mind:\n\n### Cache validity\n\n24 hours is the default. For security-sensitive images where you want patches quickly, consider 12 hours or less:\n\n```python\nclient.create_upstream(\n    registry_id=registry['id'],\n    url=\"https://dhi.io\",\n    name=\"Docker Hardened Images\",\n    username=\"your-username\",\n    password=\"your-token\",\n    cache_validity_hours=12\n)\n```\n\nFor stable, infrequently-updated images (like specific version tags), longer validity is fine.\n\n### Upstream priority\n\nUpstreams are checked in order. If you have images with the same name on different registries, the first matching upstream wins.\n\n### Limits\n\n* Maximum of 20 virtual registries per group\n* Maximum of 20 upstreams per virtual registry\n\n## Configuration via UI\n\nYou can also configure virtual registries and upstreams directly from the GitLab UI—no API calls required. Navigate to your group's **Settings > Packages and registries > Virtual Registry** to:\n\n* Create and manage virtual registries\n* Add, edit, and reorder upstream registries\n* View and manage the cache\n* Monitor which images are being pulled\n\n## What's next\n\nWe're actively developing:\n\n* **Allow/deny lists**: Use regex to control which images can be pulled from specific upstreams.\n\nThis is beta software. It works, people are using it in production, but we're still iterating based on feedback.\n\n## Share your feedback\n\nIf you're a platform engineer dealing with container registry sprawl, I'd like to understand your setup:\n\n* How many upstream registries are you managing?\n* What's your biggest pain point with the current state?\n* Would something like this help, and if not, what's missing?\n\nPlease share your experiences in the [Container Virtual Registry feedback issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/589630).\n## Related resources\n- [New GitLab metrics and registry features help reduce CI/CD bottlenecks](https://about.gitlab.com/blog/new-gitlab-metrics-and-registry-features-help-reduce-ci-cd-bottlenecks/#container-virtual-registry)\n- [Container Virtual Registry documentation](https://docs.gitlab.com/user/packages/virtual_registry/container/)\n- [Container Virtual Registry API](https://docs.gitlab.com/api/container_virtual_registries/)",[739,753,740],"product",{"featured":13,"template":14,"slug":755},"using-gitlab-container-virtual-registry-with-docker-hardened-images",{"content":757,"config":767},{"title":758,"description":759,"authors":760,"heroImage":762,"date":763,"category":10,"tags":764,"body":766},"How IIT Bombay students are coding the future with GitLab","At GitLab, we often talk about how software accelerates innovation. But sometimes, you have to step away from the Zoom calls and stand in a crowded university hall to remember why we do this.",[761],"Nick Veenhof","https://res.cloudinary.com/about-gitlab-com/image/upload/v1750099013/Blog/Hero%20Images/Blog/Hero%20Images/blog-image-template-1800x945%20%2814%29_6VTUA8mUhOZNDaRVNPeKwl_1750099012960.png","2026-01-08",[261,623,765],"open source","The GitLab team recently had the privilege of judging the **iHack Hackathon** at **IIT Bombay's E-Summit**. The energy was electric, the coffee was flowing, and the talent was undeniable. But what struck us most wasn't just the code — it was the sheer determination of students to solve real-world problems, often overcoming significant logistical and financial hurdles to simply be in the room.\n\n\nThrough our [GitLab for Education program](https://about.gitlab.com/solutions/education/), we aim to empower the next generation of developers with tools and opportunity. Here is a look at what the students built, and how they used GitLab to bridge the gap between idea and reality.\n\n## The challenge: Build faster, build securely\n\nThe premise for the GitLab track of the hackathon was simple: Don't just show us a product; show us how you built it. We wanted to see how students utilized GitLab's platform — from Issue Boards to CI/CD pipelines — to accelerate the development lifecycle.\n\nThe results were inspiring.\n\n## The winners\n\n### 1st place: Team Decode — Democratizing Scientific Research\n\n**Project:** FIRE (Fast Integrated Research Environment)\n\nTeam Decode took home the top prize with a solution that warms a developer's heart: a local-first, blazing-fast data processing tool built with [Rust](https://about.gitlab.com/blog/secure-rust-development-with-gitlab/) and Tauri. They identified a massive pain point for data science students: existing tools are fragmented, slow, and expensive.\n\nTheir solution, FIRE, allows researchers to visualize complex formats (like NetCDF) instantly. What impressed the judges most was their \"hacker\" ethos. They didn't just build a tool; they built it to be open and accessible.\n\n**How they used GitLab:** Since the team lived far apart, asynchronous communication was key. They utilized **GitLab Issue Boards** and **Milestones** to track progress and integrated their repo with Telegram to get real-time push notifications. As one team member noted, \"Coordinating all these technologies was really difficult, and what helped us was GitLab... the Issue Board really helped us track who was doing what.\"\n\n![Team Decode](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/epqazj1jc5c7zkgqun9h.jpg)\n\n### 2nd place: Team BichdeHueDost — Reuniting to Solve Payments\n\n**Project:** SemiPay (RFID Cashless Payment for Schools)\n\nThe team name, BichdeHueDost, translates to \"Friends who have been set apart.\" It's a fitting name for a group of friends who went to different colleges but reunited to build this project. They tackled a unique problem: handling cash in schools for young children. Their solution used RFID cards backed by a blockchain ledger to ensure secure, cashless transactions for students.\n\n**How they used GitLab:** They utilized [GitLab CI/CD](https://about.gitlab.com/topics/ci-cd/) to automate the build process for their Flutter application (APK), ensuring that every commit resulted in a testable artifact. This allowed them to iterate quickly despite the \"flaky\" nature of cross-platform mobile development.\n\n![Team BichdeHueDost](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/pkukrjgx2miukb6nrj5g.jpg)\n\n### 3rd place: Team ZenYukti — Agentic Repository Intelligence\n\n**Project:** RepoInsight AI (AI-powered, GitLab-native intelligence platform)\n\nTeam ZenYukti impressed us with a solution that tackles a universal developer pain point: understanding unfamiliar codebases. What stood out to the judges was the tool's practical approach to onboarding and code comprehension: RepoInsight-AI automatically generates documentation, visualizes repository structure, and even helps identify bugs, all while maintaining context about the entire codebase.\n\n**How they used GitLab:** The team built a comprehensive CI/CD pipeline that showcased GitLab's security and DevOps capabilities. They integrated [GitLab's Security Templates](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/templates/Security) (SAST, Dependency Scanning, and Secret Detection), and utilized [GitLab Container Registry](https://docs.gitlab.com/user/packages/container_registry/) to manage their Docker images for backend and frontend components. They created an AI auto-review bot that runs on merge requests, demonstrating an \"agentic workflow\" where AI assists in the development process itself.\n\n![Team ZenYukti](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380253/ymlzqoruv5al1secatba.jpg)\n\n## Beyond the code: A lesson in inclusion\n\nWhile the code was impressive, the most powerful moment of the event happened away from the keyboard.\n\nDuring the feedback session, we learned about the journey Team ZenYukti took to get to Mumbai. They traveled over 24 hours, covering nearly 1,800 kilometers. Because flights were too expensive and trains were booked, they traveled in the \"General Coach,\" a non-reserved, severely overcrowded carriage.\n\nAs one student described it:\n\n*\"You cannot even imagine something like this... there are no seats... people sit on the top of the train. This is what we have endured.\"*\n\nThis hit home. [Diversity, Inclusion, and Belonging](https://handbook.gitlab.com/handbook/company/culture/inclusion/) are core values at GitLab. We realized that for these students, the barrier to entry wasn't intellect or skill, it was access.\n\nIn that moment, we decided to break that barrier. We committed to reimbursing the travel expenses for the participants who struggled to get there. It's a small step, but it underlines a massive truth: **talent is distributed equally, but opportunity is not.**\n\n![hackathon class together](https://res.cloudinary.com/about-gitlab-com/image/upload/v1767380252/o5aqmboquz8ehusxvgom.jpg)\n\n### The future is bright (and automated)\n\nWe also saw incredible potential in teams like Prometheus, who attempted to build an autonomous patch remediation tool (DevGuardian), and Team Arrakis, who built a voice-first job portal for blue-collar workers using [GitLab Duo](https://about.gitlab.com/gitlab-duo-agent-platform/) to troubleshoot their pipelines.\n\nTo all the students who participated: You are the future. Through [GitLab for Education](https://about.gitlab.com/solutions/education/), we are committed to providing you with the top-tier tools (like GitLab Ultimate) you need to learn, collaborate, and change the world — whether you are coding from a dorm room, a lab, or a train carriage. **Keep shipping.**\n\n> :bulb: Learn more about the [GitLab for Education program](https://about.gitlab.com/solutions/education/).\n",{"slug":768,"featured":13,"template":14},"how-iit-bombay-students-code-future-with-gitlab",{"promotions":770},[771,785,796,808],{"id":772,"categories":773,"header":775,"text":776,"button":777,"image":782},"ai-modernization",[774],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":778,"config":779},"Get your AI maturity score",{"href":780,"dataGaName":781,"dataGaLocation":243},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":783},{"src":784},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":786,"categories":787,"header":788,"text":776,"button":789,"image":793},"devops-modernization",[753,569],"Are you just managing tools or shipping innovation?",{"text":790,"config":791},"Get your DevOps maturity score",{"href":792,"dataGaName":781,"dataGaLocation":243},"/assessments/devops-modernization-assessment/",{"config":794},{"src":795},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":797,"categories":798,"header":800,"text":776,"button":801,"image":805},"security-modernization",[799],"security","Are you trading speed for security?",{"text":802,"config":803},"Get your security maturity score",{"href":804,"dataGaName":781,"dataGaLocation":243},"/assessments/security-modernization-assessment/",{"config":806},{"src":807},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":809,"paths":810,"header":813,"text":814,"button":815,"image":820},"github-azure-migration",[811,812],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":816,"config":817},"See how GitLab compares to GitHub",{"href":818,"dataGaName":819,"dataGaLocation":243},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":821},{"src":795},{"header":823,"blurb":824,"button":825,"secondaryButton":830},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":826,"config":827},"Get your free trial",{"href":828,"dataGaName":50,"dataGaLocation":829},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":505,"config":831},{"href":54,"dataGaName":55,"dataGaLocation":829},1776444479370]