{"id":36147,"date":"2026-05-07T11:44:53","date_gmt":"2026-05-07T11:44:53","guid":{"rendered":"https:\/\/www.nvecta.com\/blog\/?p=36147"},"modified":"2026-05-07T11:44:53","modified_gmt":"2026-05-07T11:44:53","slug":"real-time-personalisation-architecture","status":"publish","type":"post","link":"https:\/\/www.nvecta.com\/blog\/real-time-personalisation-architecture\/","title":{"rendered":"Real-Time Personalisation at Scale: Architecture, Trade-Offs &#038; Production Challenges"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Most users today do not want to see the same homepage, product grid, or recommendations as everyone else. They want something that feels relevant to them, right now. That is what Real-Time Personalisation is about: showing the right thing to the right person at the right moment, not hours later, but within the same click.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At NVECTA, we work with teams building exactly this. And the honest truth is that getting personalisation to work well in production is hard. The ideas are straightforward; the engineering is not. This blog breaks down how real-time personalisation systems are actually built, what decisions teams have to make, and where things get complicated.<\/span><\/p>\n<h3><b>What Is Real-Time Personalisation?<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Simply put, real-time personalisation means your system knows something about a user at the moment and uses that information before responding.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are different levels of how &#8220;real-time&#8221; a system can actually be:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Batch personalisation<\/b><span style=\"font-weight: 400;\"> &#8211; the model runs overnight, and the results are applied the next day. 
Slow, but cheap to build.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Near-real-time<\/b><span style=\"font-weight: 400;\"> &#8211; the system refreshes every few minutes. Better, but still a step behind.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>True real-time<\/b><span style=\"font-weight: 400;\"> &#8211; the system reads what a user just did and factors it into the response within the same request. This is the hard one.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">The difference matters more than people think. Knowing that a user just searched for &#8220;running shoes&#8221; is far more useful than knowing they bought shoes six months ago. The closer your system can operate to the present moment, the more relevant it becomes.<\/span><\/p>\n<h3><b>Use Cases: Where Real-Time Personalisation Actually Makes a Difference<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Before going into architecture, it helps to see where this actually gets used and why it matters in each context.<\/span><\/p>\n<h4><b>E-Commerce\u00a0<\/b><\/h4>\n<p>When someone is browsing a clothing site and keeps clicking on black jackets, a good <a href=\"https:\/\/www.nvecta.com\/blog\/what-is-ecommerce-cdp-benefits-guide\/\">e-commerce CDP<\/a> picks that up within the session and starts showing more black jackets, or similar items, without the user having to search for them. Batch systems would miss this entirely.<\/p>\n<h4><b>Media and Streaming\u00a0<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">Platforms like video or podcast apps use real-time signals to decide which thumbnail to show, which row to put first, and which content to push. 
What a user watched an hour ago is often more useful than their full watch history.<\/span><\/p>\n<h4><b>Banking and Fintech\u00a0<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">If someone just transferred a large amount of money, showing them a savings plan prompt at that moment is far more relevant than showing it to them on a random day. Real-time personalisation lets financial apps react to what is happening in real time.<\/span><\/p>\n<h4><b>Travel Booking\u00a0<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">A user who has searched for &#8220;Bali&#8221; three times in a single session shows strong intent. A good system surfaces Bali deals, Bali hotel recommendations, and relevant travel insurance without the user having to ask.<\/span><\/p>\n<h4><b>SaaS Products<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">Enterprise tools use real-time personalisation to surface the right onboarding tip, upsell prompt, or feature suggestion based on what the user is actively doing in the product at that moment.<\/span><\/p>\n<h4><b>Advertising<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">This is where the stakes are highest for latency. A real-time ad system has to decide which ad to show, to whom, at what price, in under 100 milliseconds. That is the most demanding version of this problem.<\/span><\/p>\n<h3><b>How These Systems Are Actually Built<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A real-time personalisation system has five main pieces. They all need to work together, and the failure of any one of them breaks the whole thing.<\/span><\/p>\n<h4><b>1. Collecting Events as They Happen<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">Every click, scroll, search, and purchase gets sent as an event into a pipeline. Tools like Apache Kafka or AWS Kinesis handle this. 
They are built to receive millions of events per second without dropping any.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The important things to get right here are ensuring events follow a consistent format so nothing downstream breaks, and ensuring events from the same user arrive in order so the system can understand sequences of behaviour.<\/span><\/p>\n<h4><b>2. Turning Events into Features<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">A raw event like &#8220;user clicked product ID 4821&#8221; is not directly useful to a model. It needs to be turned into something like &#8220;user has clicked three running shoes in the last five minutes.&#8221; That transformation is called feature computation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Tools like Apache Flink do this in real time. The outputs are stored in a feature store, a database built specifically to serve these values quickly.<\/span><\/p>\n<p><b>3. The Feature Store<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The feature store sits between the pipeline and the model. When a user makes a request, the personalisation system pulls their features from the store and sends them to the model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Speed is everything here. If the feature store is slow, the whole system is slow. Most teams use Redis or a similar in-memory database to keep reads fast.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are two broad approaches to keeping the feature store up to date. One is the Lambda approach, which runs a fast stream pipeline and a slower batch pipeline in parallel and merges the results. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">It is reliable but complex. The other is the Kappa approach, which uses only the stream pipeline. It is simpler to run but harder to fix when something goes wrong. Most teams end up somewhere in between.<\/span><\/p>\n<h4><b>4. 
The Model<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">The model is what actually decides what to show. It takes in a user&#8217;s features and scores a set of candidate items, then returns a ranked list.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Running a heavy model on thousands of items for every request is too slow. The common solution is to use two steps: first, a fast, simple model narrows the list from thousands of items to around 100. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Then, a more detailed model scores just those 100 and picks the best ones. This way, you get quality recommendations without blowing your time budget.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Models also go stale. A model trained last week does not know about new products or recent trends. Teams retrain frequently, from once a day to once an hour for fast-moving domains, and roll out updates carefully to avoid breaking things.<\/span><\/p>\n<h4><b>5. Putting It All Together and Measuring What Works<\/b><\/h4>\n<p><span style=\"font-weight: 400;\">The final step takes the ranked results, applies any business rules (do not show out-of-stock items, do not repeat what was shown last time), and returns the response to the user.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Just as important is logging what happened. Every impression and every click feeds back into the system so models can keep improving. A\/B tests sit here too, so teams can measure whether a new model or a new ranking strategy actually moves the numbers.<\/span><\/p>\n<h3><b>The Trade-Offs Nobody Talks About Enough<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Real-time personalisation forces you to make choices where every option has a cost. 
Here are the ones that come up most often.<\/span><\/p>\n<p><b>Speed vs accuracy<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Better models take longer to run. But a slower response costs you users. The practical answer is to use a lightweight model for the first pass and a heavier model only for the final ranking of a short list.<\/span><\/p>\n<p><b>New users vs existing users<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You cannot personalise for someone you know nothing about. New users get a bad experience if you try to force personalisation too early. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">A sensible approach is to start with location or time-of-day defaults, then shift to session signals, then move to full personalisation once the user has done enough to give you something to work with.<\/span><\/p>\n<p><b>Freshness vs cost<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Keeping features up to date to the second is expensive. Not every feature needs to be. Decide what actually needs to be real-time and what can be updated every 10 minutes or even once a day.<\/span><\/p>\n<p><b>Always being up vs always being accurate<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Distributed systems fail sometimes. When the feature store is slow or the model is unreachable, the system has to decide what to do. Serving a slightly generic result is almost always better than serving nothing. Build fallbacks in from the start.<\/span><\/p>\n<p><b>Privacy vs depth of personalisation<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The signals that make personalisation most accurate are often the most personal. Regulations like GDPR and CCPA put real limits on what you can store and for how long. This is not optional. 
Build data handling into the pipeline design, not as an afterthought.<\/span><\/p>\n<h3><b>Making It Work at High Traffic<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A few patterns make a big difference when volume goes up.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pre-computing results for predictable users before they even make a request significantly reduces response times. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Caching at multiple layers, from in-memory on the server all the way down to the model inference layer, reduces the amount of work that needs to happen in real time. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Batching multiple feature lookups into a single network call, rather than issuing many, reduces the tail latency that makes systems feel unreliable.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The serving layer should hold no state of its own. All user data lives in the feature store. This makes scaling simple: add more servers when traffic rises, remove them when it drops.<\/span><\/p>\n<h3><b>What to Measure<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A personalisation system that cannot be measured cannot be improved. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The most important things to track are whether users engage more (clicks, session length, pages visited), whether they convert more (purchases, sign-ups), whether recommendations are varied enough that users do not see the same things repeatedly, how fresh the features actually are at the time the model uses them, and whether the system meets its latency targets under real load.<\/span><\/p>\n<h3><b>Conclusion<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Real-time personalisation is genuinely hard to build well. It touches every part of the stack, from how you collect data to how fast your database responds to how often you retrain your models. 
The decisions you make at each layer have real consequences for speed, cost, accuracy, and user trust. None of them is obvious, and most of them involve giving something up to get something else.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">NVECTA exists to make these decisions easier. Rather than building each layer from scratch, teams working with NVECTA get a system in which the pipelines, feature store, model serving, and experimentation layer are already connected and production-tested. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The hard parts, the ones this blog describes, are handled. What is left is the work that actually matters: understanding your users and building experiences they find useful. Real-time personalisation done right is a real competitive edge. NVECTA is how you get there without spending years on infrastructure.<\/span><\/p>\n<h3><b>Frequently Asked Questions<\/b><\/h3>\n<p><b>Q1: How much data do you need before real-time personalisation is worth building?<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A few thousand active users with consistent behaviour is a reasonable starting point. Below that, there is not enough signal for the models to learn anything meaningful. Start simpler and add complexity as your user base grows.<\/span><\/p>\n<p><b>Q2: What do you show to new users with no history?<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Start with defaults based on location, time of day, or what is generally popular. As soon as they start clicking around, use those session signals. 
Move them to full personalisation once they have done enough to give you something to work with, usually around 5 to 10 meaningful actions.<\/span><\/p>\n<p><b>Q3: How fast does a real-time personalisation system need to be?<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For most consumer apps, the whole process from receiving the request to returning personalised results should take between 50 and 150 milliseconds. Ad systems need to be faster, often under 100ms. Internal tools can be a bit slower. Set a target for your product and measure against it consistently.<\/span><\/p>\n<p><b>Q4: How often should you retrain the personalisation model?<\/b><span style=\"font-weight: 400;\">\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It depends on how quickly your users&#8217; interests change. E-commerce and media apps often require retraining daily, or even hourly. B2B tools can get away with weekly. Watch how model performance changes over time and retrain before it drops to a point where users notice.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most users today do not want to see the same homepage, product grid, or recommendations as everyone else. They want something that feels relevant to them, right now. That is what Real-Time Personalisation is about: showing the right thing to the right person at the right moment, not hours later, but within the same click. 
[&hellip;]<\/p>\n","protected":false},"author":25,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1112],"tags":[],"class_list":["post-36147","post","type-post","status-publish","format-standard","hentry","category-personalization"],"_links":{"self":[{"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/posts\/36147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/users\/25"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/comments?post=36147"}],"version-history":[{"count":2,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/posts\/36147\/revisions"}],"predecessor-version":[{"id":36149,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/posts\/36147\/revisions\/36149"}],"wp:attachment":[{"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/media?parent=36147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/categories?post=36147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nvecta.com\/blog\/wp-json\/wp\/v2\/tags?post=36147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}