
Apple is tightening up how developers manage the limited context window for its on-device Foundation Models, introducing new tools in iOS 26.4 Release Candidate that make token usage easier to track and control.
Like most large language models, Apple’s Foundation Models rely on a context window – a fixed number of tokens available to hold system instructions, user prompts and model responses. On Apple’s on-device models, that window is relatively small at 4,096 tokens. In chat-style apps where prompts and replies accumulate, that capacity can be exhausted quickly.
When the limit is hit, the framework throws an .exceededContextWindowSize error and the model can no longer respond within the same session. To recover, developers must spin up a new session and re-establish the necessary state so the user’s workflow can continue without a jarring interruption.
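That recovery pattern can be sketched in Swift. This is a minimal illustration, not Apple’s reference code: `summarize(_:)` is a hypothetical helper the app would supply to condense the old transcript, and the exact error case matches the `LanguageModelSession.GenerationError` described above.

```swift
import FoundationModels

// Sketch: respond to a prompt, and if the session's context window is
// exhausted, start a fresh session seeded with a condensed transcript.
// summarize(_:) is a hypothetical app-provided helper.
func respond(to prompt: String,
             in session: LanguageModelSession) async throws -> (reply: String, session: LanguageModelSession) {
    do {
        let reply = try await session.respond(to: prompt).content
        return (reply, session)
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // The old session can no longer respond; carry forward only a
        // summary of the earlier conversation as the new instructions.
        let fresh = LanguageModelSession(instructions: summarize(session.transcript))
        let reply = try await fresh.respond(to: prompt).content
        return (reply, fresh)
    }
}
```

The caller swaps in the returned session, so the user’s conversation continues without a visible restart.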
Apple’s recent work pushes developers to treat the context window as a constrained resource, much like memory in a low-resource system. Instead of assuming the model will always have room, apps are expected to plan for how that space is used and reclaimed over time.
Apple has previously published technical guidance with practical strategies for working within the limit. Those recommendations include:
- Splitting large tasks into multiple language model sessions instead of trying to handle everything in one long conversation.
- Requesting shorter answers from the model to reduce token consumption per response.
- Trimming prompts, for example by summarising earlier parts of a conversation or keeping only the most relevant turns.
- Using tool calling efficiently so the model doesn’t waste tokens on unnecessary context.
These approaches help reduce the likelihood of hitting the 4,096-token ceiling, but they don’t remove the need for precise accounting. Developers still have to understand what is contributing to token usage at any given time.
iOS 26.4 RC adds new capabilities to the Foundation Models framework aimed squarely at that problem. A new contextSize property on SystemLanguageModel exposes the available context capacity. Rather than hard-coding the 4,096-token maximum, apps can query contextSize directly, making token-aware logic more robust against future changes.
Complementing that, a tokenCount(for:) method lets developers measure how many tokens a given input will consume. This becomes the basis for what is effectively token bookkeeping: before sending prompts, tools or other data to the model, the app can estimate their token cost and adapt accordingly.
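Taken together, the two APIs described above allow a simple pre-flight check before prompting. The snippet below is a sketch based on the article’s description of the iOS 26.4 RC surface; the exact signatures of `contextSize` and `tokenCount(for:)` may differ from what ships.

```swift
import FoundationModels

// Sketch: estimate a prompt's token cost against the model's actual
// capacity instead of a hard-coded 4,096.
let model = SystemLanguageModel.default
let prompt = "Summarise today's meeting notes for me."

let capacity = model.contextSize              // new in the iOS 26.4 RC
let cost = try model.tokenCount(for: prompt)  // per the RC API

if cost > capacity / 2 {
    // The prompt alone would eat half the window; trim or summarise
    // it before starting (or continuing) a session.
}
```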
According to a practical walkthrough by developer Artem Novichkov, effective context management means accounting for every element that contributes to the window. That includes the system prompt, all user instructions and the model’s own responses. It also extends to tool usage, which can be a hidden source of token bloat.
When tools are involved, their definitions – including the tool’s name, description and argument schema – are serialised and sent alongside the instructions. This additional metadata can significantly increase the token count, eating into the context budget faster than developers might expect.
Novichkov’s article refers to a tokenUsage(for:) method; in the latest iOS 26.4 Release Candidate, that API appears under the name tokenCount(for:). The new additions to the Foundation Models framework are marked with @backDeployed(before: iOS 26.4, macOS 26.4, visionOS 26.4), which makes them available on earlier OS versions that already support the framework, not just on devices running iOS 26.4 and its desktop and visionOS counterparts.
This combination – knowing the actual context capacity via contextSize and measuring consumption via tokenCount(for:) – gives developers the raw data they need to manage the 4,096-token window more intelligently. It does not fully solve the complexity of deciding what to keep, summarise or discard in a live conversation, but it lays the groundwork for more predictable, user-friendly on-device AI experiences.
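One way such bookkeeping might look in practice is a small running budget built on the two new APIs. Everything here apart from `contextSize` and `tokenCount(for:)` is illustrative naming, not part of Apple’s framework.

```swift
import FoundationModels

// Sketch: track cumulative token usage so the app knows when to
// summarise or discard conversation history. Illustrative only.
struct ContextBudget {
    let model = SystemLanguageModel.default
    private(set) var used = 0

    var remaining: Int { model.contextSize - used }

    // Records the text's cost and returns true if it fits in the
    // remaining budget; otherwise leaves the budget untouched.
    mutating func reserve(_ text: String) throws -> Bool {
        let cost = try model.tokenCount(for: text)
        guard cost <= remaining else { return false }
        used += cost
        return true
    }
}
```

When `reserve(_:)` starts returning `false`, the app knows in advance – rather than via an `.exceededContextWindowSize` error – that it is time to compact the conversation.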