GPT-5.4 by OpenAI: What’s new? 9 Key Improvements

Table of contents

    Just a few years ago, AI-powered tools were mainly able to generate text or answer questions. Today, their role is changing rapidly – increasingly, they are not only supporting human work but also beginning to perform real operational tasks. OpenAI’s latest model, GPT-5.4, is another step in that direction. OpenAI introduced GPT-5.4 to the world on March 5, 2026, making the model available simultaneously in ChatGPT (as “GPT-5.4 Thinking”), via the API, and in the Codex environment. At the same time, a GPT-5.4 Pro variant was released for the most demanding analytical and research tasks. GPT-5.4 was designed as a new, unified approach to AI models – one system intended to combine the latest advances in reasoning, coding, and agentic workflows, while also handling tasks typical of knowledge work more effectively: document analysis, report preparation, spreadsheet work, and presentation creation. The model is also a response to two important problems of the previous generation. First, capabilities across the OpenAI ecosystem were fragmented – some models were better for conversation, others for coding, and still others for more complex reasoning. Second, the development of agent-based systems exposed the cost and complexity of integrating tools. GPT-5.4 is meant to simplify that ecosystem by offering a single model capable of working across many environments and with many tools at the same time. In practice, this means AI increasingly resembles a digital co-worker that can analyze data, prepare business materials, and even perform some operational tasks on the user’s computer. In this article, we take a look at the most important improvements in GPT-5.4 and what they mean for companies and business decision-makers.

    GPT 5.4 in business

    1. What’s new in GPT 5.4?

    1.1 One model instead of many specialized tools

    One of the key changes in GPT-5.4 is the combination of previously separate AI capabilities into a single model. In previous generations, OpenAI developed several different systems specialized for specific tasks – one model was better at programming, another at data analysis, and another at generating quick conversational responses. In practice, this meant that users or applications often had to choose the right model depending on the task. GPT-5.4 integrates these capabilities into one system. The model combines coding skills, advanced reasoning, tool use, and document or data analysis. As a result, one model can perform different types of tasks – from preparing a report, to analyzing a spreadsheet, to generating a code snippet or automating a process in an application. For business users, this also means a simpler way to use AI. Instead of wondering which model to choose for a specific task, it is increasingly enough to simply describe the problem. The system selects the way of working on its own and uses the appropriate capabilities of the model during the task. As a result, AI begins to resemble a more universal digital co-worker rather than a set of separate tools for different use cases.

    1.2 Better support for knowledge work

    The new generation of the model has been clearly optimized for tasks typical of knowledge workers – analysts, lawyers, consultants, and managers. OpenAI measures this, among other ways, with the GDPval benchmark, which includes tasks from 44 different professions, such as financial analysis, presentation preparation, legal document interpretation, and spreadsheet work. In this test, GPT-5.4 achieves results comparable to or better than a human’s first attempt in about 83% of cases, while the previous version of the model scored around 71%. This represents a noticeable leap in tasks typical of office and analytical work. In practice, the model can, for example, analyze a large dataset in a spreadsheet, prepare a report with conclusions, create a presentation summarizing results, or suggest the structure of a financial model. As a result, it can increasingly serve as support for day-to-day analytical and decision-making tasks in companies.

    1.3 Built-in computer and application use

    One of the most groundbreaking functions of GPT-5.4 is the ability to directly use a computer and applications. The model can analyze screenshots, recognize interface elements, click buttons, enter data, and test the solutions it creates. In practice, this marks a shift from AI that merely “advises” to AI that can actually perform operational tasks – for example, operating systems, entering data, or automating repetitive office activities. In previous generations of models, the user had to perform all actions in applications manually – AI could only suggest what to do. GPT-5.4 introduces native so-called computer use functions, allowing the model to go through the steps of a process itself, for example by opening a website, finding the right form field, and filling in data. In practice, this function is mainly available in development environments and automation tools – such as Codex or the OpenAI API – where the model can control a browser or application via code. In simpler use cases, it may be enough to upload a screenshot or describe an interface, and the model can suggest specific actions or generate a script that automates the entire process. In practice, some of these capabilities can already be seen in the ChatGPT interface – for example, in the so-called agent mode (available after hovering over the “+” next to the prompt field), which allows the model to carry out multi-step tasks and use different tools while working. This makes it possible to build AI agents that independently perform tasks across many applications – from spreadsheet work to handling business systems.

    GPT-5.4 by OpenAI

    1.4 The ability to work on very long documents and large datasets

    GPT-5.4 can analyze much larger amounts of information in a single task than previous models. In practice, this means AI can work simultaneously on very long documents, large reports, or entire datasets without needing to split them into many smaller parts. Technically, the model supports a context window of up to around one million tokens, which can be compared to being able to “read” hundreds of pages of text at the same time. Thanks to this, GPT-5.4 can analyze, for example, entire code repositories, lengthy legal contracts, multi-year financial reports, or extensive project documentation in a single process. For companies, this primarily means less manual work when preparing data for AI and greater consistency of analysis. Instead of feeding documents to the model in multiple parts, teams can work on the full source material, increasing the chances of more complete conclusions and more accurate recommendations.

    1.5 Intelligent tool management (tool search)

    GPT-5.4 introduces a mechanism for searching tools during work. Instead of loading all tool definitions into context at the beginning of a task, the model can search for the needed functions only when they are required. As a result, context usage and token consumption drop by as much as several dozen percent. For companies building AI systems, this means cheaper and more scalable agent-based solutions. Example: imagine an AI system in a company that has access to many different integrations – for example, a CRM, invoicing system, customer database, calendar, analytics tool, and email platform. In the older approach, the model had to “know” all of these tools from the start of the task, which increased the amount of processed data and the cost of operation. Thanks to the tool search mechanism, GPT-5.4 can first determine what it needs and only then reach for the right tool – for example, first checking customer data in the CRM and only later using the invoicing system to generate a document. As a result, the process is more efficient and easier to scale as the number of integrations grows.

    1.6 Better collaboration with tools and process automation

    GPT-5.4 significantly improves the way the model uses external tools – such as web browsers, databases, company files, or various APIs. In previous generations, AI could often perform a single step, but had difficulty planning an entire process made up of many stages. The new model is much better at coordinating multiple actions within a single task. It can, for example, plan the next steps itself: find the necessary information, analyze the data, and then prepare the result in a specified format – for example, a report, table, or presentation. A good example of these capabilities is generating working applications based on a functional description. During testing, I asked GPT-5.4 to create a simple browser-based arcade game of the “escape maze” type. The AI generated a complete application in HTML, CSS, and JavaScript – with a randomly generated maze, an enemy (in this case, the boss… 😉 chasing the player (an office worker hunting for benefits/rewards), and a leaderboard. The code was created based on a description of how the game should work and – as shown below – functions in the browser as a working prototype.

    This example shows that GPT-5.4 is becoming increasingly capable in end-to-end development tasks, where an idea or functional description can be turned into a working application.

    1.7 Fewer hallucinations and more reliable answers

    One of the most frequently cited problems of earlier AI models was so-called hallucination, a situation in which the model generates information that sounds credible but is in fact false. In a business environment, this is particularly important because incorrect data in a report, analysis, or recommendation can lead to poor decisions. According to OpenAI, GPT-5.4 introduces a noticeable improvement in this area. Compared with GPT-5.2, the number of false individual claims dropped by around 33%, and the number of answers containing any error at all – by around 18%. This means the model generates false information less often and is more likely to indicate uncertainty or the need for additional verification. In practice, this translates into greater usefulness in tasks such as data analysis, report preparation, market research, or document work. Verification of critical information is still recommended, but the amount of manual checking may be significantly lower than with earlier generations of models. Importantly, early analyses by independent AI model comparison services – such as Artificial Analysis – as well as user test results from crowdsourced platforms like LM Arena also suggest improved stability and answer quality in GPT-5.4, especially in analytical and research tasks.

    1.8 The ability to steer the model while it is working

    GPT-5.4 introduces greater interactivity when performing more complex tasks. Unlike earlier models, the user does not have to wait until the entire process is finished to make changes or redirect the AI. In practice, this can be seen in modes such as Deep Research or in tasks requiring longer reasoning. The model often first presents an action plan – a list of steps it intends to perform, such as finding data, analyzing materials, or preparing a summary. It then shows the progress of the work and indicates what stage it is currently at. During this process, the user can refine the instruction, add new requirements, or redirect the analysis without having to start from scratch. The interface allows the user to send another message that updates the model’s working context – for example, expanding the scope of the analysis, indicating new sources, or changing the final report format. For business users, this means a more natural way of working with AI. Instead of issuing a one-time instruction and waiting for the result, the collaboration resembles a consulting process – the model presents a plan, performs the next steps, and can be guided in real time toward the right direction.

    1.9 A faster operating mode (Fast Mode)

    GPT-5.4 also introduces a special accelerated working mode called Fast Mode. In this mode, the model generates answers faster thanks to priority processing and limiting some of the additional reasoning stages. In practice, this means a shorter wait time for results, which can be particularly useful in business contexts where response time matters – for example, customer support, draft content generation, or preliminary data analysis. It is worth remembering, however, that Fast Mode does not change the model’s underlying architecture or knowledge. The difference is mainly that the system spends less time on additional analysis steps in order to generate an answer faster. In more complex tasks – such as extensive data analysis or detailed research – the standard working mode may therefore provide more in-depth results. Fast Mode may also involve more intensive use of computational resources. Answers are produced faster, but at the cost of more intensive use of computing infrastructure. In many cases, this means a slightly larger carbon footprint per individual query, although the exact scale depends on the data center infrastructure and the way the model operates.

    GPT-5.4 - 2026

    2. Underappreciated but important changes in GPT-5.4 from a business perspective

    In addition to the most publicized functions, such as the larger context window or computer use, GPT-5.4 also introduces several less visible changes that may be highly significant for companies in practice. The model more often starts work by presenting an action plan, handles long and multi-step tasks better, and is more responsive to user instructions. Combined with better collaboration with tools and greater stability in long analyses, this makes GPT-5.4 much more suitable for automating real business processes than earlier generations of models.

    2.1 The model more often starts with an action plan

    GPT-5.4 much more often presents a plan for solving the task first, and only then generates the result. In practice, this means the model may show, for example:

    • what data it will gather,
    • what analysis steps it will perform,
    • what the output format will be.

    For businesses, this means greater predictability in how AI works and the ability to correct the direction of the analysis before the model completes the whole task.

    2.2 Much better stability in long-running tasks

    Previous models often “got lost” in long processes – for example, when analyzing many documents or building an application. GPT-5.4 has been clearly optimized for long, multi-step workflows. Thanks to this, the model can:

    • work on a single task for a longer time,
    • perform subsequent analysis steps,
    • iteratively improve the result.

    This is a key change for companies building AI agents that automate business processes.

    2.3 Better model “steerability” by the user

    GPT-5.4 is much more responsive to system instructions and user corrections. It is easier to define:

    • the response style,
    • the model’s way of working,
    • the level of caution in decision-making.

    For companies, this means the ability to build AI agents tailored to specific business processes, for example more conservative ones for financial analysis or more creative ones for marketing.

    2.4 Greater resistance to “losing context”

    GPT-5.4 is much less likely to lose context in long conversations or analyses. The model remembers earlier information better and can use it in later stages of the task. For business users, this means more consistent collaboration with AI on long projects, for example when preparing strategy, reports, or documentation.

    3. The most important GPT-5.4 numbers in one place

    Metric GPT-5.4 What it means in practice
    Context window up to 1 million tokens the ability to work on hundreds of pages of documents or large code repositories in a single task
    GDPval benchmark (office tasks) approx. 83% wins or ties a clear improvement over GPT-5.2 (~71%) in analytical and office tasks
    Computer use (OSWorld-Verified) approx. 75% effectiveness the model can perform computer tasks at a level close to a human
    Hallucination reduction approx. 33% fewer false claims greater reliability of answers in analyses and reports
    Answers containing errors approx. 18% fewer less need for manual verification of results
    Token savings thanks to tool search up to 47% less cheaper and more scalable agent systems
    API price (base model) approx. $2.50 / 1M input tokens an increase over GPT-5.2, but with greater computational efficiency
    API price (GPT-5.4 Pro) approx. $30 / 1M input tokens a version for the most demanding tasks and research

    4. What to watch out for when implementing GPT-5.4 in a company

    Although GPT-5.4 introduces many improvements, practical use also comes with certain costs and trade-offs. From an organizational perspective, it is worth paying attention to several aspects.

    4.1 Higher API prices – but greater efficiency

    OpenAI raised official per-token rates compared with earlier models. At the same time, GPT-5.4 is meant to be more efficient – in many tasks, it needs fewer tokens to achieve a similar result. The final cost therefore depends more on how the model is used than on the token price itself.

    4.2 The Pro version offers the highest performance – but is significantly more expensive

    The model is also available as GPT-5.4 Pro, intended for the most complex analytical and research tasks. It offers the longest reasoning processes and the best results, but comes with clearly higher computational costs.

    4.3 Conscious selection of the model’s working mode is necessary

    Users increasingly choose between different model modes – for example Thinking, Pro, or Fast Mode. The greatest strengths of GPT-5.4 are visible in long, multi-step tasks, while in simpler business use cases faster modes may be more cost-effective.

    4.4 Complex analyses may take longer

    GPT-5.4 was designed as a model focused on deeper reasoning. In more complex tasks – for example, analyzing many documents – the answer may appear more slowly than with previous generations of models.

    4.5 A very large context window may increase costs

    The ability to work on huge sets of information is a major advantage of GPT-5.4, but with very large documents it may increase token usage. In practice, companies often use data selection techniques or document retrieval instead of passing entire datasets to the model.

    4.6 Automating actions in applications requires control

    GPT-5.4 collaborates better with tools and applications, making it possible to automate many processes. In enterprise systems, however, it is still worth applying safeguards – such as permission limits, operation logging, or user confirmation for critical actions.

    4.7 Benchmarks do not always reflect real-world use

    Some of the model’s advantages are based on benchmarks, often conducted under controlled research conditions. In practice, results may differ depending on how the model is used in ChatGPT or enterprise systems.

    4.8 The biggest benefits are visible in agent-based tasks

    Early user tests suggest that the biggest improvements in GPT-5.4 appear in tasks requiring tool use and process automation – for example, analyzing multiple data sources or working in a browser. In simple conversational tasks, the differences versus earlier models may be less visible.

    GPT-5.4 Implementation

    5. GPT-5.4 and new AI capabilities – why implementation security is becoming critical

    The development of models like GPT-5.4 shows that AI is moving increasingly fast from the experimentation phase into real business processes. AI can already analyze documents, prepare reports, automate tasks, and even build applications. At the same time, the importance of safe and responsible AI management within organizations is growing – especially where AI works with sensitive data or supports key business decisions.

    That is why formal AI management standards are starting to play an increasingly important role. One of the most important is ISO/IEC 42001, the first international standard for artificial intelligence management systems (AIMS – AI Management System). It defines, among other things, the principles of risk management, data control, oversight of AI systems, and transparency of AI-based processes.

    TTMS is among the absolute pioneers in implementing this standard. Our company launched an AI management system compliant with ISO/IEC 42001 as the first organization in Poland and one of the first in Europe (the second on the continent). Thanks to this, we can develop and implement AI solutions for clients in line with international standards of security, governance, and responsible use of artificial intelligence.

    You can read more about our AI management system compliant with ISO/IEC 42001 here:
    https://ttms.com/pressroom/ttms-adopts-iso-iec-42001-aligned-ai-management-system/

    6. AI solutions for business from TTMS

    If the development of models like GPT-5.4 is encouraging your organization to implement AI in day-to-day business processes, it is worth reaching for solutions designed for specific use cases. At TTMS, we develop a set of specialized AI products supporting key business processes – from document analysis and knowledge management, to training and recruitment, to compliance and software testing. These solutions help organizations implement AI safely in everyday operations, automate repetitive tasks, and increase team productivity while maintaining control over data and regulatory compliance.

    • AI4Legal – AI solutions for law firms that automate, among other things, court document analysis, contract generation from templates, and transcript processing, increasing lawyers’ efficiency and reducing the risk of errors.
    • AI4Content (AI Document Analysis Tool) – a secure and configurable document analysis tool that generates structured summaries and reports. It can operate locally or in a controlled cloud environment and uses RAG mechanisms to improve response accuracy.
    • AI4E-learning – an AI-powered platform enabling the rapid creation of training materials, transforming internal organizational content into professional courses and exporting ready-made SCORM packages to LMS systems.
    • AI4Knowledge – a knowledge management system serving as a central repository of procedures, instructions, and guidelines, allowing employees to ask questions and receive answers aligned with organizational standards.
    • AI4Localisation – an AI-based translation platform that adapts translations to the company’s industry context and communication style while maintaining terminology consistency.
    • AML Track – software supporting AML processes by automating customer verification against sanctions lists, report generation, and audit trail management in the area of anti-money laundering and counter-terrorist financing.
    • AI4Hire – an AI solution supporting CV analysis and resource allocation, enabling deeper candidate assessment and data-driven recommendations.
    • QATANA – an AI-supported software test management tool that streamlines the entire testing cycle through automatic test case generation and offers secure on-premise deployments.

    FAQ

    Is GPT-5.4 currently the best AI model on the market?

    In many benchmarks, GPT-5.4 ranks among the top AI models. In tests related to coding, tool usage, and task automation, the model often achieves results comparable to or higher than competing systems such as Claude Opus or Gemini. On independent AI model comparison platforms, GPT-5.4 is frequently classified as one of the best models for agent-based and programming tasks.

    Is GPT-5.4 better than GPT-5.3 for programming?

    GPT-5.4 largely inherits the coding capabilities known from the GPT-5.3 Codex model and expands them with new functions related to reasoning and tool usage.

    In practice, this means developers no longer need to switch between different models depending on the task. GPT-5.4 can generate code, debug applications, and work with large project repositories within a single workflow.

    Can GPT-5.4 test its own code?

    Yes – one of the interesting capabilities of GPT-5.4 is the ability to test its own solutions. The model can run generated applications, check how they work in a browser, or analyze a user interface based on screenshots.

    In some development environments, the model can even automatically open an application in a browser, detect visual or functional issues, and correct the code on its own. This approach significantly speeds up prototyping and debugging.

    How long can GPT-5.4 work on a single task?

    One of the characteristic features of GPT-5.4 is its ability to work on complex tasks for an extended period of time. In Pro mode, the model can analyze a problem for several minutes or even longer before generating a final answer.

    In practice, this means the model can execute multi-step processes such as searching the internet, analyzing data, generating code, and testing solutions within a single task.

    Is GPT-5.4 slower than previous models?

    In many tests, GPT-5.4 takes more time to begin generating an answer than earlier models. This is because the model performs additional analysis steps before producing a result.

    Some testers have noted that the time required to produce the first response may be noticeably longer than in previous versions. At the same time, the additional reasoning often leads to more detailed and accurate answers.

    Is GPT-5.4 suitable for building AI agents?

    Yes – GPT-5.4 was designed with agent-based systems in mind, meaning applications that can perform multi-step tasks on behalf of the user.

    Thanks to features such as computer use, tool search, and integrations with external tools, the model can automatically search for information, analyze data, and perform actions within applications.

    What does “computer use” mean in GPT-5.4?

    Computer use refers to the model’s ability to interact with computer interfaces. This means the AI can analyze screenshots, recognize interface elements, and perform actions similar to those performed by a user – such as clicking buttons, entering data, or navigating between applications.

    What is tool search in GPT-5.4?

    Tool search is a mechanism that allows the model to look up tools only when they are needed. In older approaches, all tool definitions had to be included in the prompt at the start of a task.

    With GPT-5.4, the model receives only a lightweight list of tools and retrieves detailed definitions only when necessary, which reduces token usage and system costs.

    What does “knowledge work” mean in the context of AI?

    Knowledge work refers to tasks that mainly involve analyzing information and making decisions based on data. Examples include work performed by analysts, consultants, lawyers, and managers.

    Models such as GPT-5.4 are designed to support these tasks, for example by analyzing documents, generating reports, or preparing presentations.

    What is the “Thinking” mode in GPT-5.4?

    Thinking mode is a model configuration in which the AI spends more time analyzing a task before generating a response.

    This allows the model to perform more complex operations, such as analyzing data from multiple sources or planning multi-step solutions.

    What does “vibe coding” mean?

    Vibe coding is an informal term describing a programming style where a developer describes the idea or functionality of an application in natural language and the AI generates most of the code.

    In this approach, the developer focuses more on supervising the process, testing the application, and refining the results generated by AI rather than writing every line of code manually.

    Is GPT-5.4 free?

    GPT-5.4 is partially free. The basic version of the model may be available in ChatGPT under the free plan, although with limitations on the number of queries or available features.

    Full capabilities, including longer reasoning sessions or access to the Pro variant, are usually available in paid subscription plans or through the OpenAI API.

    Is GPT-5.4 better than Claude and Gemini?

    In many benchmarks, GPT-5.4 achieves results comparable to or higher than competing models such as Claude or Gemini, especially in coding, automation, and tool usage.

    However, different models may still perform better in specific areas. Some tests show that other models may have advantages in interface design or multimodal analysis.

    Can GPT-5.4 create websites?

    Yes, the model can generate HTML, CSS, and JavaScript code needed to build websites or simple web applications.

    In many cases, it can produce a complete prototype including page structure, interface elements, and basic functionality. However, the generated code still requires verification and refinement by developers or designers.

    Can GPT-5.4 analyze documents and company files?

    Yes. One of the key capabilities of GPT-5.4 is analyzing large amounts of information, including documents, reports, and datasets.

    Thanks to its large context window, the model can process long documents or multiple files simultaneously. In practice, this allows it to assist with tasks such as contract analysis, report processing, or document summarization.

    Is GPT-5.4 safe to use in companies?

    Like any AI tool, GPT-5.4 requires a proper approach to data security. In business applications, it is important to control data access, use auditing mechanisms, and choose an appropriate deployment environment.

    Many companies integrate AI with internal systems or use solutions operating in controlled cloud environments or on-premise infrastructure.

    How can companies start using GPT-5.4?

    The easiest way is to begin experimenting with the model in ChatGPT, where teams can test its capabilities on real business tasks.

    In the next step, companies often integrate AI models into their own systems through APIs or adopt specialized AI tools for specific tasks such as document analysis, knowledge management, or workflow automation.