The Once Times

Opinion

Why AI Still Can’t Tell What Looks Good: The Hidden Difficulty of Pretty UIs and Powerful Prose

We taught AI to pass every test, but beauty doesn’t come with one.

7 min read

It’s a tale of two capacities. Ask a large language model to write a Python script that scrapes a website, and it spits out clean, functional code in seconds. Pose it a complex calculus problem, and it often walks through the solution step by step. But ask the same model to “design a beautiful mobile app login screen” or “write an article that will go viral,” and the results, while sometimes passable, usually feel hollow: generic copycat designs, unmemorable prose, a persistent whiff of the uncanny valley.

Website UI designs on dribbble.com
Website designs generated by the top AI model on designarena.ai

Why is it that AI, which can ace the bar exam and generate entire codebases, stumbles when beauty is on the line? The answer isn’t about processing power or a lack of design rules. It’s about the fundamental gulf between problems with objectively correct answers and those whose success rests entirely on the shifting sands of human subjectivity, emotion, and taste.

Coding and mathematics are, at their core, systems of explicit, unambiguous rules. A programming language has a formal grammar and a defined semantics. A mathematical statement is either provable or not, true or false. There are infinite possible implementations of a sorting algorithm, but they can all be tested against the same input-output contract. Even when code is considered “creative,” it lives within a cage of compilers, linters, and unit tests that instantly flag errors. For an AI, this is a friendly environment: the training data is full of correct solutions paired with problem statements, the feedback loop is deterministic, and the success criteria are crystal clear.

This advantage has been supercharged by modern agent architectures. Many coding agents today use the ReAct (Reasoning + Acting) paradigm: the model not only writes code but actively interacts with a compiler or interpreter, runs the program, reads error messages, and decides what to fix in a tight, self-correcting loop. It can verify correctness without any human in the loop, using a binary “compiles and passes tests” signal that is both instant and precise. This automated verification is a foundational reason AI has become so startlingly capable at generating software: it can learn not just from static code examples but from the real-time feedback of execution.
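The generate–run–read-errors–regenerate loop can be sketched in a few lines. Everything below is illustrative, not any particular agent framework’s API: `toy_generate` stands in for a real model, and the loop’s only verifier is whether the candidate program exits cleanly.

```python
import subprocess
import sys
import tempfile


def run_candidate(code: str):
    """Execute a candidate program; return (passed, error_output)."""
    # Write the candidate to a temp file so the interpreter can run it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=10
    )
    return result.returncode == 0, result.stderr


def react_loop(generate, task: str, max_steps: int = 3):
    """Generate -> run -> observe errors -> regenerate, until the code passes."""
    feedback = ""
    for _ in range(max_steps):
        code = generate(task, feedback)  # reason + act: propose code
        ok, err = run_candidate(code)    # act: actually execute it
        if ok:
            return code                  # the "compiles and runs" oracle says done
        feedback = err                   # observe: feed the traceback back in
    return None                          # gave up within the step budget


# Toy "model": the first attempt crashes; after seeing the traceback, it "fixes" it.
def toy_generate(task, feedback):
    if not feedback:
        return "print(1 / 0)"    # deliberately buggy first draft
    return "print('hello')"      # corrected once error feedback arrives


print(react_loop(toy_generate, "print a greeting"))  # → print('hello')
```

The point of the sketch is the shape of the reward: a cheap, deterministic pass/fail oracle that the loop can query thousands of times. For a “pretty” login screen there is simply no `run_candidate` to call.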


Now, enter the task of designing a “pretty” login screen. What does pretty mean? Is it minimal and airy, or warm and colorful? Should it use rounded corners or sharp ones? A centered layout or left-aligned? Dark mode or light? The number of possible satisfactory designs is unbounded, and there is no compiler to say “wrong.” Worse, what one user finds elegant, another finds sterile; a UI that delights a teenager may confuse an older adult. A pretty UI isn’t just about visual harmony—it’s about emotional resonance, usability, brand personality, and cultural context, all mashed together into a single holistic impression.


Writing an article is even more slippery. Unlike code, a good article doesn’t have a single correct output. It must grip a reader with a compelling hook, maintain rhythm and pacing, deploy just the right word at just the right moment, and leave a lasting emotional aftertaste. These qualities are not reducible to a set of rules. An article is a delicate negotiation between authorial voice, audience expectation, and the ephemeral mood of the day. A viral essay is often good not despite breaking conventional structure, but because it does so in a way that feels authentic and surprising—qualities AI, which is inherently a statistical pattern matcher, struggles to manufacture.


Teaching AI to code benefits from a massive, clean training signal: millions of repositories with code that, at the very least, compiles and often passes tests. For a pretty UI, there is no equivalent. We have screenshots, mockups, and design files, but they come without reliable “beauty scores.” A Dribbble shot might have thousands of likes, but likes are a noisy proxy—they reflect trends, the creator’s popularity, and fleeting fashions, not some objective prettiness. Training a model on such data teaches it to replicate the visual clichés of the moment, not to understand the underlying principles of delight.

Articles suffer from a similar data problem. The web is awash with text, but the vast majority is mediocre. Quality is often measured by clicks, time on page, or upvotes, all of which are subject to clickbait, sensationalism, and the echo chamber. The model learns to produce text that looks like popular content, but it misses the deeper craft: the empathy for the reader’s state of mind, the meticulous construction of an argument that thinks two steps ahead of the skeptic, the genuine human experience that makes a personal essay resonate.


At its core, a beautiful UI is not about pixels; it’s about how those pixels make a user feel. A well-placed animation isn’t just decorative. It provides reassurance, gives feedback, softens the cognitive load. A good article is not a sequence of grammatically correct sentences; it’s a mind-to-mind transfer of understanding and emotion. AI, as of today, has no real emotions, no lived experience, no intuitive sense of what it’s like to fumble through a confusing interface or to feel a story stir something deep. It can mimic empathy by leveraging patterns from human-written confessions and user research reports, but that mimicry often feels shallow under close scrutiny. The best design and writing come from a place of shared humanity, and that remains uniquely out of reach.

Code can be improved automatically: run the test suite, note the failure, feed the error message back, regenerate. Math can be checked with formal verifiers. This rapid, cheap, and precise feedback loop is a force multiplier for AI training. For UIs and articles, evaluation is painfully human. To know if a design is “prettier,” you need to run a usability test or gather subjective ratings from a diverse group of people. To judge if an article is compelling, someone actually has to read it slowly, attentively, and with critical taste. This human-in-the-loop process is slow, expensive, and noisy, making iterative refinement through reinforcement learning exponentially more difficult.

Where a ReAct-style coding agent can use “does it compile?” and “do tests pass?” as reliable, instantly computable reward signals to improve its output in real time, an agent attempting to refine a UI layout or a narrative arc has no such oracle. There is no compiler for beauty, no unit test for a memorable headline. The only verifier is the slow, subjective, and irreproducible judgment of a human reader or user.

Beauty is no quality in things themselves: It exists merely in the mind which contemplates them; and each mind perceives a different beauty.

— David Hume, Of the Standard of Taste (1757)

True beauty often involves breaking the rules in a way that somehow feels right. It requires taste, that mysterious ability to discern the sublime from the merely competent. AI models are trained to minimize loss against their training distribution, which means they are good at interpolation and remixing existing ideas but shockingly poor at true invention. They can produce a login screen that looks like every other clean-tech startup’s login screen, but they cannot intentionally pioneer a new aesthetic that feels both unprecedented and obvious in hindsight. They can write an article that reads like a million other think pieces, but they can’t channel a raw, singular voice that makes you forget you’re reading at all. Because beautiful design and powerful writing are defined not by conformity to a pattern but by inspired deviation from it, AI’s core operating principle becomes a barrier.

Of course, none of this means AI will never produce beautiful design or compelling prose. Already, generative tools serve as capable assistants that spark ideas, generate variations, and draft copy a human editor can shape into something brilliant. Multimodal models could learn to connect text, image, and layout in more sophisticated ways, and human preference tuning is slowly nibbling at the subjectivity problem. But the central challenge remains philosophical rather than technical: when there is no universally agreed-upon metric for success, when the answer changes depending on who you ask and what day it is, the algorithmic “optimize this number” paradigm hits a wall. Teaching a machine to produce a pretty UI or a moving article is, in essence, teaching it to understand us, and we’re still learning to articulate what we find beautiful ourselves.
