Online Exclusive: “A World Encoded” by Katherine Yang

I had never thought so obsessively about poetry as I did when I was learning about syntactic linguistics.

There were different kinds of verbs, my professor had told me. If John threw a ball, that throwing was a doing. But if John loved Mary, my professor stressed that, within the structure of the sentence, John served as what was called an “experiencer” of the emotion of love. Loving was not a doing — syntactically speaking, of course. He drew for us two diagrammatic trees that demonstrated how the nature of a doing or experiencing verb influenced how the sentence was modeled in our internal understanding of language.

There is poetry in constellations, and poetry in patterns of birdsong, but this class was how I learned about the poetry that inhabits the formal structures of language. Was the professor aware of the symphony he was plotting out on the blackboard? He was composing out loud a striking thesis on what it meant to perform the act of love, or, indeed, to simply be the experiencer of some cosmic feeling. I was desperate to learn more, especially if the professor continued to explain these concepts in a way that inadvertently sounded like a poetic proclamation.

Over the course of the semester, I learned a litany of beautiful linguistics. I also learned that, as humans, we had the capacity to understand and forgive a lot that our models couldn’t.

Elsewhere in the textbook, we found that “John likes herself” was an ungrammatical sentence. The reflexive pronoun “herself” couldn’t be used to stand in for John in that sentence because John was not a girl; the sentence didn’t compute.

What the twenty-year-old introductory textbook didn’t satisfactorily explain was that the conflicting pieces of information arose not from the words themselves, but rather from our expectations of who “John” was likely to be — namely, a boy who used the reflexive pronoun “himself.” We knew nothing about “John,” and yet the models in our prototypical grammar had categorically precluded John from being anything other than a textbook example of a boy.

If John strayed from expectations, the expression would be struck out as a grammatical error.

I can’t help but think about words and their power, everywhere I look.

In the computer code that I write and read every day, the word “let” is strewn everywhere. Each instance is a wish or desire — a mundane one, certainly, but powerful. With a word, I can make space in memory, create a variable called x, and give it a value of ten.

There’s something magical and intoxicating about that ability. It calls to mind the arcane magic of rituals and ceremony: the magic words “I name you” have the power to bring life to an inanimate object, bind a spirit, or set the ball rolling on a great prophecy.

In speech act theory, certain kinds of sentences serve as both expressions and actions. Those three magic words, “I name you,” may appear to be simply a verbal expression, but in our verbal culture, words have the potential to carry the great weight and power of context. By vocalizing an intention, we can actualize that unspoken life in the physical world. Put simply, we can speak ideas into existence.

Much like the field of syntactic linguistics, the field of computer science is rich with systems of abstraction. A computer is, at its core, a highly sophisticated calculator, which we have creatively wrangled into the ultimate tool to solve any and all of our most complex interpersonal challenges.

Take, for example, some tens of thousands of people in the employment of a large corporation. Placed in front of this contingent and handed a sophisticated calculator, what could you proceed to do but enumerate them? The pieces then fall into place accordingly. This is how we begin to make sense of people, to understand and manage them. We collect pieces of data about people so we can store them in a big, centralized database. We compress human beings into a linear array, so we can aggregate them, categorize them, and delineate them.

These are the systems that keep our world running in its infinite complexity. For better or for worse, they’re built upon millions of human judgments about how we see the world.

Abstraction runs down the design of the computer to concepts as fundamental as the binary. Even at the microscopic level of a bit flipping on and off, there are big questions to be asked. In a system built upon millions of combinations of on or off, one or zero, true or false — what does it mean to occupy space on a spectrum? What does it mean to be messy and to color outside the lines?

We built these systems. We built these worlds. I was starting to understand how they worked, and I was starting to imagine we could build others, too. I wanted to burst open these models and examine their pumping hearts.

I was beginning to form a nascent question. If these were the trappings that held the languages we speak and the languages we equip, what could a language of my own look like? What could it say — not just in its lexicon, but in its syntax and function? What alternative values could I encode; what new models could I construct?

Over the course of a year, I undertook a project of designing and writing my own small programming language.

I knew how to program, but up until then, I’d had no sense of how the gears turned under the hood. With a little bit of research and a lot of trial and error, I soon found that making a programming language is not unlike teaching a child how to read.

In broad strokes, the engine that powers a programming language works by taking a program written in the designated syntax and translating that text into executable instructions.

It starts by reading in each character one by one, piecing them together to form words that it can compare against its lexicon. If it encounters the word “let,” for example, it knows, from how the word has been encoded, that the word is a cue for the declaration of a new variable. It can now expect the name and value of a new variable to follow — information that it will store and use to understand future references to that variable name. (If what follows doesn’t match the expectation, it stutters to a stop with a syntax error, much like the ungrammatical “John likes herself” example.) Word by word, statement by statement, and function by function, it thus walks through the entirety of the program to assemble a structured representation of the actions to be taken.
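That reading-and-expecting loop can be sketched in a few lines of Python. This is a toy illustration with a made-up grammar for a single “let” statement, not any real language’s engine:

```python
import re

# Reading stage: split source text into words and symbols (tokens).
def tokenize(source):
    return re.findall(r"[A-Za-z_]\w*|\d+|=", source)

# Expecting stage: a "let" cue must be followed by a name, an "=", and a value.
def parse_let(tokens):
    if tokens[0] != "let":
        raise SyntaxError("expected 'let'")
    name, eq, value = tokens[1], tokens[2], tokens[3]
    if eq != "=":
        # The "John likes herself" moment: expectation unmet, so we stop.
        raise SyntaxError("expected '='")
    return {"declare": name, "value": int(value)}

print(parse_let(tokenize("let x = 10")))  # {'declare': 'x', 'value': 10}
```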

At this point, I was feeling something like a spark of life under my fingertips. With this malleable clay in my hands, I set about trying to mold it into something interesting.

Instead of having it munge through numbers and execute complex algorithms like a traditional programming language might, I wanted my programming language to engage with written language. I wanted it to be a vehicle for examining and reflecting on language.

I wanted to retain the computing practice of storing and manipulating values in memory, but beyond that, I started thinking about ways I could bring in ideas from the literary and poetic world.

The question of the binary lingered with me. It is a fundamental model that drives so much of the computing world, determining logical pathways and producing lightning-fast calculations. But life rarely falls so neatly into these labeled buckets. Do concepts such as ambiguity and plurality have any place in such a system?

In Babel by R. F. Kuang, the world runs on a type of magic called silver — more specifically, on magical artifacts called silver bars. These bars are stuck to the bottom of carriages to make them run more safely and quickly, for example, or embedded in gardens to induce greater tranquility. Each bar is engraved with the two words of a chosen match pair, one on each side, each word representing the same concept in a different language. For example, a match pair might consist of the English word “speed” against its Latin root, “spēs,” whose meanings include the idea of hope. The two words are closely tied, yes, but translation is an ambitious and tricky endeavor. More often than not, we aren’t communicating every nuanced facet of a word and its place in a language; we’re settling instead for the closest approximation or an emulation of the original idea. Indeed, it’s this elusive translational distance in context and connotation that serves as the rich source of power in this world’s magic system.

I find poetry in the shape of a system and in the awkward spaces where it can’t quite fit. To a layperson, to a mind new to the intricacies of interpretation — to a computational system, perhaps — the machine of translation seems so straightforward until you consider the sheer scope of its objective.

Words occupy amorphous spaces in our galaxy of language. Poets, who work as craftspeople in the medium of language, are immensely talented in understanding how to wield the raw material of words, both in isolation and in combination, to tell a specific story or evoke a specific emotion. And still — or perhaps, as a result — a million readers will inevitably have a million different interpretations of the same poem and the words used within.

In contrast, computers don’t traffic in anything so human and changeable. If one system communicates to another that x is equal to ten, no shred of nuance can possibly be lost in that exchange.

And yet, if pressed, I might make the case that, in the computing world, a device called the regular expression plays on many of the same questions that human language does.

Let’s say that you wanted to find every North American phone number in a spreadsheet. Instead of typing in random sequences of ten-digit numbers until you found every one, you could write one magical expression into the search box that would encapsulate every phone number that could possibly exist. Assuming a consistent format, the expression could look like this:

\d{3}-\d{3}-\d{4}

The backslash followed by a “d” is a special code that covers any digit from zero to nine. The dash means you want a literal dash, and the “3” or “4” in curly brackets means you want three or four of those digits, respectively. In one concise expression, you would be able to find the phone numbers “123-456-7890” or “555-555-5555” or “206-555-0100.”
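For the curious, here is how that expression behaves in Python’s `re` module, using the example numbers above as a stand-in for the spreadsheet:

```python
import re

# Three digits, a dash, three digits, a dash, four digits.
pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

cells = ["123-456-7890", "call me", "555-555-5555", "206-555-0100"]
matches = [cell for cell in cells if pattern.fullmatch(cell)]
print(matches)  # ['123-456-7890', '555-555-5555', '206-555-0100']
```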

Handy for doing a batch operation on a spreadsheet, of course, but that’s not why we’re here. Even in the weeds of formal computational grammars, there’s something in the act of searching and the descriptive language of representation that rings of poetry to me.

Instead of dealing with phone numbers, let’s say that you have a habit of confusing the words “word” and “world” — that pesky “l” sneaks in and out whenever you’re not paying attention. You want to find every use of those two words, perhaps to mark the places where you’ll need to reread the passage and confirm the spelling of the word. You could search for each word individually, but here’s a regular expression you might write instead:

worl?d

Here, most of the bare letters are requests for those literal letters, but if the letter is followed by a question mark, that letter is marked as optional. When the system uses this query to search through the text, it’ll look for every word that starts with the letters “wor,” then either does or doesn’t have the letter “l,” and finally ends with the letter “d.” In other words, here is an expression that searches for, or perhaps represents, both “word” and “world.” It encapsulates the orthographic, and perhaps semantic, span of the two words; it, in a way, is both “word” and “world.” Once the syntax of the system is decoded, the concealed layers of meaning begin to reveal themselves.
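A quick check of this in Python, with a test sentence of my own invention:

```python
import re

# "wor", an optional "l", then "d" — with word boundaries so that longer
# words like "wordy" or "worlds" don't slip in.
pattern = re.compile(r"\bworl?d\b")

text = "In the beginning was the word, and the word made the world."
print(pattern.findall(text))  # ['word', 'word', 'world']
```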

The question mark is a special character that represents either absence or presence, and thus represents both simultaneously, leading to the creation of something like Schrödinger’s letters. Its cloaking ability isn’t limited to just letters, either.

Take this expression:

(will )?(have )?miss(ed)?

The parentheses group letters or words together so that they can be operated upon as a unit by the following question mark. This expression marks the words “will” and “have” as optional; the same goes for the past-tense “-ed” suffix on the verb “miss.” It encapsulates multiple possible tenses and aspects of the act and feeling of missing: we “miss,” “missed,” “will miss,” “have missed,” and “will have missed.” It is past, present, future, and more, in one fell swoop. Each symbolic marking unlocks another layer of possibility. In its most expanded form, the expression consists of only three words, but the journey of its decoding starts to feel like a poem.
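Run through Python, the expression checks out against each of those tenses:

```python
import re

# The expression verbatim: parentheses group, question marks make each
# group optional.
pattern = re.compile(r"(will )?(have )?miss(ed)?")

phrases = ["miss", "missed", "will miss", "have missed", "will have missed"]
print([p for p in phrases if pattern.fullmatch(p)])  # all five match
```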

This is where I started to find a home for my language.

Coem (“co-uhm”) is a programming language, in its bones, but it has grown into a broader experiment in the contemplation of words and their relationships.

Variable names are built with regular expressions: with the question mark at your disposal, you can speak about words and worlds always in the same breath; a hearth can always carry a heart. With another special operator, which marks a choice between groups of letters, you could consider the holy union of poetry and pottery, or gaze upon a moon that always abuts the night.

It’s a two-way relationship between the definition and the reference. When I write worl?d, I mean to say something about both “word” and “world”; when I later ask for “word,” the system returns to me that same joint definition. Like two halves of an imperfect match pair, the words occupy the same space in memory, giving rise to something new from their juxtaposition, like overlapping shards of colored glass.
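The mechanic can be sketched as a dictionary keyed by patterns rather than by names. To be clear, this is my own toy illustration of the idea, not Coem’s actual implementation:

```python
import re

# Definitions are stored under a pattern; lookups match a word against it.
store = {}

def define(pattern, value):
    store[pattern] = value

def lookup(name):
    for pattern, value in store.items():
        if re.fullmatch(pattern, name):
            return value

define("worl?d", "a joint definition")
print(lookup("word"))   # a joint definition
print(lookup("world"))  # a joint definition
```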

On top of this foundation, I’ve been exploring more approaches to examining and interacting with words. One is a palimpsest mode, which allows variables to grow throughout the piece. In the traditional paradigm, redefining a variable replaces the old value with the new one, but in palimpsest mode, a variable expands with each new definition, taking on traces of time like layers of writing on parchment. When asked to report its value, it returns not only its latest definition, but every definition it’s taken on throughout the course of the piece.
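As a sketch (again my own illustration, not Coem’s internals), a palimpsest variable is simply one that appends instead of overwrites:

```python
# Each new definition is layered on top of the previous ones.
class Palimpsest:
    def __init__(self):
        self.layers = {}

    def define(self, name, value):
        self.layers.setdefault(name, []).append(value)

    def report(self, name):
        # Not just the latest definition, but every layer so far.
        return self.layers[name]

p = Palimpsest()
p.define("home", "a hearth")
p.define("home", "a heart")
print(p.report("home"))  # ['a hearth', 'a heart']
```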

The idea of the “output” has also been an area of exploration. In programming, there is the most visible output: the result of the program, or the file or webpage or software that is produced by the program. There is also the programmer’s internal kind of output. In an area of the window called the console, the programmer can “print” messages at certain points in the program to help identify problems in the code.

The focus of Coem is on the writing and the text. The output is worth reading, but so is the source code, and so, too, is the internal output. These are all elements of “the text” to me. In Coem, then, printing the result of a variable is the output, and the output is fed back into the source code as a comment or annotation, in a feedback loop of definition, reference, and affirmation.

Piece by piece, the language starts to feel like a tiny looking glass through to the makings of a little world. It’s a world where we make space for plurality, where we platform intentional expression over algorithmic detritus, where we pull the curtain back, where we acknowledge growth and complexity, where we feel our way forward through language and poetry.


Katherine Yang is an artist, programmer, and crafter exploring windows into code as a meaningful and poetic medium. Her work has previously appeared in Taper, The HTML Review, Electronic Literature Organization, and Backslash Lit. She currently works as a developer and designer in Boston; she spends her free time knitting and reading about mythology. You can find her in her pixel paper home at kayserifserif.place.


