Assignment 2

Reflect on the relationship between labels and images in a machine learning image classification dataset. Who has the power to label images, and how do those labels and the machine learning models trained on them impact society?

“The Treachery of Images.”

“This is not an apple.”

“Don’t judge a book by its cover.” Yet machine learning models now try their best to “label” and “categorize” images while ignoring the richer meaning behind them. Take the two artworks above as an example. As human beings, we can easily tell that each title is meant to be ironic in relation to the content. A machine learning model, however, is trained to identify the common patterns of different objects and can barely understand the meaning of an image. This is why many critics argue that AI can only memorize, not understand.

This brings up a question: what is an image, and what kind of information can we get from it? An image is clearly not just a combination of pixels that represents objects in the real world. It contains information that cannot be described in words: emotions and thoughts that carry a certain degree of ambiguity. This is especially true of artworks. AI can now copy an artwork’s style simply by looking at the “pattern” of its colors, but it fails to understand the composition, structure, balance, and, most importantly, the feelings behind the picture.

Even for human beings, labeling an image with a few words is a hard and complicated task. This is not because of the complexity of the objects that appear in the picture, but because of the deeper elements reflected in it, such as culture, emotion, and resonance.

AI still lacks the ability to understand and see beyond the pixels. As the French philosopher Michel Foucault once wrote about “The Treachery of Images”:

From painting to image, from image to text, from text to voice, a sort of imaginary pointer indicates, shows, fixes, locates, imposes a system of references, tries to stabilise a unique space. But why have we introduced this teacher’s voice? Because scarcely has he stated, “This is a pipe,” before he must correct himself and stutter, “This is not a pipe, but a drawing of a pipe,” “This is not a pipe but a sentence saying that this is not a pipe,” “The sentence ‘this is not a pipe,’ this is a not a pipe: the painting, written sentence, drawing of a pipe — all of this is not a pipe.”

Thoughts:

From another angle, however, the current AI model resembles an infant who has just begun to observe the world but does not yet understand the meaning behind it. So what if we could teach it to connect labels with cultural background, emotions, and ideas? For example, if you show “The Treachery of Images” to a five-year-old child, he or she will not be able to tell the meaning behind the title. But as the child learns and grows up in society, he or she gradually grasps the concepts of irony, art criticism, and symbols, and may eventually form his or her own interpretation of the work.

There is another feature of AI that I find interesting, or even ironic. Even though it cannot tell the meaning behind images, it already has biases and hierarchies. Influenced by its labels and training data, AI “categorizes” humans into different classes by their biometric signatures, which is kind of insane.

Other assumptions about the relationship between pictures and concepts recall physiognomy, the pseudoscientific assumption that something about a person’s essential character can be gleaned by observing features of their bodies and faces. ImageNet takes this to an extreme, assuming that whether someone is a “debtor,” a “snob,” a “swinger,” or a “slav” can be determined by inspecting their photograph. In the weird metaphysics of ImageNet, there are separate image categories for “assistant professor” and “associate professor”—as though if someone were to get a promotion, their biometric signature would reflect the change in rank.

Train your own image classifier using transfer learning and ml5.js and apply the model to an interactive p5.js sketch. You can train the model with Teachable Machine or with your own ml5.js code. Feel free to try sound instead of or in addition to images. You may also choose to experiment with "regression" rather than classification.

I designed and coded a “snake game” in which players use different gestures to control the snake’s direction. The Teachable Machine model has four categories: “UP”, “DOWN”, “LEFT”, and “RIGHT”.
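
A minimal sketch of how the four labels could be turned into snake movement is shown below. It is not the code of the actual game. The names `DIRECTIONS`, `MIN_CONFIDENCE`, and `nextDirection`, as well as the 0.8 threshold, are assumptions made for illustration. The function expects the top label and its confidence to be passed in from wherever the classifier reports its result, so it does not depend on any specific ml5.js call.

```javascript
// Hypothetical mapping from the four Teachable Machine labels to grid steps.
// In p5.js the y axis points down, so "UP" is a negative y step.
const DIRECTIONS = {
  UP: { x: 0, y: -1 },
  DOWN: { x: 0, y: 1 },
  LEFT: { x: -1, y: 0 },
  RIGHT: { x: 1, y: 0 },
};

// Assumed threshold: predictions below this confidence are ignored
// so that a shaky gesture does not make the snake jitter.
const MIN_CONFIDENCE = 0.8;

// Returns the direction the snake should use on the next frame.
// current:    the direction the snake is moving in now
// label:      the top label reported by the classifier, e.g. "UP"
// confidence: the classifier's confidence for that label (0 to 1)
function nextDirection(current, label, confidence) {
  const candidate = DIRECTIONS[label];

  // Unknown label or low confidence: keep going the same way.
  if (!candidate || confidence < MIN_CONFIDENCE) {
    return current;
  }

  // A snake cannot turn back onto its own body, so a 180-degree
  // reversal (e.g. RIGHT to LEFT) is rejected.
  const isReversal =
    candidate.x === -current.x && candidate.y === -current.y;

  return isReversal ? current : candidate;
}
```

In a sketch, the classifier’s result handler would call `nextDirection` with the snake’s current direction and store the returned value, and the draw loop would move the snake’s head by that step each frame. Keeping the mapping in a pure function like this makes it possible to check the game logic without a webcam.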