Saturday, March 5, 2022

Is Beyonder the Most Powerful Superhero Character? Machine Learning Models Say So


In a previous post I discussed who the coolest superhero is according to textual descriptions. For fans of comics and animation, perhaps the question of who is the most powerful character, not only in Marvel and DC Comics but across all fantasy universes, stirs even more debate. Judging from the tabular and textual data in the Superheroes NLP dataset, it turns out that Beyonder should claim the crown.


At first glance, the answer should be straightforward. The dataset, scraped from SuperHeroDb (SHDb) about two years ago, lists an overall score for each character, which represents SHDb's assessment of that character's overall power. One tricky point, however, is that the overall_score column has mixed data types. Apart from numerical values, 107 characters are listed as "-", presumably meaning "not available", and 18 characters are marked "∞", meaning infinite power. If we rank the values directly, they are ordered as strings, with "∞" at the top, followed by "94", "9", "89", "87". For convenience of processing and visualization, I convert them all into numbers: as the original numerical values range from 1 to 237, I set "-" to 0 and "∞" to 299.
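A minimal sketch of that conversion in plain Python, assuming the raw scores arrive as strings such as "94", "-" and "∞" (with pandas, the same function could be applied to the column with `Series.map`). The first `print` reproduces the string-ordering problem described above; the function name is my own.

```python
NOT_AVAILABLE = "-"  # SHDb placeholder, presumably "not available"
INFINITE = "∞"       # SHDb marker for infinite power


def to_numeric_score(raw: str) -> int:
    """Convert a raw overall_score string into an integer for ranking."""
    raw = raw.strip()
    if raw == NOT_AVAILABLE:
        return 0    # below the observed minimum of 1
    if raw == INFINITE:
        return 299  # comfortably above the observed maximum of 237
    return int(raw)


raw_scores = ["94", "∞", "9", "-", "89", "87"]

# Sorting the raw strings compares them character by character.
print(sorted(raw_scores, reverse=True))
# ['∞', '94', '9', '89', '87', '-']

# Sorting the converted integers gives the intended order.
print(sorted((to_numeric_score(s) for s in raw_scores), reverse=True))
# [299, 94, 89, 87, 9, 0]
```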


Ranking the Infinite Scored Characters by Power Stats and Superpower Counts

So the most powerful character should be among the 18 who have infinite power. Some may say we can stop at this point, since SHDb has judged them all equally, infinitely powerful. But if we probe further, how can we rank infinity? A handy way is to look at auxiliary features, and two sets of features in the dataset can help. The first is the six power statistics (intelligence_score, strength_score, speed_score, durability_score, power_score and combat_score), each ranging from 0 to 100; I add them together into a "combined_score" to summarize them. The second is the number of superpowers each character possesses, counted from the "superpowers" column.
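The two auxiliary features can be sketched as below. The column names come from the text; the storage format of the "superpowers" field is an assumption on my part (a Python-list literal such as "['Flight', 'Telepathy']"), so the parsing step may need adjusting for the actual file. The example record is hypothetical.

```python
import ast

STAT_COLUMNS = ["intelligence_score", "strength_score", "speed_score",
                "durability_score", "power_score", "combat_score"]


def combined_score(row: dict) -> int:
    """Sum of the six power stats (each 0-100), so the result lies in 0-600."""
    return sum(int(row[col]) for col in STAT_COLUMNS)


def superpower_count(superpowers_field: str) -> int:
    """Count superpowers, ASSUMING the field holds a Python-list literal."""
    if not superpowers_field:
        return 0
    return len(ast.literal_eval(superpowers_field))


# Hypothetical record, for illustration only.
character = {
    "intelligence_score": 100, "strength_score": 100, "speed_score": 100,
    "durability_score": 100, "power_score": 100, "combat_score": 65,
    "superpowers": "['Flight', 'Telepathy', 'Reality Warping']",
}
print(combined_score(character))                   # 565
print(superpower_count(character["superpowers"]))  # 3
```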


That gives us two ways to rank the infinite class. The first ranks by combined score and breaks ties by superpower count. This makes Golden Master's Mech from LEGO's Ninjago the most powerful character, with perfect power scores and 106 superpowers.





On the other hand, if we give priority to superpower counts over combined scores, Black Alice from DC Comics ranks first. Her combined score is 565 out of 600, but she has 126 superpowers listed. In fact, the one superpower of hers that matters most but is not clearly listed is that she can temporarily usurp the magical powers of any being.
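Both rankings are lexicographic sorts on two keys, which Python expresses with tuple keys. In this sketch the Mech's and Black Alice's numbers come from the text above, while "Character X" is a made-up entry added only to show how ties on the first key are broken.

```python
# (name, combined_score, superpower_count)
infinite_class = [
    ("Golden Master's Mech", 600, 106),
    ("Black Alice", 565, 126),
    ("Character X", 600, 40),  # hypothetical, to illustrate tie-breaking
]

# Ranking 1: combined score first, superpower count breaks ties.
by_stats = sorted(infinite_class, key=lambda c: (c[1], c[2]), reverse=True)

# Ranking 2: superpower count first, combined score breaks ties.
by_powers = sorted(infinite_class, key=lambda c: (c[2], c[1]), reverse=True)

print([name for name, *_ in by_stats])
# ["Golden Master's Mech", 'Character X', 'Black Alice']
print([name for name, *_ in by_powers])
# ['Black Alice', "Golden Master's Mech", 'Character X']
```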




But a question arises: does ranking by combined score or superpower count follow the internal logic of the database? A scatter plot of overall score against combined score shows that most characters have overall scores below 50, no matter how high their combined scores. Among characters with perfect power stats, overall scores vary from 27 to 299 (infinite). And within the infinite class, the combined score can be as low as 550.



Likewise, the scatter plot of overall score against superpower count shows that, although characters with more superpowers generally have higher overall scores, overall scores vary widely. Characters in the infinite class have between 23 and 126 superpowers.



The Exclusive Superpowers of the Infinite Class

So we can infer that the most important factor in a character's overall score is not power stats or the number of superpowers, but the types of superpowers he or she possesses; evidently, not every superpower contributes equally to the overall score. So we have to look into the superpowers themselves.


A comparison of the superpowers possessed by the infinite and non-infinite classes shows that omnipotent, apotheosis, orbing, salvation and willpower manipulation are the five superpowers possessed exclusively by the infinite class. Of the five, omnipotent, meaning unlimited power, is possessed by four of the infinite-power characters: Abraxas, Eru Iluvatar, Life Entity and The Lord of Light, the supreme beings of the Marvel, J.R.R. Tolkien, DC Comics and George R.R. Martin universes respectively. The other four exclusive superpowers each appear only once, so it is less certain that they are sufficient conditions for infinite power.
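The comparison amounts to a set difference: collect the superpowers of the non-infinite class, then keep the infinite-class superpowers not among them, counting how many infinite-class characters hold each. The records in this sketch are made up for illustration, and the function name is my own.

```python
from collections import Counter


def exclusive_superpowers(characters):
    """Return {superpower: number of infinite-class holders} for superpowers
    that no non-infinite character has.

    `characters` is an iterable of (is_infinite, set_of_superpowers) pairs.
    """
    infinite_counts = Counter()
    other_powers = set()
    for is_infinite, powers in characters:
        if is_infinite:
            infinite_counts.update(powers)
        else:
            other_powers |= powers
    return {power: count for power, count in infinite_counts.items()
            if power not in other_powers}


# Hypothetical toy records, not taken from the dataset.
toy = [
    (True, {"Omnipotent", "Flight"}),
    (True, {"Omnipotent", "Orbing"}),
    (False, {"Flight", "Telepathy"}),
]
print(exclusive_superpowers(toy))  # Omnipotent held by 2, Orbing by 1
```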



Ranking by Probability Score of Classification Model

As these exclusive superpowers account for only 7 of the 18 characters in the infinite class, we may conclude that, with the possible exception of omnipotent, what gives a character infinite power lies in combinations of superpowers. To find out which superpowers are responsible, we can use a machine learning classification model. I choose logistic regression because it is highly interpretable and gives a probability score, which can be viewed as a measure of who has more of the "infinite class superpowers".


Since the purpose of this modeling is analysis rather than prediction, and there are too few samples in the infinite class, I do not split the data into train and test sets; cross validation nonetheless shows a decent F1 score of 0.73. The coefficients of the model indicate that omnipotent is the most important factor, followed by others such as reality warping, vitakinesis, omnilingualism and matter absorption, while strength score and combat score have moderate importance.
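The workflow above can be sketched with scikit-learn's `LogisticRegression` and `cross_val_score`. The data here are synthetic: a random 0/1 superpower matrix with a made-up labelling rule in which column 0 plays the role of "omnipotent". The real feature set, hyperparameters and preprocessing in the post may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_characters, n_superpowers = 500, 20

# Synthetic stand-in for the real features: 1 if a character has a superpower.
X = (rng.random((n_characters, n_superpowers)) < 0.1).astype(int)
# Hypothetical rule: column 0 ("omnipotent") alone, or columns 1 and 2
# together, make a character "infinite".
y = ((X[:, 0] == 1) | ((X[:, 1] == 1) & (X[:, 2] == 1))).astype(int)

model = LogisticRegression(max_iter=1000)

# With an integer cv, scikit-learn uses stratified folds for classifiers,
# which helps when the positive class is small.
f1_scores = cross_val_score(model, X, y, cv=5, scoring="f1")
print("mean cross-validated F1:", f1_scores.mean())

# Fit on all the data, since the goal is analysis rather than prediction.
model.fit(X, y)
coef_order = np.argsort(model.coef_[0])[::-1]
print("features with the largest coefficients:", coef_order[:3])

# Probability of the "infinite" class, usable as a ranking score.
infinite_probability = model.predict_proba(X)[:, 1]
print("top-ranked rows:", np.argsort(infinite_probability)[::-1][:5])
```

Ranking characters by `predict_proba` rather than by hard predictions is what allows the infinite class, where every member is already predicted positive, to be ordered at all.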



The probability scores of the 18 infinite class characters are very close, ranging from 0.987650 to 0.999922, with Golden Master's Mech edging out the rest at the top. While the Mech does not have omnipotent, it has important superpowers like reality warping, matter absorption, nigh-omnipotent and nigh-omnipresent.



Scoring From Regression Model

Another approach is to use regression to approximate the formula or algorithm SHDb used to calculate overall scores, and then extrapolate to the characters of the infinite class. Linear regression is chosen for its interpretability, with Elastic-net regularization because, compared with a pure Lasso penalty, it tends to set fewer coefficients to zero, which should be closer to the actual case.


Possible interactions between variables in SHDb's formula, and the arbitrary value of 299 assigned to the infinite class, should limit the accuracy of a linear model. Still, the Elastic-net model has an R squared of 0.965 and a cross-validated root mean squared error of 12.24, which can be regarded as decent. In this model omnipotent carries a weight of nearly 160 points in the overall score, while other high-scoring superpowers include nigh-omnipotent, omniscient, causality manipulation and so on.
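A sketch of the regression step with scikit-learn's `ElasticNet`, on synthetic data whose "true" weights (160, 40 and 25 points for the first three superpowers) are invented to echo the scale mentioned above. The `alpha` and `l1_ratio` values are assumptions, not the settings used in the post.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_characters, n_superpowers = 400, 15

X = (rng.random((n_characters, n_superpowers)) < 0.2).astype(float)
# Hypothetical weights: only the first three superpowers matter.
true_weights = np.zeros(n_superpowers)
true_weights[:3] = [160.0, 40.0, 25.0]
y = 10.0 + X @ true_weights + rng.normal(0.0, 5.0, n_characters)

# l1_ratio=0.5 mixes the Lasso (L1) and ridge (L2) penalties.
model = ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10_000)

rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("cross-validated RMSE:", rmse.mean())

model.fit(X, y)
print("R squared on the training data:", model.score(X, y))
print("first three weights:", np.round(model.coef_[:3], 1))

# Extrapolation: score a new character with the fitted linear formula.
new_character = np.zeros((1, n_superpowers))
new_character[0, :3] = 1.0
print("predicted overall score:", model.predict(new_character))
```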



In this second model, the omnipotent, omnipresent and omniscient Eru Iluvatar gets the highest score, Beyonder the second and Golden Master's Mech the fourth.



Textual Description Score

As the two models give different results, we may look for further clues in the textual descriptions to determine who is most powerful. After reading the descriptions of the 18 characters, I decided to use a term matcher to score powerfulness: each mention of a positive keyword earns 1 point and each mention of a negative keyword earns -1.5 points, on the grounds that a mention of vulnerability hurts a claim to invincibility more than a mention of power helps it. While this kind of word matcher cannot be very accurate, it serves as a rough estimate of the powerfulness conveyed in a description.


The positive keywords are:

"most powerful", "supreme", "ultimate", "omnipotence", "omnipotent", "immortality" ,"limitless", "unlimited" and "surpassing".


And the negative keywords are:

"limited", "nigh-omnipotence", "second to", "defeated", "weakness", "weaknesses", "weaker", "destroyed", "destroy him", "destroying", "second most powerful" and "restricted".
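The scoring scheme can be sketched with a regular expression built from the two keyword lists. This is a stand-in, not necessarily the term matcher used for the post: it matches case-insensitively, tries longer keywords first (so "second most powerful" wins over "most powerful"), and uses look-arounds so that "limited" does not fire inside "unlimited" and "omnipotence" does not fire inside "nigh-omnipotence". The test excerpts are abridged from the descriptions quoted below.

```python
import re

POSITIVE = ["most powerful", "supreme", "ultimate", "omnipotence", "omnipotent",
            "immortality", "limitless", "unlimited", "surpassing"]
NEGATIVE = ["limited", "nigh-omnipotence", "second to", "defeated", "weakness",
            "weaknesses", "weaker", "destroyed", "destroy him", "destroying",
            "second most powerful", "restricted"]
WEIGHTS = {**{k: 1.0 for k in POSITIVE}, **{k: -1.5 for k in NEGATIVE}}

# Longest keywords first; a keyword must not touch a letter, digit,
# underscore or hyphen on either side.
_alternation = "|".join(re.escape(k) for k in sorted(WEIGHTS, key=len, reverse=True))
_PATTERN = re.compile(r"(?<![\w-])(" + _alternation + r")(?![\w-])", re.IGNORECASE)


def power_text_score(text: str) -> float:
    """+1 per positive keyword mention, -1.5 per negative keyword mention."""
    return sum(WEIGHTS[m.group(1).lower()] for m in _PATTERN.finditer(text))


print(power_text_score("Eru is the supreme deity of Arda."))  # 1.0
```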


Under this scoring scheme, although Beyonder is described as "nearly destroyed" once and as weakened in later versions, the five positive keyword mentions in his description, mostly about the omnipotence of the original Beyonder, compensate for that and earn him 2 points, equal to the score of Man of Miracles. Eru's description is rather bland and gets only 1 point, while the statement that the Mech was destroyed leaves it at -0.5 points.



Beyonder’s description:

Within a pocket realm in the Negative Zone, the omnipotent POS_KEYWORD , enigmatic Beyonders created discrete packets of reality-altering energy that could be accessed by others and contained within force fields as Cosmic Cubes (and Containment Units of other shapes). One created by Skrulls eventually evolved into the sentient Shaper of Worlds; one created by A.I.M. was manipulated by a number of forces on Earth before it evolved into Kubik. When Owen Reece accessed one of these energy packets via an electromagnetic accident, part of the energy mutated him into the powerful Molecule Man. The remaining energy gradually gained intelligence and became the Beyonder, who studied life by transporting a number of superhumans to his creation Battleworld and later by traveling to Earth and interacting directly with its residents. The Beyonder remained unfulfilled and eventually faked his death, creating a new realm over which he acted as a god. Eventually, however, the Shaper and Kubik revealed the truth to the Beyonder about his incompleteness, and he willingly merged with Reece to become a new, complete Cosmic Cube. The Cube removed its components from Reece, expelled him back to Earth, and evolved into Kosmos, taking a female form in hopes of avoiding the violent tendencies of its past incarnation. Kubik tutored Kosmos in the nature of humanity and the universe, exploring the Celestials and other cosmic entities, as well as performing experiments on the Fantastic Four. After Reece, who had managed to restore his powers, lost his lover, he went mad and assaulted Kosmos, drawing out the essence of the Beyonder and attacking it. Reece nearly destroyed NEG_KEYWORD the Beyonder, but Kubik, who was actually falling in love with Kosmos, convinced Reece to restore the Beyonder's essence to Kosmos to save her life. Kubik and Kosmos parted ways under unrevealed circumstances, and the Beyonder's personality sought dominance over Kosmos. 
She somehow became mortal but her diminished capacities and heightened sensations drove her mad. After slaughtering 64,000 Shi'ar colonists, she was incapacitated by the Imperial Guard telepath Oracle and placed in stasis in the Kyln space prison. Now known as the Maker, she regained consciousness but has only fragments of memory. She caused great chaos in the Kyln until subdued by Thanos and his new ally Skreet. Realizing that slaying Kosmos would unleash the Beyonder on the universe, Thanos shut down her mind but arranged to have her body kept alive forever via neurology-exempt nanocellular regeneration, serving as a living prison to the mad omnipotent POS_KEYWORD within her. In all of his versions, The Beyonder has the general ability to manipulate reality. The original Beyonder was considered to be the most powerful POS_KEYWORD being in the multiverse, having power surpassing POS_KEYWORD Cosmic Entities, such as The Living Tribunal and Eternity. He was capable of causing a multiversal wide destruction, and took over the entire earth with a mere thought. He also had vast psionic abilities, which enabled him to scan the minds of the entire world,[22] neutralize psychic probes from powerful telepaths,[23] erase the memories about himself from every human being on the planet,[1] among other abilities. His cognitive capacity is such that he can assimilate knowledge from the entire multiverse.[24] He can also easily change states of matter, and has a host of other different powers. He is endowed with superhuman strength of such an extent that it is potentially incalculable. Using his ability to manipulate reality, he can, in effect, regenerate damage done to his body by simply willing it repaired. 
Additional powers include teleportation, flight, the ability to choose his own physical resistances and attributes, as well as the ability to move others from one place to another via teleportation, such as the heroes and villains he moved through spacetime to his 'Battleworld' construct in the original Secret Wars. However, The Beyonder, went through several retcons, which significantly reduced his power. The retcons made The Beyonder arguably weaker NEG_KEYWORD than many Cosmic Entities, but he still retained his virtually infinite reality warping powers. Inherent in his near- limitless POS_KEYWORD psionic abilities the Beyonder has the potential to affect reality in a manner that could, in theory, simulate virtually any power.


Man Of Miracles’ description:

The being known as Mother is the creator of the universe. There is no being greater than she, and her powers appear to be limitless POS_KEYWORD . She is known as Mother to her children, of which there are more than there are numbers, but she is neither male nor female. In fact, she often appears as males if the situation suits her. She has appeared as many forms throughout the ages and can appear differently to different people simultaneously. Of her countless children, her two greatest disappointments were entrusted with the planet Earth. These two beings, whom we know as God and Satan, have squabbled and fought for ages. Their incessant bickering developed into complete hatred for one another and eventually all out war. Ordinarily, Mother would have allowed them to continue fighting as they saw fit (as she rarely gets involved with the affairs of her children). These two petty and hateful children, however, despite their never-ending feud managed to create something beautiful and wholly unique in the universe – mankind. Mankind was created by God to serve his will, but his creation was tainted by his hateful brother Satan. Man was given free will so as to disobey God’s commands. The combined influence of both God and Satan led to the unique creation that is a human being - a creature capable of boundless artistic expression and limitless POS_KEYWORD love, a creature which Mother deemed important enough to protect personally. During Armageddon, she appears to Spawn in the guise of the Man of Miracles, acting as his guide so that he may fulfill his role of stopping the Apocalypse and saving humanity from the forces of her two children. NA

Eru Iluvatar’s description:

Eru is the supreme POS_KEYWORD deity of Arda. He was the single creator, above the Valar, but has delegated almost all direct action within Eä to the Ainur, including the shaping of the Arda. NA

Golden Master’s Mech’s description:

While some of the Nindroids under Cryptor retrieved the Golden Weapons from the comet where they landed as a result of Garmadon's time travel exploits, Pythor led the others in claiming Borg Tower so as to adapt the assembly line for the Overlord's purposes. The Golden Master's Mech (also known as the Ultimate POS_KEYWORD Weapon) was a mech that was created from the remains of the Golden Weapons that were recovered by the Nindroids. Built to wreak havoc on New Ninjago City, the mech allowed the Golden Master to unleash his sacred power on the Ninja and the people of Ninjago. In the ensuing conflict, the Golden Master leveled buildings and caused widespread destruction. The mech would eventually create a giant web across Borg Tower, where the Ninja made their final stand against the Golden Master. When he incapacitated the Ninja and their masters, Zane sacrificed himself to freeze the Golden Master and his mech, destroying NEG_KEYWORD both in the process. the powers from golden master with mech are added


Combined Final Score

Having constructed three measures of powerfulness, we can combine them into a final score. As the three are on different scales, I first standardize each score, then apply Min-Max scaling to rescale it to the range 0 to 1, and finally average the three transformed scores to get the final "weighted score".
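The combination step is sketched below with made-up scores for three hypothetical characters A, B and C. One property worth noting: Min-Max scaling is unaffected by shifting a column or rescaling it by a positive factor, so standardizing before Min-Max scaling gives the same result (up to rounding) as Min-Max scaling alone. The plain average weights the three measures equally.

```python
def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if sd == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def final_scores(*measures):
    """Standardize, Min-Max scale to [0, 1], then average across measures."""
    scaled = [min_max(zscore(m)) for m in measures]
    return [sum(parts) / len(parts) for parts in zip(*scaled)]


# Hypothetical scores for characters A, B and C (not from the post).
probability = [0.990, 0.999, 0.995]
regression = [250.0, 240.0, 260.0]
text_score = [2.0, 1.0, -0.5]
print([round(s, 4) for s in final_scores(probability, regression, text_score)])
# [0.5, 0.5333, 0.5185]
```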


Beyonder edges out Eru Iluvatar in the final score, and so also becomes the most powerful character in Marvel's universe. While Beyonder is not listed as omnipotent in the dataset, he has enough high-scoring superpowers to take third place in the classification model and second in the regression model, and the text score propels him to the top spot. But there is one more issue: Beyonder's alignment is listed as neutral. If we restrict the ranking to "good" superheroes, the most powerful is Eru Iluvatar. And Black Alice ranks as the most powerful in DC Comics' universe.



Coda

Apparently SHDb has overhauled its scoring system and its assessment of many characters since this dataset was scraped about two years ago. The current list of infinite class characters is almost completely different from the one in this dataset; only Eru Iluvatar appears on both. The overall score can now run into the millions or even billions, while the six power stats can be infinite. A TIER factor has also been introduced into the calculation, and characters in TIER infinite have infinite overall scores. SHDb also publishes its current formula for the overall score, which is quite complex:


( INT^1.3 + (STR*0.5)^2 + (SPE*0.5)^2 + DUR^1.6 + (POW + (SPS*SPL))^2 + COM^1.8 ) ^ TIER


where

INT: Intelligence

STR: Strength

SPE: Speed

DUR: Durability

POW: Power

SPS: Superpower Score

SPL: Superpower Level

COM: Combat

And a note states that:

Every Super Power has a score (SPS) that is used to calculate the Class. Each Super Power also has 3 levels (SPL). The level is set when connecting that Super Power to a character. The level determines the final score, of the Super Power, being used in the calculation.
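Transcribed into Python, the published formula looks like the sketch below. The formula shows a single SPS*SPL term; summing SPS*SPL over all of a character's superpowers is my assumption about how multiple superpowers enter, and the stats in the usage example are made up.

```python
import math


def shdb_overall_score(intelligence, strength, speed, durability, power,
                       combat, superpowers, tier):
    """SHDb's published overall-score formula.

    `superpowers` is a list of (SPS, SPL) pairs. ASSUMPTION: their products
    are summed; the published formula shows only one SPS*SPL term.
    """
    sps_spl = sum(score * level for score, level in superpowers)
    base = (intelligence ** 1.3
            + (strength * 0.5) ** 2
            + (speed * 0.5) ** 2
            + durability ** 1.6
            + (power + sps_spl) ** 2
            + combat ** 1.8)
    return base ** tier


# Made-up stats, for illustration only.
print(shdb_overall_score(80, 60, 50, 70, 90, 65, [(10, 2), (5, 3)], tier=1))
# An infinite TIER makes the score infinite (for a base greater than 1).
print(shdb_overall_score(80, 60, 50, 70, 90, 65, [(10, 2)], tier=math.inf))
```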


So, what’s the point of doing all these analyses, when the data come from a fictional database that has already changed? I think that even in a fictional database the data are not arbitrary but follow some logic, and the job of a data scientist is to analyze the data as they are and try to discover that internal logic.


Original Dataset from Kaggle and Github

Source Code: Github and Kaggle


















Wednesday, March 2, 2022

The Coolest Superhero According to Cosine Similarity

Who is the coolest superhero?

Given only the two text columns, can you find a formula to find the coolest superhero?


In the description of the Superheroes NLP Dataset on Kaggle, its creator Jonathan Besomi, also a co-developer of the text preprocessing toolkit Texthero, suggests some analyses; the first one, quoted above, may be the most interesting and challenging.


How can we determine who is the coolest? Social science researchers might first define what cool is, then list the different aspects of the concept and come up with a scoring scheme. This is the analytical way. An alternative idea resembles dating apps: display pictures of superheroes, swipe right for cool and left for not cool, and compile a ranked list. This crowdsourcing approach is based on popular perceptions, which may involve preconceived images.


But if we want a purely textual analysis, word embedding vectors and cosine similarity may be the ready-made tools. The following four attempts represent various ways of applying these tools; some may be unconventional, and the results may be unconvincing. Nevertheless, it is a good exercise to get a feel for the potential and limitations of natural language processing on large amounts of text.


Word Embedding Vector and Cosine Similarity

A few words on the dataset first. The Superheroes NLP dataset is scraped from the Superheroes Database (SHDb). It has features on various power statistics, superpowers, appearance and so on. The columns in focus here are two text columns, one on the history of each superhero character and the other describing their powers. Both columns have more than a hundred null values; I fill them all with "NA" and join the two into a single “text” column.



Word embedding maps words into a high-dimensional vector space. In the NLP library SpaCy used here, every word in the trained model has a 300-dimensional vector. In that vector space, words with similar meanings point in roughly the same direction, so the cosine of the angle between two vectors acts as a measure of similarity. We can get the word embedding vectors and compute cosine similarity with our own function, or use SpaCy's similarity method, which does the same thing. SpaCy also has a most_similar method to get the list of words most similar to a given word vector.
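A small sketch of cosine similarity and a most-similar lookup in plain Python. The 3-dimensional vectors are invented for illustration (SpaCy's real vectors have 300 dimensions and would require downloading a model), and `most_similar` here is my own simple function, not SpaCy's method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, n=3):
    """Rank the words in `vectors` by cosine similarity with `query`."""
    scored = [(word, cosine_similarity(query, vec)) for word, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up vectors, for illustration only.
toy_vectors = {
    "cool": [0.9, 0.4, 0.1],
    "awesome": [0.8, 0.1, 0.0],
    "cold": [0.2, 0.9, 0.0],
    "treaty": [0.0, 0.1, 0.9],
}
print(most_similar(toy_vectors["cool"], toy_vectors))
```

As in the SpaCy results below, the query word itself comes first with a similarity of 1.0.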


The word "cool" has multiple meanings with many subtle overtones. In popular usage, when we say someone is cool, we usually mean he or she is hip, fashionable, excellent, composed or distinctively their own person, but the word can also mean unenthusiastic, dispassionate or unfriendly, as well as moderately cold in temperature. Among the 100 words most similar to "cool" in SpaCy's large English model, "cool" in different capitalizations comes first, followed by "awesome", "nice", "pretty" and "fun", which are closer to the hip and excellent side of cool.


Word              Similarity Score

COOL                 1.0

COol                 1.0

cool                 1.0

Cool                 1.0

CooL                 1.0

AWesome              0.7616

Awesome              0.7616

AWESOME              0.7616

AWEsome              0.7616

awesome              0.7616

nICE                 0.7374

nice                 0.7374

NICE                 0.7374

Nice                 0.7374

NIce                 0.7374

PRETTY               0.6568

Pretty               0.6568

pretty               0.6568

PRetty               0.6568

Fun                  0.6487

fun                  0.6487

FUN                  0.6487

kinda                0.6418

Kinda                0.6418

KINDA                0.6418

neat                 0.6415

Neat                 0.6415

NEAT                 0.6415

amaZing              0.6363

amazing              0.6363

AMazing              0.6363

AMAZING              0.6363

Amazing              0.6363

Really               0.632

REALLY               0.632

reAlly               0.632

REALLy               0.632

REally               0.632

really               0.632

sO                   0.6226

So                   0.6226

SO                   0.6226

so                   0.6226

WARM                 0.6214

Warm                 0.6214

warm                 0.6214

AWSOME               0.6195

Awsome               0.6195

awsome               0.6195

TOo                  0.6193

TOO                  0.6193

too                  0.6193

toO                  0.6193

Too                  0.6193

ToO                  0.6193

STUFF                0.6189

Stuff                0.6189

stuff                0.6189

Cute                 0.6118

cute                 0.6118

CUTE                 0.6118

CuTe                 0.6118

coolest              0.6084

COOLEST              0.6084

Coolest              0.6084

Chill                0.6075

chill                0.6075

CHILL                0.6075

FUNNY                0.6073

FUnny                0.6073

funny                0.6073

Funny                0.6073

Lol                  0.607

lOl                  0.607

LoL                  0.607

lol                  0.607

lOL                  0.607

LOL                  0.607

LOl                  0.607

loL                  0.607

great                0.6023

Great                0.6023

greAt                0.6023

GREAT                0.6023

GReat                0.6023

Weird                0.6017

WEIRD                0.6017

weird                0.6017

SUPER                0.6002

super                0.6002

Super                0.6002

SUper                0.6002

LOOK                 0.5991

look                 0.5991

Look                 0.5991

LOOKS                0.5976

looks                0.5976

Looks                0.5976

KIND                 0.596

kind                 0.596



The similarity scores of other words show that the composed, nonchalant and confident side of cool ranks low in SpaCy's large English model. While "nerd" and "geek" are regarded as the opposite of cool, their relatively high similarity scores (above 0.4) can be attributed to their being mentioned together with cool more often than unrelated topics are. Interestingly, "Batman" has a higher similarity score than "Superman", perhaps reflecting the common saying that Batman is cooler than Superman.


Word            Similarity Score

cool                 1.0

cold                 0.5644185

coolest              0.608426

awesome              0.7615766

amazing              0.6363099

chill                0.6075392

confidence           0.16269511

composed             0.19751108

calm                 0.44255415

nonchalant           0.24065392

Batman               0.36020163

Superman             0.31441408

nerd                 0.421054

geek                 0.4376547

uncool               0.33795375

hot                  0.5611704

ice                  0.41151524

fashion              0.37175122


For vector representations of sentences, paragraphs and even whole documents, SpaCy follows the conventional method of using centroid vectors, that is, taking the mean of the vectors of all tokens (including punctuation) in the sentence, paragraph or document. Generally speaking, sentences or paragraphs that contain more instances of "cool" or words of similar meaning should get higher cosine similarity scores.
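A plain-Python sketch of a centroid document vector and its "cool score". The 2-dimensional vectors are made up, and skipping tokens that have no vector is an assumption of this sketch rather than SpaCy's exact behavior. Ranking every description by this score is the first attempt below.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(tokens, vectors):
    """Mean of the vectors of the tokens; tokens without a vector are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Made-up vectors, for illustration only.
toy_vectors = {"cool": [1.0, 0.0], "awesome": [0.8, 0.6], "war": [0.0, 1.0]}
short_text = ["cool", "awesome"]
long_text = ["cool", "war", "war", "war"]

for name, tokens in [("short", short_text), ("long", long_text)]:
    score = cosine_similarity(centroid(tokens, toy_vectors), toy_vectors["cool"])
    print(name, round(score, 4))
# short 0.9487
# long 0.3162
```

The second text mentions "cool" exactly as often as the first, yet its score drops sharply because the other tokens pull the centroid away; this is the dilution effect that long descriptions suffer from.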


Of the three test paragraphs below, the first two are taken from an article on the meaning of cool and the third from a news article. And the first two do have higher similarity scores with the word 'cool' than the third.


"It’s tough to define the exact qualities that make someone cool, since pretty much everyone has a different idea of what 'cool' is. For some, it’s a leather-coat-wearing motorcyclist on an open road. For others, it’s the lead singer of a band, an English major surrounded by books, or a really chic neighbor who always burns the best incense. These people are wildly different, and yet they can all be considered cool because they project something special — a certain je ne sais quoi — that makes them stand out."

similarity score with “cool”: 0.6362329653887256


“You know you’re in the presence of a cool person when you feel at ease. The reason? “Cool people are present, focused, and interested in those around them,” Romanoff says. They listen, they try to understand — and as a result, they help everyone feel seen and understood.”

similarity score with “cool”: 0.5910140438438523


“Her visit comes after three high-level diplomatic meetings last week ended with Russian troops still on Ukraine’s borders, but no definitive sign whether Putin would risk a military incursion or instead start talks with the US about arms control in Europe, a more limited agenda than his call for a redrawing of the security architecture of Europe.”

similarity score with “cool”: 0.48815137638105816


First Attempt: Ranking the Raw Texts by Cosine Similarity Scores with the Word "Cool"

So far this seems to be a promising approach. A simple way to determine the coolest superhero, then, is to tokenize each one's description, compute its centroid vector, calculate its cosine similarity with the word "cool" as a "cool score", and rank the results. Is it that simple? Let's see how it goes.



Red Mist. Source:SHDb


So in this first trial, Red Mist is the coolest superhero. His description directly mentions his "cooler appearance", but among the tokens in the text with the highest similarity scores with "cool", stop words like "but" and "some" also get moderately high scores.


“The Red Mist was another teenager following the example of Kick-Ass. But his cooler appearance stole some Kick-Ass fandom. Trying to settle things right, Dave tried to talk to him and force Red Mist to give up his super hero identity but in the end they decided to team-up when a building was on fire.   When Kick-Ass was visited by Hit-Girl and Big Daddy, Red Mist was reluctant to join their team. Both friends were in they way to meet Hit-Girl and her father in their headquarters ready to make a counter offer. But what they saw was a heavenly wounded Big Daddy pleading for help at the hands of Johnny G. At that precise moment, Red Mist was exposed not only as a traitor who set the heroes up but as Johnny G's son. NA”



The second-placed Hulk (Stark Gauntlet) (MCU) and third-placed Batgirl (New 52) fare worse. Both descriptions are extremely short, but because they contain common words with fairly high similarity scores with "cool", like "it", "everyone", "good" and "very", they rank high in "cool score".


“After Tony created a new gaunlet Hulk uses it to revive everyone. NA”


“NA Barbara is very intellegent she is one of the smartest dc characters . She is also a very good fighter and has many gadgetsand weapons.”


This is not convincing, and it indicates that representing a text in its raw state by its centroid vector discriminates against long descriptions. A long text may say ten things about a superhero; even if it has some good words on coolness, the talk of the other nine things dilutes them so much that the whole text gets a low similarity score.



Second Attempt: TF-IDF Weighted Document Embedding Vectors

A better approach may be to clean the text first and suppress the weight of words that are common across documents, using a scheme like TF-IDF. To create TF-IDF weighted embedding vectors for documents, I adopt code written by John Cardente. Then I use these TF-IDF weighted document embedding vectors to calculate cosine similarity scores for a second attempt at a coolness ranking.
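A plain-Python sketch of one common way to build TF-IDF weighted document vectors: weight each word vector by raw term count times IDF and divide by the total weight. The smoothed IDF, ln((1 + n) / (1 + df)) + 1, follows the default of scikit-learn's `TfidfVectorizer`; John Cardente's code may use a different variant, and the vectors and documents here are made up.

```python
import math
from collections import Counter


def smoothed_idf(docs):
    """IDF per word: ln((1 + n_docs) / (1 + doc_freq)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log((1 + n_docs) / (1 + df)) + 1 for word, df in doc_freq.items()}


def tfidf_weighted_vector(doc, idf, vectors):
    """Average of word vectors, each weighted by term count times IDF."""
    counts = Counter(word for word in doc if word in vectors and word in idf)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, tf in counts.items():
        weight = tf * idf[word]
        weight_sum += weight
        for i, x in enumerate(vectors[word]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


toy_vectors = {"cool": [1.0, 0.0], "war": [0.0, 1.0]}
docs = [["cool", "war"], ["war"]]
idf = smoothed_idf(docs)
doc_vectors = [tfidf_weighted_vector(d, idf, toy_vectors) for d in docs]
print(doc_vectors)
```

Because "war" appears in every document, it gets the lowest IDF, so the first document's vector tilts toward "cool" instead of sitting halfway between the two words as a plain centroid would.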


I also adapt code by Nathan Kelber to display the top 20 tokens in a selected document by TF-IDF score, together with each token's cosine similarity to a given text (in this case 'cool') in embedding vector form and the product of the two.



In this second attempt Red Mist still ranks first, apparently driven largely by the word "cooler". Hulk (Stark Gauntlet) (MCU) and Batgirl (New 52) drop out of the top 10, but the inclusion of Kool-Aid Man, Iceman and Jack Frost shows that temperature-related terms like "kool", "ice", "snow" and "cold" come into focus, which is not the sense of "cool" we are after.


The Most Significant Words in the Description of Red Mist

Kool-Aid Man. Source:SHDb



“Before he was officially the Kool-Aid Man in 1975, he was the “Pitcher Man”. The Pitcher Man was created in 1954 by Marvin Plotts, an art director for a New York-based advertising agency. General Foods had just purchased Kool-Aid from the drink’s creator Edwin Perkins the year before, and Plotts was charged with drafting a concept to illustrate the copy message: “A 5-cent package makes two quarts. " Working from his Chicago home on a cold day, Potts watched as his young son traced smiley face patterns on a frosty windowpane," recounts Sue Uerling, marketing and communications director for Hastings Museum of Natural and Cultural History. This inspired Marvin Plotts to create a beaming glass pitcher filled with flavorful drink: the Pitcher Man. From there on the joyful pitcher was on all the Kool-Aid’s advertisements. the voice of the man is John Fickley. In 1975 Kraft Foods created the character’s first costume with arms and legs. He also became more of an action figure in commercials — performing extreme sports and busting through brick walls. Kool-Aid Man is famously known for shouting, “Oh, Yeah!” as he is summoned by thirsty children with the phrase, "Hey, Kool-Aid!". Commercials of the era also featured a catchy jingle, always ending with the Kool-Aid Man's phrase. Starting in the late 1980s, the character was given dialogue, and his mouth would be digitally manipulated to "move" while the voice actor talked. Sometime in the 1990s, the live-action character was retired; from that point until 2008, the character became entirely computer-generated (although other characters -- such as the kids -- remained live-action). In 2000, a new series of commercials were created for Kool-Aid Fierce and the actor chosen to play Kool-Aid Man was Jon Carr. The most recent Kool-Aid commercials feature a new and different live-action Kool-Aid Man playing street basketball and battling "Cola" to stay balanced on a log. NA”

The Most Significant Words in description of Kool-Aid Man



Third Attempt: Refining the "Cool" Vector

This brings us to the other cool property of word embedding vectors. When a vector space model is well trained, it can capture the semantic structure of words: related word pairs are separated by roughly parallel offset vectors, so analogies can be expressed with vector arithmetic. If we use "|word|" to denote a word vector, the famous examples are:


|King|-|man|+|woman|=|Queen|


|Paris|-|France|+|Germany|=|Berlin|


When we perform the same arithmetic on spaCy's word embedding vectors, the closest words to the resulting vectors are in fact "King" and "Germany", but "Queen" and "Berlin" come a close second.


The Most Similar Words for “King-man+woman”

KIng                 0.8024

King                 0.8024

king                 0.8024

KING                 0.8024

Queen                0.7881

queen                0.7881

QUEEN                0.7881

PRINCE               0.6401

prince               0.6401

Prince               0.6401

The Most Similar Words for “paris-france+germany”

Germany              0.8028

GERMANY              0.8028

germany              0.8028

BERLIN               0.7547

Berlin               0.7547

berlin               0.7547

paris                0.6961

PARIS                0.6961

Paris                0.6961

FRANKFURT            0.6708
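The arithmetic and nearest-neighbour lookup behind these tables can be sketched in a few lines. This is a minimal illustration, not the code used for this post: the three-dimensional vectors are invented (real spaCy vectors have hundreds of dimensions), and similarity is plain cosine similarity. Like the lookup in the tables, it does not exclude the input words from the candidates, so they can appear in the results, as "King" and "Germany" do above. With a real spaCy model, the vectors could instead come from `nlp.vocab.vectors`, whose `most_similar` method performs a comparable search.

```python
import math

# Hypothetical 3-dimensional "embeddings", invented for illustration only.
VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.2, 0.9, 0.1],
    "woman":  [0.2, 0.1, 0.9],
    "prince": [0.7, 0.7, 0.2],
    "apple":  [0.1, 0.2, 0.1],
}


def combine(terms):
    """Add up word vectors, each multiplied by its sign (+1 or -1)."""
    dim = len(next(iter(VECTORS.values())))
    result = [0.0] * dim
    for word, sign in terms:
        for i, x in enumerate(VECTORS[word]):
            result[i] += sign * x
    return result


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, n=3):
    """Rank every word in VECTORS (input words included) by cosine similarity."""
    scored = [(word, cosine(query, vec)) for word, vec in VECTORS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# |king| - |man| + |woman|
query = combine([("king", 1), ("man", -1), ("woman", 1)])
for word, score in most_similar(query):
    print(f"{word:<10} {score:.4f}")
```

In this toy setup "queen" comes out on top; with real embeddings the input word often stays very close to the result vector, which is consistent with "King" leading the first table.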




How about subtracting "cold" from "cool"? The closest word becomes "kewl", a slang spelling of "cool", but with a cosine similarity score of only 0.4408, while "cool" itself scores even lower at 0.3774. But when we add one more "cool" vector, the most similar word becomes "cool" again.


The Most Similar Words for “cool-cold” 

Kewl                 0.4408

KEWL                 0.4408

kewl                 0.4408

AWESOME              0.4206

AWEsome              0.4206

awesome              0.4206

Awesome              0.4206

AWesome              0.4206

AWSOME               0.3894

Awsome               0.3894

awsome               0.3894

coool                0.3781

COOOL                0.3781

Coool                0.3781

COol                 0.3774

CooL                 0.3774

cool                 0.3774


The Most Similar Words for “cool+cool-cold”

CooL                 0.8318

COol                 0.8318

cool                 0.8318

COOL                 0.8318

Cool                 0.8318

AWEsome              0.7133

awesome              0.7133

AWesome              0.7133

Awesome              0.7133

AWESOME              0.7133

NICE                 0.6181

nice                 0.6181

Nice                 0.6181

nICE                 0.6181

NIce                 0.6181
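Why does a second "cool" restore it to the top? A short calculation helps, under the simplifying assumption that both word vectors have unit length (word vectors in general need not, so this explains the trend rather than the exact scores). Write c for |cool|, d for |cold|, ρ = c · d for their cosine similarity, and k for the number of times |cool| is added:

```latex
\cos(c,\, k\,c - d) = \frac{c \cdot (k\,c - d)}{\lVert k\,c - d \rVert}
                    = \frac{k - \rho}{\sqrt{k^2 - 2k\rho + 1}}

k = 1:\quad \cos(c,\, c - d) = \sqrt{\frac{1 - \rho}{2}}
\qquad
k = 2:\quad \cos(c,\, 2c - d) = \frac{2 - \rho}{\sqrt{5 - 4\rho}}
```

If "cool" and "cold" are close in the embedding space, as their shared temperature sense suggests, ρ is large and the k = 1 value stays small; for a hypothetical ρ = 0.7 the two formulas give about 0.39 and 0.88. The cosine rises toward 1 as k grows, so each extra "cool" pulls the query back toward the word itself.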



In marketing research on brand coolness, ten characteristics are said to be associated with being cool:


  • Authentic
  • Inspiring
  • Creative
  • Attractive
  • Edgy
  • Rebellious
  • Surprising
  • Mysterious
  • Unique
  • Takes Risks


So when I use the formulation "|cool|+|cool|-|cold|+|authentic|+|rebellious|", the most similar words include not only "cool", "authentic", "awesome" and "rebellious", but also "edgy", "inspiring" and "unique", three more of the ten characteristics. It looks promising that this vector captures much of what we mean when looking for the coolest superhero. Is this the kind of formula Besomi had in mind?


The Most Similar Words for “cool+cool-cold+authentic+rebellious”

Cool                 0.7232

COOL                 0.7232

CooL                 0.7232

cool                 0.7232

COol                 0.7232

Authentic            0.6419

AUTHENTIC            0.6419

authentic            0.6419

AWesome              0.624

Awesome              0.624

awesome              0.624

AWESOME              0.624

AWEsome              0.624

QUIRKY               0.602

Quirky               0.602

quirky               0.602

Inspired             0.6

inspired             0.6

INSPIRED             0.6

Funky                0.5964

FUNKY                0.5964

funky                0.5964

Classy               0.5933

CLASSY               0.5933

classy               0.5933

EDGY                 0.5839

edgy                 0.5839

Edgy                 0.5839

amaZing              0.5742

Amazing              0.5742

AMazing              0.5742

amazing              0.5742

AMAZING              0.5742

REBELLIOUS           0.5684

Rebellious           0.5684

rebellious           0.5684

Inspiring            0.5668

inspiring            0.5668

INSPIRING            0.5668

Badass               0.5642

badass               0.5642

BADASS               0.5642

BadAss               0.5642

retro                0.5587

Retro                0.5587

RETRO                0.5587

fun                  0.5578

Fun                  0.5578

FUN                  0.5578

CHIC                 0.5566

chic                 0.5566

Chic                 0.5566

COLORFUL             0.5542

Colorful             0.5542

colorful             0.5542

Coolest              0.5515

COOLEST              0.5515

coolest              0.5515

STYLISH              0.5483

Stylish              0.5483

stylish              0.5483

cute                 0.5435

CUTE                 0.5435

Cute                 0.5435

CuTe                 0.5435

STYLE                0.542

style                0.542

Style                0.542

Trendy               0.5418

TRENDY               0.5418

trendy               0.5418

look                 0.538

Look                 0.538

LOOK                 0.538

NIce                 0.5378

Nice                 0.5378

nICE                 0.5378

nice                 0.5378

NICE                 0.5378

KIND                 0.537

kind                 0.537

Kind                 0.537

artsy                0.536

ARTSY                0.536

Artsy                0.536

fabulous             0.5353

Fabulous             0.5353

FABULOUS             0.5353

FABulous             0.5353

GROOVY               0.5341

groovy               0.5341

Groovy               0.5341

unique               0.534

Unique               0.534

UNIQUE               0.534

sassy                0.5324

Sassy                0.5324

SASSY                0.5324

Charming             0.531

CHARMING             0.531



So in the third formulation of the "cool score", I calculate the cosine similarity between the TF-IDF weighted document vectors and the new "cool" vector.
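A minimal sketch of this scoring step follows. It assumes one common way of building a TF-IDF weighted document vector: a weighted average of word vectors, with raw term count × log(N / document frequency) as the weights. The toy vectors, the two documents and `cool_vec` (a stand-in for the composite "cool" vector) are all hypothetical, and the code used for this post may differ in detail; scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed IDF by default.

```python
import math
from collections import Counter

# Hypothetical 2-dimensional word vectors, for illustration only.
VECTORS = {
    "sassy":   [0.8, 0.6],
    "curious": [0.6, 0.5],
    "fights":  [0.1, 0.9],
    "enemies": [0.0, 1.0],
    "fire":    [0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tfidf_weights(docs):
    """Per-document {term: count * log(N / df)} for tokenized documents."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def doc_vector(weights):
    """TF-IDF weighted average of the vectors of terms found in VECTORS."""
    dim = len(next(iter(VECTORS.values())))
    vec, total = [0.0] * dim, 0.0
    for term, w in weights.items():
        if term in VECTORS:
            total += w
            for i, x in enumerate(VECTORS[term]):
                vec[i] += w * x
    return [x / total for x in vec] if total else vec


# "fire" occurs in every document, so its IDF (and therefore its weight) is 0.
docs = [
    ["curious", "sassy", "fire"],   # hypothetical description A
    ["fights", "enemies", "fire"],  # hypothetical description B
]
cool_vec = [0.9, 0.4]  # hypothetical stand-in for the composite "cool" vector
scores = [cosine(doc_vector(w), cool_vec) for w in tfidf_weights(docs)]
print([round(s, 4) for s in scores])
```

Description A scores higher because its weighted terms lie closer to `cool_vec`. Averaging over every term is also what dilutes long descriptions: a few cool words can be outweighed by many neutral ones.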



In this third ranking, Red Mist drops to tenth. The new chart topper, Kai from The LEGO Ninjago Movie, may not look too cool, but apparently the sentence "he seems to be curious or perhaps more sassy" in his description earns him high marks. The second-placed Fandral, described as "one of the most good-looking Asgardians which along with his charm, gave him the reputation as a ladies' man", may be more in line with the conventional view of "cool". Despite the new formulation of "cool", Kool-Aid Man still ranks third.


Kai. Source:SHDb


“Kai's attitude is more serious, like Cole in the TV show. Despite this, he is very compassionate and approachable, as he is "always ready with a hug." Much like his own TV show counterpart, Kai is possibly impulsive, enjoys fighting enemies, and is loyal and protective of those he cares about, especially his teammates. Unlike his TV counterpart (beside the alternate face on his figurine), he seems to be curious or perhaps more sassy. He also appears to enjoy describing things with a variety of onomatopoeia. Kai wields a pair of katanas in the trailers, but he may be skillful with other weapons, though these are his favorite weapons. He and the other Ninja have Elemental Powers like their TV show counterparts, allowing him to create and manipulate fire. As seen in the trailers, Kai's vehicle has weapons that are fire-based like the flamethrower in his mech. Like the other Ninja, he is a master builder.”


The Most Significant Words in description of Kai

Fandral. Source:SHDb

"Fandral the Dashing was a charter member of the Warriors Three, a trio of Asgardian adventurers consisting of himself, Hogun the Grim, and Volstagg the Voluminous. Fandral was a strong and brave and a good friend to Thor. He fought in countless battles with his friends, to preserve and protect his people. He has been described as one of the most good-looking Asgardians which along with his charm, gave him the reputation as a ladies' man. Besides his looks, Fandral is also known for his skills in swordsmanship and bravery. He and Thor first met when the Warriors Three joined the Thunder God on an expedition to restore the Odinsword that had become cracked.Allegedly, Volstagg the Staggeringly Perfect led the youth Hogun the Good, Fandral the Quite Plain, Thor and Loki in Hel, fighting against all of its hordes for forty days and nights. Eventually Hogun was hurt and forced to retreat, helped by Fandral. Due to the battle, Hogun the Good became Hogun the Grim, and for some reason, Fandral the Quite Plain became Fandral the Dashing later, while Volstagg started eating every time and Thor was deemed worthy of Mjolnir. Fandral possesses all of the various superhuman attributes common among the Asgardians."

The Most Significant Words in description of Fandral


And the new ranking has some intriguing results. Apart from Kai, GPL, Lloyd (in two entries), Killow and Masako from the LEGO universe also make the top 10. Does LEGO have a secret formula for making its characters sound cool in descriptions? On the surface, the cool factor is not apparent from the words. On the other hand, Jack Kirby, the creator or co-creator of many classics like the Avengers, X-Men and Fantastic Four, comes seventh despite his long description.



Fourth Attempt: Compare Only the Top Similarity Scoring Terms

Despite Kirby's entry, most of the high-scoring texts are relatively short, suggesting that even with TF-IDF weighting, the dilution problem of long documents is not effectively overcome. An alternative approach is to concentrate only on the top similarity scoring terms in each description, instead of computing the similarity score from the vector of the whole document. In this implementation, I take the 10 terms in each document that score highest against the new "cool" vector and use their average as the basis of comparison. To avoid descriptions scoring high simply because they repeatedly mention certain fairly high-scoring terms such as "look" and "kind", I only count unique terms. In this way, descriptions with more distinct terms similar to the new "cool" vector get higher scores, although long documents have an advantage, as they are more likely to include a variety of words related to "cool".
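The top-terms scoring can be sketched as below. It is a minimal illustration under stated assumptions: the word vectors and the `COOL` query vector are hypothetical, tokens are assumed to be already lowercased, and a description with fewer than `k` known unique terms is averaged over the terms it has (the post does not say how such cases are handled).

```python
import math

# Hypothetical word vectors and "cool" query vector, for illustration only.
VECTORS = {
    "awesome": [1.0, 0.0],
    "amazing": [0.9, 0.1],
    "look":    [0.6, 0.4],
    "smash":   [0.0, 1.0],
}
COOL = [1.0, 0.0]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_term_score(tokens, vectors, query, k=10):
    """Mean similarity of the k unique terms that best match the query."""
    unique_terms = {t for t in tokens if t in vectors}  # repeats count once
    sims = sorted((cosine(vectors[t], query) for t in unique_terms), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


description = ["look", "look", "look", "smash", "awesome", "amazing"]
print(round(top_term_score(description, VECTORS, COOL, k=2), 4))
```

Because the repeated "look" counts only once, a description cannot raise its score just by repeating one fairly cool word.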




After this change of approach, some better-known names finally emerged, though with some surprises. According to this measurement, Hulk is the coolest superhero. In fact, five versions of Hulk, each with very similar descriptions, get into the top 10. The cool-related terms in the Hulks' descriptions include "awesome", "amazing", "style", "look", "unique", "fantastic", "great", "truly", "incredibly" and "love".


Hulk. Source:SHDb


The Highest Scoring Terms in Hulk's Description



After four of the Hulks, Sonic the Hedgehog and Devilman claim the next highest positions. Sonic's high-scoring terms include "authentic", "amazing" and "unique", while Devilman's description refers to "keep everything cool".


Sonic the Hedgehog. Source:SHDb


The Highest Scoring Terms in Sonic the Hedgehog's Description


Devilman. Source:SHDb


The Highest Scoring Terms in Devilman's Description




Conclusion


Are Red Mist, Kai and Hulk really the coolest superheroes? The question "what is cool?" can itself draw many different opinions, and perhaps even more so the question of which superhero is the coolest. The four attempts described above include some unconventional approaches, and many people may not agree with the answers they arrive at; nonetheless, those answers are judged purely on the basis of textual descriptions, using quantifiable criteria.


Using word embedding vectors as a basis of comparison has an advantage over word matching in that it may be better at capturing vague ideas such as "cool", but, just like counting word occurrences, it is far from perfect. Cool words in the history section of a superhero's description may refer to other characters rather than the superhero himself or herself, and "look" the verb may be mistakenly counted as "look" the noun, one aspect of "cool". Still, it is perhaps the simpler way, avoiding much more complex approaches such as devising matching rules based on parts of speech and entities.


Original Dataset from Kaggle and Github

Source Code: Github and Kaggle

