Taking it Further – Analogies, Relationships and Other Discoveries – Deep Learning & Job Descriptions (Pt 4 of 5)

Analogies and Relationships

Something different now. Researchers have shown that a simple vector offset approach based on cosine similarity can be very effective for finding analogies between word embeddings (Mikolov, Yih et al.). The well-known example: maximising cos(w(“king”) – w(“man”) + w(“woman”), w(x)) over the vocabulary yields x = “queen”.
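For readers who want to try this at home: given a mapping from words and phrases to their embedding vectors, the offset trick takes only a few lines. Here is a minimal sketch, assuming `embeddings` is a dict of unit-normalised numpy vectors (an illustrative layout, not our production code):

```python
import numpy as np

def analogy(a, b, c, embeddings, top_k=1):
    """Return the phrases closest to vec(a) - vec(b) + vec(c) by cosine similarity.

    `embeddings` is assumed to be a dict mapping words/phrases to
    unit-normalised numpy vectors -- an illustrative layout.
    """
    target = embeddings[a] - embeddings[b] + embeddings[c]
    target /= np.linalg.norm(target)
    scores = {
        phrase: float(vec @ target)          # cosine similarity (vectors are unit length)
        for phrase, vec in embeddings.items()
        if phrase not in (a, b, c)           # exclude the query terms themselves
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# e.g. analogy("java developer", "java", "c++", embeddings) -> ["c++ developer"]
```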

Although finding analogies was not an objective for us, we wanted to find out whether something similar holds here. If you train a model with 50 million parameters, you should at least have some fun with it! We took the trained word and job title embeddings, and these were some of the first analogies we observed:

| Example | Predicted closest neighbour |
| --- | --- |
| “java developer” – “java” + “c++” | “c++ developer” |
| “java developer” – “python” + “c++” | “java developer” |
| “junior project manager” – “junior” + “senior” | “senior project manager” |
| “junior merchandiser” – “junior” + “senior” | “senior merchandiser” |
| “marketing assistant” – “junior” + “senior” | “senior marketing executive” |

Those results make sense: if you take away a Java developer’s Java skills and inject some C++, (s)he becomes a C++ developer. If you take away his/her Python skills instead, (s)he is probably still a Java developer. Transitions between seniority levels also seem to be captured in the embeddings. Then we started looking for more interesting relations:

| Example | Predicted closest neighbour |
| --- | --- |
| “nurse” – “nursing” + “programming” | “software engineer” |
| “nanny” – “baby” + “database” | “database administrator” |
| “forklift driver” – “forklift” + “hadoop” | “big data architect” |
| “mobile phone technician” – “samsung” + “ford” | “met technician” |
| “vehicle sales executive” – “ford” + “insurance” | “insurance sales executive” |
| “marketing manager” – “marketing assistant” + “junior software developer” | “software development manager” |

Note for readers who have never been in a car accident: mechanical and electrical trim (MET) technicians identify damaged mechanical and electrical components on vehicles.

A nurse who stops nursing and starts programming becomes a software engineer. A nanny taking care of a baby is like a database administrator babysitting a database. And taking the Samsung out of a mobile phone technician and adding a touch of Ford results in a mechanical and electrical trim technician.

Let’s see what we get when we start moving the employees of NAB (a large bank in Australia) around:

| Example | Predicted closest neighbour |
| --- | --- |
| “bank staff” – “nab” + “hospital” | “staff nurse” |
| “bank staff” – “nab” + “coles” | “grocery clerk” |
| “bank staff” – “nab” + “woolworths” | “supermarket retail assistant” |

Note for readers less familiar with Australia: Woolworths and Coles are Australian supermarket chains.

While we did not start this project with these kinds of analogies in mind, it was interesting to see that such relations hold to a certain extent.

What are the Feature Detectors Looking for?

The filters in the convolutional layer of the model correspond to feature detectors that are trained to activate when a particular pattern is observed. They can give interesting insight into the trends in job descriptions that can be used to discover a job title.

Since the convolutional layer in our model operates on word embeddings, we can easily interpret what these feature detectors are looking for. One straightforward way of doing this is to visualize the specific input sequences that lead to maximal activation. The following tables show, for several filters, the top 5 input patterns across all job descriptions in the test data set.
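For readers who want to reproduce this kind of visualisation: conceptually it amounts to sliding each filter over the embedded text and keeping the strongest windows. A minimal sketch, assuming a five-word filter window, a list of tokenised documents and an `embed` lookup function (all illustrative assumptions, not our exact code):

```python
import numpy as np

def top_windows(filter_w, docs, embed, window=5, top_k=5):
    """Find the word windows that maximally activate a single conv filter.

    `filter_w` is the filter's weight matrix of shape (window, embedding_dim),
    `docs` is a list of tokenised job descriptions and `embed` maps a token
    to its embedding vector. Names and the five-word window are illustrative.
    """
    scored = []
    for tokens in docs:
        for i in range(len(tokens) - window + 1):
            chunk = tokens[i:i + window]
            x = np.stack([embed(t) for t in chunk])   # (window, embedding_dim)
            activation = float(np.sum(filter_w * x))  # the filter's response to this window
            scored.append((activation, " ".join(chunk)))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```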

No surprise: some of the feature detectors are looking for job titles. After all, that’s what we optimized for:

| Teacher | Hospitality staff |
| --- | --- |
| senior lecturer lecturer senior lecturer | head barista sous chef foh |
| teacher teaching citizenship humanities | ideal hospitality assistant manager will |
| teacher secondary teacher college lecturer | title hospitality assistant apprentice employer |
| description sen teaching assistant sen | page hotel assistant manager m |
| senior lecturer clinical psychology lecturer | head waiter chefs de rang |

Other feature detectors are looking for particular duties:

| Cleaning | Office administration |
| --- | --- |
| ironing cleaning taking children to | bookings provide typing filing photocopying |
| toilets cleaning oven rubbish bins | filing general typing answering phones |
| janitor janitorial worker maintenance janitor | greeting clients typing filing sending |
| toilets cleaning guest bedrooms landings | earner with typing handling calls |
| ironing cleaning bathrooms cleaning toilets | typing copy typing answering switchboard |

Yet other feature detectors focus on leadership:

| Leader | Supervisor |
| --- | --- |
| have people leadership experience senior | oversee and supervise welders boilermakers |
| mentoring team lead experience senior | supervise homework supervise bathtime |
| supervise and lead the floor | motivate and supervise staff valid |
| have proven leadership of teams | manage and supervise junior fee |
| supervise and lead the team | motivate and supervise team members |

Type of employment and salary:

| Casual | Salary |
| --- | --- |
| casual bar porter casual role | hour pro rata weekend hourly |
| casual casual operator needed asap | hour pro rata various hours |
| casual bar attendant write your | hour pro rata to term |
| casual casual cleaner required good | hour pro rata term part |
| casual bar attendant worktype casual | hourly rate overtime available saturdays |

Location and language skills:

| Australia | German |
| --- | --- |
| sydney cbd nsw sydney cbd | fluent in german consultative professional |
| sydney melbourne brisbane perth canberra | a motivated german specialist teacher |
| sydney australia australia sydney nsw | specified location german resourcer entry |
| sydney adelaide brisbane chatswood melbourne | translation skills german :english . |
| sydney adelaide brisbane chatswood melbourne | job description german resourcer entry |

Interestingly, there was also a feature detector for the desired appearance of candidates. Having good manners, a certain standard of personal hygiene and an English accent seem to go hand in hand according to this feature detector:

| Appearance |
| --- |
| caucasian languages : english hair |
| caucasian languages : english accents |
| disposition smart clean appearance friendly |
| groomed and presentable appearance caring |
| presentable with a polite courteous |

As is clear by now, and was hinted at before: the learnt filters not only focus on the job title but cover most of the relevant sections one expects to find in a job description. While CNNs are of course known for representation and feature learning, we still found it quite remarkable what the model came up with after learning to predict job titles from raw, uncleaned job descriptions with almost no preprocessing.

There are lots of things one can do with the learnt features. My personal favourite, at the top of my to-do list: relating the Appearance feature back to the actual vacancies/companies to find out which company’s employees likely have the most well-maintained facial hair.

Extracting Keyphrases From a Job Description

An area well worth some investigation is trying to understand the predictions the CNN makes. How can we make the steps it takes to come up with a predicted job title more interpretable?

Let’s have another look at the pieces of input text that tend to maximally activate the feature detectors in the CNN. While this approach ignores the transfer functions and the fully connected layer of the network, it is a helpful way to understand what is going on. The figure below highlights the parts of a job description that correspond to the 50 text windows with the largest activations over all filters in the network. By assigning a simple colour code to each word (in increasing order of activation: grey, yellow, orange, red), it becomes clear that these text windows correspond to the keyphrases in this network support officer job description, taken from the test data set:

[Figure: network support officer job description with the 50 highest-activation text windows highlighted in grey, yellow, orange and red]

Note that the first 20 words of the vacancy were not used, since they usually contain the job title itself (it is typically the title of the vacancy), which would make things too easy. We want the CNN to search the job description for the parts it thinks are important for deriving the job title.
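For the curious, the highlighting itself can be computed along these lines. This is a sketch assuming we have a matrix of convolution activations for one document, one row per filter and one column per window position (the layout and names are hypothetical):

```python
import numpy as np

def highlight_weights(activations, window=5, top_n=50):
    """Convert conv activations into per-word highlight intensities.

    `activations` is assumed to be a (num_filters, num_positions) array of
    window responses for one job description (hypothetical layout). The
    `top_n` strongest windows across all filters mark the words they cover;
    each word keeps the maximum activation of any covering window.
    """
    num_filters, num_pos = activations.shape
    flat = activations.ravel()
    top = np.argsort(flat)[-top_n:]               # flat indices of the strongest windows
    weights = np.zeros(num_pos + window - 1)      # one intensity value per word
    for idx in top:
        pos = idx % num_pos                       # window start within the document
        val = flat[idx]
        weights[pos:pos + window] = np.maximum(weights[pos:pos + window], val)
    return weights
```

The returned per-word weights can then be bucketed, e.g. by quantile, into the grey/yellow/orange/red scale.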

Looking at the network support officer example, we can see that there is one remaining reference to the actual job title; that part of the job description causes the largest activation. A number of expected skills are also clearly detected: installing network/telecommunication equipment, WAN/LAN/Wifi networking technologies, Cisco IP telephony, cabling infrastructure design/deployment and network support. Locations, qualifications and the type of disclosure are detected as well.

Closest Job Description Neighbours

The CNN part of the model is essentially a document-to-vector approach that generates an embedding for a variable-length job description. Predicting a job title, as we did in the first results section, comes down to comparing the job description embedding with the job title embeddings and looking for the closest neighbour.

In addition, the previous sections showed that the CNN does more than simply focus on the job title: it considers other types of information in the job description that are relevant for predicting the job title. Hence, there is nothing stopping us from looking for similar job descriptions by comparing their embeddings.
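Concretely, the similarity search is just a cosine ranking over these embeddings. A minimal sketch, assuming the vectors are unit-normalised and stacked into a matrix (an illustrative setup, not our exact pipeline):

```python
import numpy as np

def closest_descriptions(query_vec, doc_vecs, top_k=3):
    """Rank job descriptions by cosine similarity to a query embedding.

    `query_vec` is the CNN embedding of one job description; `doc_vecs` is a
    (num_docs, embedding_dim) matrix holding the embeddings of every vacancy
    in the test set. Both are assumed unit-normalised, so the dot product
    equals the cosine similarity.
    """
    sims = doc_vecs @ query_vec
    return np.argsort(sims)[::-1][:top_k]   # indices of the closest vacancies
```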

We took a job description for a tandoori chef from the test data set and, for ease of interpretation, extracted its keyphrases using the approach from the previous section:

[Figure: tandoori chef job description with keyphrases highlighted]

Then we compared its embedding with those of all vacancies in the test data set. The following are the top 3 vacancies (with keyphrases) that are closest in the embedding space.

Neighbour 1:

[Figure: first closest vacancy with keyphrases highlighted]

Neighbour 2:

[Figure: second closest vacancy with keyphrases highlighted]

Neighbour 3:

[Figure: third closest vacancy with keyphrases highlighted]

No surprises: all three employers are looking for a tandoori chef. Also not very surprising: looking at which neurons show high activations reveals that these four vacancies share essentially the same active neurons. For example, neuron 725 is activated by both “culinary skills, knowledge about food” and “restaurant in Prahan. Friendly”, and neuron 444 by both “all marinades and Indian dishes” and “a Tandoori chef to work”. So the model knows that these phrases are semantically related.
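Checking for shared active neurons is straightforward once every document is reduced to its max-pooled activation vector; a small illustrative helper (names and layout are ours, for the sketch only):

```python
import numpy as np

def shared_active_neurons(act_a, act_b, top_k=20):
    """Return the neurons that are among the most active for both documents.

    `act_a` and `act_b` are assumed to be 1-D max-pooled activation vectors,
    one value per filter (hypothetical layout, for illustration).
    """
    top_a = set(np.argsort(act_a)[-top_k:])
    top_b = set(np.argsort(act_b)[-top_k:])
    return sorted(top_a & top_b)   # neurons like 725 and 444 above would show up here
```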

Here is another example, this time for a data scientist job description:

[Figure: data scientist job description with keyphrases highlighted]

with the following closest vacancies in the embedding space. Neighbour 1:

[Figure: first closest data scientist vacancy with keyphrases highlighted]

Neighbour 2:

[Figure: second closest data scientist vacancy with keyphrases highlighted]

Neighbour 3:

[Figure: third closest data scientist vacancy with keyphrases highlighted]

These closest neighbours are all data scientist vacancies that require skills such as Hadoop, SQL and R. Looking at the activations, we see patterns similar to the previous example. For example, neuron 504 is activated by both “Hadoop (Hive/Impala/Pig) PostgreSQL” and “AWS, Agile My client are”, so it looks like the model knows that AWS and Hadoop are somehow related. It was also interesting to observe that the target vacancy and its closest neighbour share a couple of sentences: “The role largely involves working as a senior member of the analytics delivery team on a day to day basis”, “Coaching other team members in advanced analytical methods” and “Building analytical process(es) and code to meet client requirements”.

Based on the experiments we did, this looks like an interesting approach for finding similar or related vacancies. Given that the model does not only look at the job title and knows which things are related (e.g. “data science” and “statistics”), it adds value compared to a simple approach that only uses the actual words of a job title or job description. While this is useful in its own right, lots of other applications could benefit from it. To name just one: a job recommendation engine could use it as an additional feature.

Continue Reading

This article is part of a series on Using Deep Learning to Extract Knowledge from Job Descriptions. For more information, head to Using Deep Learning to Extract Knowledge from Job Descriptions.

Jan Luts is a senior data scientist at Search Party. He received a Master of Information Sciences from Universiteit Hasselt, Belgium, in 2003, and Master’s degrees in Bioinformatics and Statistics from Katholieke Universiteit Leuven, Belgium, in 2004 and 2005, respectively. After obtaining his PhD at the Department of Electrical Engineering (ESAT) of Katholieke Universiteit Leuven in 2010, he spent a further two years there as a postdoctoral researcher. In 2012 Jan moved to Australia, where he worked as a postdoctoral researcher in Statistics at the School of Mathematical Sciences at the University of Technology, Sydney. In 2013 he moved into the private sector as a Data Scientist at Search Party.
