Using the Hugging Face Transformers library for multi-label classification

[1] Install Required Libraries:

Install the transformers library, along with PyTorch, which the code below relies on:

!pip install transformers torch

[2] Load Pre-trained Model and Tokenizer:

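Load a pre-trained model and its matching tokenizer.

For example, use bert-base-uncased. This checkpoint ships without a trained classification head, so the head is randomly initialized on load and the model must be fine-tuned on labeled data before its predictions mean anything: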

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = 'bert-base-uncased'
num_labels = 6
tokenizer = AutoTokenizer.from_pretrained(model_name)
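model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    problem_type='multi_label_classification',  # use a per-label loss during fine-tuning
)

Set num_labels to the number of labels in your multi-label classification task. Setting problem_type to 'multi_label_classification' tells the model to use a per-label binary cross-entropy loss when labels are passed during fine-tuning.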
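
As a rough illustration of what the targets look like in a multi-label task: each example gets a multi-hot float vector with one slot per label, and each slot is scored independently with binary cross-entropy (Transformers applies torch.nn.BCEWithLogitsLoss when the config's problem_type is "multi_label_classification"). The sketch below uses only PyTorch. The label names, the to_multi_hot helper, and the logits are made up for illustration and are not part of the original example.

```python
import torch

# Hypothetical label set; replace with the labels of your own task.
label_names = ["fish", "chicken", "vegetables", "crabs", "positive", "negative"]


def to_multi_hot(active_labels, label_names):
    """Encode a list of active label names as a float multi-hot vector."""
    target = torch.zeros(len(label_names))
    for name in active_labels:
        target[label_names.index(name)] = 1.0
    return target


# One example that carries three labels at once.
target = to_multi_hot(["fish", "crabs", "positive"], label_names)

# Made-up logits standing in for model output of shape (batch_size, num_labels).
logits = torch.tensor([[2.0, -1.0, 0.5, 1.5, 3.0, -2.0]])

# Each label is scored independently with binary cross-entropy.
loss_fn = torch.nn.BCEWithLogitsLoss()
loss = loss_fn(logits, target.unsqueeze(0))

print("Target:", target.tolist())
print("Loss:", loss.item())
```

Note that the targets must be floats, not integer class indices, because several slots can be 1 at the same time.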

[3] Tokenize Your Input:

Tokenize the input text using the tokenizer:

text = "i like fish, i hate chicken, i love vegetables, i am scared of crabs"
tokens = tokenizer(text, return_tensors='pt')

[4] Perform Inference:

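Run inference by passing the tokenized input to the model. Wrapping the call in torch.no_grad() skips gradient tracking, which is not needed at inference time:

import torch

with torch.no_grad():
    outputs = model(**tokens)

outputs.logits holds one raw score (logit) per label, with shape (batch_size, num_labels).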

[5] Post-process the Output:

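For multi-label classification, each label is decided independently, so softmax (which forces the scores to sum to 1) is not appropriate.

Instead, convert each logit to a probability with a sigmoid function, then apply a threshold to decide which labels are predicted: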

import torch

probabilities = torch.sigmoid(outputs.logits)  # independent probability per label
threshold = 0.5  # Adjust this threshold based on your task
predicted_labels = (probabilities > threshold).int()

predicted_labels will be a tensor of 0s and 1s with shape (batch_size, num_labels), where a 1 marks a predicted label.
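
To make the 0/1 tensor easier to read, the columns can be mapped back to label names. The following is a minimal sketch using only PyTorch; the label names, the decode_predictions helper, and the logits are hypothetical stand-ins for a fine-tuned model's output.

```python
import torch

# Hypothetical label names, in the same order as the model's output columns.
label_names = ["fish", "chicken", "vegetables", "crabs", "positive", "negative"]


def decode_predictions(logits, label_names, threshold=0.5):
    """For each row of logits, list the labels whose sigmoid probability exceeds threshold."""
    probabilities = torch.sigmoid(logits)
    mask = probabilities > threshold
    return [
        [name for name, keep in zip(label_names, row.tolist()) if keep]
        for row in mask
    ]


# Made-up logits for a batch of two texts.
logits = torch.tensor([
    [2.0, -3.0, 1.0, -0.5, 0.2, -4.0],
    [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0],
])

decoded = decode_predictions(logits, label_names)
print(decoded)
```

A text can come back with several labels or with none at all, which is the main difference from single-label classification.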

A complete example:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load pre-trained model and tokenizer
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
num_labels = 6  # Replace with the number of labels in your task
# The classification head is newly initialized: fine-tune before relying on predictions
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    problem_type='multi_label_classification',
)
model.eval()  # Make sure dropout is disabled for inference

# Tokenize input
text = "i like fish, i hate chicken, i love vegetables, i am scared of crabs"
tokens = tokenizer(text, return_tensors='pt')

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**tokens)

# Post-process the output for multi-label classification
probabilities = torch.sigmoid(outputs.logits)  # independent probability per label
threshold = 0.5  # Adjust this threshold based on your task
predicted_labels = (probabilities > threshold).int()

print("Predicted Labels:", predicted_labels)

Adjust the model, tokenizer, and post-processing steps based on the specific requirements of the multi-label classification task.
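
One common post-processing adjustment is to use a separate threshold for each label instead of a single global value, for instance when some labels are much rarer than others. The sketch below assumes the thresholds have already been chosen (for example on a validation set); the probabilities and threshold values are invented for illustration.

```python
import torch

# Made-up probabilities for a batch of two texts and six labels.
probabilities = torch.tensor([
    [0.90, 0.40, 0.55, 0.20, 0.65, 0.10],
    [0.30, 0.45, 0.80, 0.35, 0.50, 0.05],
])

# Hypothetical per-label thresholds, one per output column.
thresholds = torch.tensor([0.5, 0.3, 0.6, 0.3, 0.7, 0.5])

# Broadcasting compares every row against the per-label thresholds.
predicted_labels = (probabilities > thresholds).int()
print(predicted_labels.tolist())
```

Because the thresholds tensor has shape (num_labels,), it broadcasts across the batch dimension without any explicit loop.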

Colab Notebook:

https://colab.research.google.com/drive/188l1GMqRduFC6uuXF79Dd2DPTlyjRkou?usp=sharing