The trick to this competition was to identify negative reviews with high confidence. My approach had 2 easy steps:
- Convert speech to text.
- Classify text based on sentiment.
Neither step involved any training; both used only inference with pretrained models from the internet.
Convert speech to text.
For this I used a free model found on Torch Hub (Silero STT). It wasn't 100% accurate, but it did OK.

```python
import pandas as pd
import torchaudio
import torch
from glob import glob

device = torch.device('cpu')  # gpu also works

model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en',  # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

stage = 'test'
files = sorted(glob(f'{INSERT_YOUR_PATH_TO_WAV_FOLDER}/*.wav'))
bsize = 10
batches = split_into_batches(files, batch_size=bsize)
ids = [f.split('/')[-1].split('.')[0] for f in files]

res = []
for i, batch in enumerate(batches):
    minput = prepare_model_input(read_batch(batch), device=device)
    output = model(minput)
    for j, example in enumerate(output):
        res.append([ids[i * bsize + j], decoder(example.cpu())])

df = pd.DataFrame(res)
df.columns = ['wav_id', 'text']
df.to_csv(f'text_{stage}.csv', index=False)
```
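This is not part of the original pipeline, but a quick way to eyeball transcription quality is to run a single clip through the same model before kicking off the full batch loop. A minimal sketch, reusing `model`, `decoder` and the `utils` functions loaded above (`sample.wav` is a hypothetical path):

```python
# Sanity-check a single clip (hypothetical path 'sample.wav') before the full run.
single = split_into_batches(['sample.wav'], batch_size=1)
minput = prepare_model_input(read_batch(single[0]), device=device)
output = model(minput)
print(decoder(output[0].cpu()))  # lowercase transcript without punctuation
```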
Compute sentiment
For this task I used the `transformers` library, which gives a "POSITIVE/NEGATIVE" sentiment for a phrase and a confidence in its prediction.

```python
import math

from transformers import pipeline

nlp = pipeline('sentiment-analysis')

bulk = 50
res = []
for i in range(math.ceil(len(df) / bulk)):  # df holds the transcripts from the previous step
    r = nlp(list(df.iloc[bulk * i: bulk * (i + 1)]['text'].values))
    res.extend(r)

rdf = pd.DataFrame(res).rename(columns={'label': 'sentiment'})
d = pd.concat([df, rdf], axis=1)
d = d.sort_values('wav_id').reset_index(drop=True)
```
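To see what the pipeline actually returns (a quick check that is not part of the original code), run it on a single sentence; the result is a list of dicts with a `label` and a `score` field:

```python
# Output format of the sentiment pipeline (score value is illustrative).
print(nlp(['the food was cold and the waiter never came back']))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]
```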
Finally, submit 0 (= negative) only when the model is very confident:

```python
d['label'] = 2  # default label for everything that is not confidently negative
d.loc[(d.sentiment == 'NEGATIVE') & (d.score > .995), 'label'] = 0
d.to_csv('submission.csv', index=False)
```
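As a last sanity check before uploading (my addition, not in the original write-up), it helps to count how many clips actually crossed the 0.995 threshold:

```python
# How many clips ended up flagged as confidently negative?
print((d['label'] == 0).sum(), 'of', len(d), 'clips predicted as negative')
```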