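The dataset is heavily imbalanced across categories (#explainer): the most frequent category (fruity, 892 cases) has nearly 150 times as many cases as the rarest (blueberry, 6 cases).
Possible solutions (a few arbitrarily chosen articles):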
- https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
- https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
- https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28

| # | Category | Cases | Cases [%] |
|---|---|---|---|
| 49 | fruity | 892 | 20.6673 |
| 47 | floral | 632 | 14.6432 |
| 109 | woody | 588 | 13.6237 |
| 56 | herbal | 564 | 13.0677 |
| 55 | green | 556 | 12.8823 |
| 48 | fresh | 504 | 11.6775 |
| 97 | sweet | 451 | 10.4495 |
| 87 | resinous | 370 | 8.5728 |
| 95 | spicy | 302 | 6.9972 |
| 12 | balsamic | 270 | 6.2558 |
| 90 | rose | 258 | 5.9778 |
| 41 | earthy | 234 | 5.4217 |
| 43 | ethereal | 216 | 5.0046 |
| 29 | citrus | 213 | 4.9351 |
| 76 | oily | 181 | 4.1937 |
| 70 | mint | 172 | 3.9852 |
| 101 | tropicalfruit | 172 | 3.9852 |
| 44 | fatty | 171 | 3.9620 |
| 74 | nut | 171 | 3.9620 |
| 22 | camphor | 168 | 3.8925 |
| 96 | sulfuric | 157 | 3.6376 |
| 14 | berry | 153 | 3.5449 |
| 106 | waxy | 148 | 3.4291 |
| 72 | musk | 141 | 3.2669 |
| 103 | vegetable | 135 | 3.1279 |
| 11 | apple | 132 | 3.0584 |
| 19 | burnt | 130 | 3.0120 |
| 66 | meat | 130 | 3.0120 |
| 81 | phenolic | 123 | 2.8499 |
| 84 | powdery | 121 | 2.8035 |
| 23 | caramellic | 120 | 2.7804 |
| 26 | chemical | 115 | 2.6645 |
| 73 | musty | 114 | 2.6413 |
| 40 | dry | 111 | 2.5718 |
| 64 | lily | 111 | 2.5718 |
| 2 | aldehydic | 109 | 2.5255 |
| 9 | animalic | 106 | 2.4560 |
| 85 | pungent | 101 | 2.3401 |
| 102 | vanilla | 101 | 2.3401 |
| 63 | lemon | 97 | 2.2475 |
| 61 | leaf | 95 | 2.2011 |
| 3 | alliaceous | 94 | 2.1779 |
| 57 | honey | 85 | 1.9694 |
| 104 | violetflower | 82 | 1.8999 |
| 39 | dairy | 80 | 1.8536 |
| 54 | grass | 80 | 1.8536 |
| 6 | ambery | 75 | 1.7377 |
| 21 | cacao | 75 | 1.7377 |
| 59 | jasmin | 74 | 1.7146 |
| 94 | sour | 73 | 1.6914 |
| 89 | roasted | 72 | 1.6682 |
| 30 | clean | 71 | 1.6450 |
| 77 | orange | 70 | 1.6219 |
| 69 | metallic | 68 | 1.5755 |
| 46 | fermented | 65 | 1.5060 |
| 4 | almond | 64 | 1.4829 |
| 33 | coffee | 63 | 1.4597 |
| 37 | cooling | 63 | 1.4597 |
| 67 | medicinal | 60 | 1.3902 |
| 100 | tobacco | 59 | 1.3670 |
| 75 | odorless | 57 | 1.3207 |
| 79 | pear | 57 | 1.3207 |
| 65 | liquor | 55 | 1.2743 |
| 25 | cheese | 54 | 1.2512 |
| 35 | coniferous | 52 | 1.2048 |
| 68 | melon | 52 | 1.2048 |
| 36 | cooked | 51 | 1.1816 |
| 20 | butter | 50 | 1.1585 |
| 15 | blackcurrant | 49 | 1.1353 |
| 62 | leather | 49 | 1.1353 |
| 108 | wine | 49 | 1.1353 |
| 28 | cinnamon | 48 | 1.1121 |
| 13 | banana | 47 | 1.0890 |
| 99 | terpenic | 47 | 1.0890 |
| 10 | anisic | 46 | 1.0658 |
| 71 | mushroom | 46 | 1.0658 |
| 32 | coconut | 45 | 1.0426 |
| 53 | grapefruit | 45 | 1.0426 |
| 58 | hyacinth | 45 | 1.0426 |
| 86 | rancid | 44 | 1.0195 |
| 50 | geranium | 43 | 0.9963 |
| 80 | pepper | 42 | 0.9731 |
| 42 | ester | 41 | 0.9500 |
| 52 | grape | 41 | 0.9500 |
| 17 | body | 38 | 0.8804 |
| 51 | gourmand | 38 | 0.8804 |
| 93 | smoky | 38 | 0.8804 |
| 107 | whiteflower | 37 | 0.8573 |
| 60 | lactonic | 36 | 0.8341 |
| 83 | plum | 34 | 0.7878 |
| 98 | syrup | 34 | 0.7878 |
| 24 | cedar | 33 | 0.7646 |
| 27 | cherry | 33 | 0.7646 |
| 31 | clove | 32 | 0.7414 |
| 105 | watery | 32 | 0.7414 |
| 91 | seafood | 31 | 0.7183 |
| 92 | sharp | 25 | 0.5792 |
| 1 | alcoholic | 22 | 0.5097 |
| 5 | ambergris | 22 | 0.5097 |
| 88 | ripe | 20 | 0.4634 |
| 38 | cucumber | 19 | 0.4402 |
| 82 | plastic | 18 | 0.4171 |
| 18 | bread | 17 | 0.3939 |
| 34 | cognac | 12 | 0.2780 |
| 78 | overripe | 10 | 0.2317 |
| 7 | ambrette | 8 | 0.1854 |
| 8 | ammoniac | 8 | 0.1854 |
| 45 | fennel | 7 | 0.1622 |
| 16 | blueberry | 6 | 0.1390 |
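
One way to act on these counts is to weight rare categories more heavily during training. The sketch below is only an illustration, not the approach used in this project. It assumes a total of 4316 samples, a figure back-calculated from the table (892 cases at 20.6673 %) rather than stated anywhere. It also assumes a multi-label setup, since the percentages sum to well over 100 %. The helper names `imbalance_ratio` and `pos_weights` are hypothetical. The negatives-to-positives ratio it computes is the kind of value that, for example, PyTorch's `BCEWithLogitsLoss` accepts as `pos_weight`.

```python
# Per-category case counts, copied from a few rows of the table above.
counts = {
    "fruity": 892,
    "floral": 632,
    "woody": 588,
    "cognac": 12,
    "blueberry": 6,
}

# ASSUMPTION: total number of samples, back-calculated from the table
# (892 cases / 0.206673 is about 4316). It is not stated in the notes.
N_SAMPLES = 4316


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest category count."""
    return max(counts.values()) / min(counts.values())


def pos_weights(counts, n_samples, cap=None):
    """Per-category weight = negatives / positives.

    Rare categories get large weights. `cap` optionally limits the weight
    so that a handful of samples cannot dominate the loss.
    """
    weights = {}
    for label, positives in counts.items():
        weight = (n_samples - positives) / positives
        if cap is not None:
            weight = min(weight, cap)
        weights[label] = weight
    return weights


if __name__ == "__main__":
    print(f"imbalance ratio: {imbalance_ratio(counts):.1f}")
    for label, weight in pos_weights(counts, N_SAMPLES, cap=50.0).items():
        print(f"{label:10s} {weight:8.2f}")
```

The cap is a design choice: without it, a category such as blueberry with only 6 cases would receive a weight in the hundreds, which tends to make training unstable. Resampling, as discussed in the linked articles, is an alternative to weighting.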