The dataset is heavily imbalanced across categories (#explainer).

Possible solutions (a few arbitrarily chosen articles, not a curated list):
- https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
- https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
- https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28

Index | Category | Cases | Cases [%] |
---|---|---|---|
49 | fruity | 892 | 20.6673 |
47 | floral | 632 | 14.6432 |
109 | woody | 588 | 13.6237 |
56 | herbal | 564 | 13.0677 |
55 | green | 556 | 12.8823 |
48 | fresh | 504 | 11.6775 |
97 | sweet | 451 | 10.4495 |
87 | resinous | 370 | 8.5728 |
95 | spicy | 302 | 6.9972 |
12 | balsamic | 270 | 6.2558 |
90 | rose | 258 | 5.9778 |
41 | earthy | 234 | 5.4217 |
43 | ethereal | 216 | 5.0046 |
29 | citrus | 213 | 4.9351 |
76 | oily | 181 | 4.1937 |
70 | mint | 172 | 3.9852 |
101 | tropicalfruit | 172 | 3.9852 |
44 | fatty | 171 | 3.9620 |
74 | nut | 171 | 3.9620 |
22 | camphor | 168 | 3.8925 |
96 | sulfuric | 157 | 3.6376 |
14 | berry | 153 | 3.5449 |
106 | waxy | 148 | 3.4291 |
72 | musk | 141 | 3.2669 |
103 | vegetable | 135 | 3.1279 |
11 | apple | 132 | 3.0584 |
19 | burnt | 130 | 3.0120 |
66 | meat | 130 | 3.0120 |
81 | phenolic | 123 | 2.8499 |
84 | powdery | 121 | 2.8035 |
23 | caramellic | 120 | 2.7804 |
26 | chemical | 115 | 2.6645 |
73 | musty | 114 | 2.6413 |
40 | dry | 111 | 2.5718 |
64 | lily | 111 | 2.5718 |
2 | aldehydic | 109 | 2.5255 |
9 | animalic | 106 | 2.4560 |
85 | pungent | 101 | 2.3401 |
102 | vanilla | 101 | 2.3401 |
63 | lemon | 97 | 2.2475 |
61 | leaf | 95 | 2.2011 |
3 | alliaceous | 94 | 2.1779 |
57 | honey | 85 | 1.9694 |
104 | violetflower | 82 | 1.8999 |
39 | dairy | 80 | 1.8536 |
54 | grass | 80 | 1.8536 |
6 | ambery | 75 | 1.7377 |
21 | cacao | 75 | 1.7377 |
59 | jasmin | 74 | 1.7146 |
94 | sour | 73 | 1.6914 |
89 | roasted | 72 | 1.6682 |
30 | clean | 71 | 1.6450 |
77 | orange | 70 | 1.6219 |
69 | metallic | 68 | 1.5755 |
46 | fermented | 65 | 1.5060 |
4 | almond | 64 | 1.4829 |
33 | coffee | 63 | 1.4597 |
37 | cooling | 63 | 1.4597 |
67 | medicinal | 60 | 1.3902 |
100 | tobacco | 59 | 1.3670 |
75 | odorless | 57 | 1.3207 |
79 | pear | 57 | 1.3207 |
65 | liquor | 55 | 1.2743 |
25 | cheese | 54 | 1.2512 |
35 | coniferous | 52 | 1.2048 |
68 | melon | 52 | 1.2048 |
36 | cooked | 51 | 1.1816 |
20 | butter | 50 | 1.1585 |
15 | blackcurrant | 49 | 1.1353 |
62 | leather | 49 | 1.1353 |
108 | wine | 49 | 1.1353 |
28 | cinnamon | 48 | 1.1121 |
13 | banana | 47 | 1.0890 |
99 | terpenic | 47 | 1.0890 |
10 | anisic | 46 | 1.0658 |
71 | mushroom | 46 | 1.0658 |
32 | coconut | 45 | 1.0426 |
53 | grapefruit | 45 | 1.0426 |
58 | hyacinth | 45 | 1.0426 |
86 | rancid | 44 | 1.0195 |
50 | geranium | 43 | 0.9963 |
80 | pepper | 42 | 0.9731 |
42 | ester | 41 | 0.9500 |
52 | grape | 41 | 0.9500 |
17 | body | 38 | 0.8804 |
51 | gourmand | 38 | 0.8804 |
93 | smoky | 38 | 0.8804 |
107 | whiteflower | 37 | 0.8573 |
60 | lactonic | 36 | 0.8341 |
83 | plum | 34 | 0.7878 |
98 | syrup | 34 | 0.7878 |
24 | cedar | 33 | 0.7646 |
27 | cherry | 33 | 0.7646 |
31 | clove | 32 | 0.7414 |
105 | watery | 32 | 0.7414 |
91 | seafood | 31 | 0.7183 |
92 | sharp | 25 | 0.5792 |
1 | alcoholic | 22 | 0.5097 |
5 | ambergris | 22 | 0.5097 |
88 | ripe | 20 | 0.4634 |
38 | cucumber | 19 | 0.4402 |
82 | plastic | 18 | 0.4171 |
18 | bread | 17 | 0.3939 |
34 | cognac | 12 | 0.2780 |
78 | overripe | 10 | 0.2317 |
7 | ambrette | 8 | 0.1854 |
8 | ammoniac | 8 | 0.1854 |
45 | fennel | 7 | 0.1622 |
16 | blueberry | 6 | 0.1390 |
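
One of the tactics discussed in the linked articles is to weight rare categories more heavily in the loss. The sketch below shows how such weights could be derived from the counts in the table. It rests on two assumptions that are not stated in the source: the total of 4316 samples is inferred from the first row (892 cases = 20.6673 %), and the task is treated as multi-label, since the percentages sum to well over 100 %. The helper name `positive_weights` is hypothetical, and only a handful of rows are copied for brevity.

```python
# Counts copied from a few rows of the table above.
# N_SAMPLES is an assumption inferred from 892 cases = 20.6673 %.
N_SAMPLES = 4316

counts = {
    "fruity": 892,
    "floral": 632,
    "woody": 588,
    "cognac": 12,
    "blueberry": 6,
}


def positive_weights(counts, n_samples):
    """Return negatives / positives per category (one-vs-rest weighting)."""
    return {name: (n_samples - n) / n for name, n in counts.items()}


weights = positive_weights(counts, N_SAMPLES)

# Ratio between the most and the least frequent category in this subset.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, weight in weights.items():
    print(f"{name:10s} positives={counts[name]:4d} weight={weight:8.2f}")
print(f"most/least frequent ratio: {imbalance_ratio:.1f}")
```

Weights of this form could be passed, for example, as the `pos_weight` argument of a binary cross-entropy loss such as PyTorch's `BCEWithLogitsLoss`. Whether weighting, resampling, or merging the rarest categories works best here would need to be checked empirically.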