I originally had the issue that the data pipeline was freezing; I will elaborate on that below. To work around it, I tried calling minerl.data.make() to get a fresh DataPipeline each iteration. This quickly led to a memory error, and looking at it in more detail there is a serious memory leak in the MineRL data pipeline. Together, these two issues mean that the data object cannot be used except for gathering data once and only once; any moderate scale of iterative data gathering is rendered impossible.
The setup is some loop such as:

```python
class Data:
    def __init__(self, minerl_data):
        self.data = minerl_data  # the minerl data object

    def get_data(self):
        data = []
        for current_states, a, _, next_states, _ in self.data.sarsd_iter(num_epochs=-1):
            data.append((current_states, a, next_states))  # gather data
        return data
```
It usually loads and returns data just fine, but after a few calls to get_data() the pipeline logs at debug level that it is enqueuing or loading data from file x, and then gets stuck. I am loading relatively small sequences of the default length 32. I have left it running overnight and it makes no progress, so some loop in the MineRL data pipeline code is caught up, freezing the program.
I suspect the block below, from the DataPipeline class, may be the culprit:
```python
except Empty:
    if map_promise.ready():
        epoch += 1
        break
    else:
        time.sleep(0.1)
```
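To illustrate why I think this pattern can hang: if the promise never becomes ready (say a worker process died without marking it), the loop polls forever. Here is a minimal, self-contained sketch of the same consume-then-poll pattern; `drain` and `promise_ready` are hypothetical names I made up, with a `max_polls` guard added purely so the would-be hang is observable instead of infinite:

```python
import queue
import time

def drain(q, promise_ready, max_polls=50):
    """Consume items from q, mirroring the pipeline's polling pattern.

    promise_ready is a callable standing in for map_promise.ready().
    Returns (items, timed_out); timed_out=True marks where the real
    code would spin forever.
    """
    items = []
    polls = 0
    while True:
        try:
            items.append(q.get(timeout=0.01))
        except queue.Empty:
            if promise_ready():
                break  # producers finished; safe to stop
            polls += 1
            if polls >= max_polls:
                return items, True  # the real loop has no such guard
            time.sleep(0.01)
    return items, False

# Normal case: the promise resolves, so the loop exits cleanly.
q = queue.Queue()
for i in range(3):
    q.put(i)
assert drain(q, promise_ready=lambda: True) == ([0, 1, 2], False)

# Failure case: the promise never resolves, so the loop polls forever
# (here, until the artificial guard trips).
assert drain(queue.Queue(), promise_ready=lambda: False)[1] is True
```

If something like this is happening, the queue being empty while `map_promise.ready()` stays false would match the "stuck after logging a load" behaviour I see.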
As I said, to try to resolve this issue I decided to create a new DataPipeline object with minerl.data.make() each iteration, so the code looks more like this:

```python
class Data:
    def get_data(self):
        data = []
        data_loader = minerl.data.make(self.env, data_dir=self.data_dir)  # a fresh minerl data object each call
        for current_states, a, _, next_states, _ in data_loader.sarsd_iter(num_epochs=-1):
            data.append((current_states, a, next_states))  # gather data
        return data
```
Doing this, however, I got a memory error:
```
  File "/home", line 120, in get_data
    self.data = minerl.data.make(self.env, data_dir=self.data_dir)
  File "/usr/local/lib/python3.5/dist-packages/minerl/data/__init__.py", line 49, in make
    minimum_size_to_dequeue)
  File "/usr/local/lib/python3.5/dist-packages/minerl/data/data_pipeline.py", line 58, in __init__
    self.processing_pool = multiprocessing.Pool(self.number_of_workers)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
```
So I made a loop that creates and overwrites a DataPipeline variable:
```python
for i in range(100):
    data = minerl.data.make()
    print(memory_use())
```
And found this:
```
2019-09-27 16:13:12 ollie-pc root INFO System memory usage: 43.3 %
...
2019-09-27 16:13:39 ollie-pc root INFO System memory usage: 63.4 %
...
2019-09-27 16:14:24 ollie-pc root INFO System memory usage: 91.3 %
...
2019-09-27 16:17:34 ollie-pc root INFO System memory usage: 99.9 %
```
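For completeness, the `memory_use()` helper in the loop above is from my own logging setup, not from MineRL; I assume something like this Linux-only sketch that reads `/proc/meminfo` (the function name and exact fields are my own choices):

```python
def memory_use():
    """Percentage of system memory in use (Linux-only sketch).

    Hypothetical stand-in for the helper behind the log lines above;
    assumes /proc/meminfo exposes MemTotal and MemAvailable.
    """
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            meminfo[key] = int(rest.split()[0])  # values are in kB
    used = meminfo["MemTotal"] - meminfo["MemAvailable"]
    return round(100.0 * used / meminfo["MemTotal"], 1)

print(memory_use())  # prints the current system memory usage in %
```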
There is clearly a memory leak in this code, likely due to the use of multiprocessing: the traceback shows each DataPipeline creating a multiprocessing.Pool, and those forked workers appear never to be cleaned up, so repeated calls to make() accumulate processes until os.fork() itself fails.
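If that reading is right, explicitly closing and joining the pool between iterations should stop the accumulation. Here is a minimal sketch of the pattern with plain multiprocessing, no MineRL involved; `make_pipeline_step` and `double` are hypothetical names standing in for one get_data() call:

```python
import multiprocessing

def double(x):
    return 2 * x

def make_pipeline_step():
    # Hypothetical stand-in for one get_data() call: the pool is
    # explicitly closed and joined afterwards, which is the cleanup
    # that repeatedly calling minerl.data.make() appears to skip.
    pool = multiprocessing.Pool(2)
    try:
        return pool.map(double, [1, 2, 3])
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # reap the forked worker processes

if __name__ == "__main__":
    # Repeated calls no longer accumulate orphaned workers.
    results = [make_pipeline_step() for _ in range(3)]
    print(results[0])  # [2, 4, 6]
```

A workaround on the user side might be to call the equivalent close/join on the pipeline's pool (the traceback suggests it is stored as `processing_pool`) before creating a new one, but I have not verified that against the MineRL source, and a fix in the library itself would be much better.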
Let me close by saying a big thank you for organising this competition. It has pushed me to new ideas and I have learnt so much!
If you can help me solve these issues, I would greatly appreciate it. I have spent a long time trying to work around them on my end, and I think the code base needs some work here, so I would be really grateful for some help so that I can finally train the solution I have been working on!