Introduction
I originally hit an issue where the data pipeline was freezing; I elaborate on that below. To work around it, I tried calling minerl.data.make() to get a fresh DataPipeline on every iteration. That quickly led to a memory error, and looking at it in more detail there is a serious memory leak in the MineRL data pipeline. Together these two issues mean the data object cannot be used except for gathering data once and only once; any moderate scale of iterative data gathering is rendered impossible.
Freezing Pipeline
The setup is some loop such as
class Data:
    def __init__(self, minerl_data):
        self.data = minerl_data  # the MineRL data pipeline object

    def get_data(self):
        data = []
        for current_states, a, _, next_states, _ in self.data.sarsd_iter(num_epochs=-1):
            # gather data
            ...
        return data
It usually loads and returns data just fine, but after a few calls to get_data() the pipeline logs a debug message that it is enqueuing or loading data from some file x, and then gets stuck. I am loading relatively small sequences of the default length 32. I have left it running overnight and it makes no progress, so some loop in the MineRL data pipeline code must be spinning and freezing the program.
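For context, the driver on my side is essentially just repeated calls to get_data(); a rough sketch (env_name, data_dir and num_training_steps are placeholders, not my exact code):

import minerl

data = Data(minerl.data.make(env_name, data_dir=data_dir))  # placeholders for my env / data location
for step in range(num_training_steps):
    batch = data.get_data()  # after a handful of these calls the iterator hangs
    # ... train on batch ...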
I suspect the block below, from the DataPipeline class, may be the culprit:
except Empty:
    if map_promise.ready():
        epoch += 1
        break
    else:
        time.sleep(0.1)
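If a worker dies or the queue never fills, that else branch can spin forever. Would it make sense to bound the wait so the pipeline raises instead of hanging silently? A rough sketch of what I mean (not a tested patch; retries and max_retries are names I made up):

retries, max_retries = 0, 600  # made-up bound, roughly 60 s at 0.1 s per sleep
...
except Empty:
    if map_promise.ready():
        epoch += 1
        break
    else:
        retries += 1
        if retries > max_retries:
            raise RuntimeError("DataPipeline stalled waiting for the worker queue")
        time.sleep(0.1)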
Memory Leak
As I said, to work around this issue I decided to call minerl.data.make() and build a new DataPipeline object on each iteration, so the code looks more like this:
class Data:
    def get_data(self):
        data = []
        data_loader = minerl.data.make(self.env, data_dir=self.data_dir)  # fresh pipeline each call
        for current_states, a, _, next_states, _ in data_loader.sarsd_iter(num_epochs=-1):
            # gather data
            ...
        return data
Doing this, however, I got a memory error:
File "/home", line 120, in get_data
self.data = minerl.data.make(self.env, data_dir=self.data_dir)
File "/usr/local/lib/python3.5/dist-packages/minerl/data/__init__.py", line 49, in make
minimum_size_to_dequeue)
File "/usr/local/lib/python3.5/dist-packages/minerl/data/data_pipeline.py", line 58, in __init__
self.processing_pool = multiprocessing.Pool(self.number_of_workers)
File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
context=self.get_context())
File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
self._repopulate_pool()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
w.start()
File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
self.pid = os.fork()
So I made a loop that creates and overwrites a DataPipeline variable:
for i in range(100):
    data = minerl.data.make()
    print(memory_use())
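Here memory_use() is just a small helper of mine; roughly, and assuming psutil, it does something like this (the INFO lines below come from its logging call):

import logging
import psutil

def memory_use():
    # report overall system memory usage as a percentage
    percent = psutil.virtual_memory().percent
    logging.getLogger().info("System memory usage: %s %%", percent)
    return percent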
And found this:
2019-09-27 16:13:12 ollie-pc root[14496] INFO System memory usage: 43.3 %
...
2019-09-27 16:13:39 ollie-pc root[14496] INFO System memory usage: 63.4 %
...
2019-09-27 16:14:24 ollie-pc root[14496] INFO System memory usage: 91.3 %
...
2019-09-27 16:17:34 ollie-pc root[14496] INFO System memory usage: 99.9 %
There is clearly a memory leak in this code; I think it comes from the multiprocessing pool that each new DataPipeline spins up.
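Would explicitly shutting the pool down before dropping each pipeline be a reasonable workaround? A sketch of what I mean (assuming the processing_pool attribute visible in the traceback is safe to touch from outside; it is not something the API documents):

import gc
import minerl

for i in range(100):
    data = minerl.data.make()
    # explicitly shut down the worker pool before discarding the pipeline
    data.processing_pool.close()
    data.processing_pool.join()
    del data
    gc.collect()
    print(memory_use())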
Closing
Let me close by saying a big thank you for organising this competition. It has pushed me towards new ideas and I have learnt so much!
If you can help me solve these issues, I would greatly appreciate it. I have spent a long time trying to resolve this on my end in various ways, and I think the code base needs some work here, so I would be really grateful for help so I can finally train the solution I have been working on!
Thanks!