Still big issues with the MineRL data pipeline - freezing and memory leak

olliejday · September 27, 2019, 7:40pm

Introduction

I originally had the issue that the data pipeline was freezing.

I give more detail on that below. To work around it, I tried calling minerl.data.make() to get a fresh DataPipeline on each iteration. This quickly led to a memory error, and on closer inspection there is a serious memory leak in the MineRL data pipeline. Together, these two issues mean that the data object can be used to gather data once and only once, so iterative data gathering at any moderate scale is impossible.

Freezing Pipeline

The setup is a loop such as:

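class Data:
    def __init__(self, minerl_data):
        self.data = minerl_data  # the minerl data object

    def get_data(self, num_samples=1000):  # num_samples: hypothetical stopping point
        data = []
        # num_epochs=-1 loops over the dataset indefinitely, so stop explicitly
        for current_states, a, _, next_states, _ in self.data.sarsd_iter(num_epochs=-1):
            data.append((current_states, a, next_states))  # gather data
            if len(data) >= num_samples:
                break
        return data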

It usually loads and returns data just fine, but after a few calls to get_data(), the pipeline logs a debug message saying it is enqueuing or loading data from file x, and then gets stuck. I am loading relatively short sequences (the default length of 32). I left it running overnight and it made no progress, so some loop in the MineRL data pipeline code appears to be stuck, freezing the program.
I suspect the following block in the DataPipeline class may be the culprit:

            except Empty:
                if map_promise.ready():
                    epoch += 1
                    break
                else:
                    time.sleep(0.1)
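A minimal sketch of the polling pattern in that block, using threads and a standard `queue.Queue` in place of the pipeline's process pool (an assumption made so the example is self-contained; `producer`, `consume` and `run` are hypothetical names). It illustrates one way such a loop can stop making progress: if the consumer abandons the iterator early, as a `break` out of `sarsd_iter` would, the producer stays blocked on a full bounded queue and its job never reports done. Whether this is what actually happens inside MineRL is not established here.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def producer(q, n_items):
    """Stand-in for a pipeline worker: pushes n_items into a bounded queue."""
    for i in range(n_items):
        q.put(i)  # blocks while the queue is full
    return n_items


def consume(q, future, max_items=None):
    """Poll the queue; on Empty, stop only once the producer's job is finished."""
    items = []
    while max_items is None or len(items) < max_items:
        try:
            items.append(q.get(timeout=0.05))
        except queue.Empty:
            # Once the producer is done no more items can arrive, so an empty
            # queue means the pass is over (cf. map_promise.ready() above).
            if future.done() and q.empty():
                break
            time.sleep(0.01)  # cf. time.sleep(0.1) above
    return items


def run(stop_after=None):
    """Return (items consumed, whether the producer finished on its own)."""
    q = queue.Queue(maxsize=2)  # bounded, like a prefetch queue
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(producer, q, 10)
        items = consume(q, future, max_items=stop_after)
        time.sleep(0.2)  # give the producer a chance to finish
        finished = future.done()
        while not future.done():  # drain leftovers so the thread can exit
            try:
                q.get(timeout=0.05)
            except queue.Empty:
                pass
    return len(items), finished
```

With `run()` the consumer reads all ten items and the job completes; with `run(stop_after=3)` the producer is still blocked on `put` when the consumer walks away, so its job is not done until the queue is drained.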

Memory Leak

As mentioned, to work around the freeze I decided to create a new DataPipeline with minerl.data.make() on each iteration, so the code looks more like this:

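import minerl

class Data:
    def __init__(self, env, data_dir):
        self.env, self.data_dir = env, data_dir

    def get_data(self, num_samples=1000):  # num_samples: hypothetical stopping point
        data = []
        data_loader = minerl.data.make(self.env, data_dir=self.data_dir)  # new DataPipeline each call
        for current_states, a, _, next_states, _ in data_loader.sarsd_iter(num_epochs=-1):
            data.append((current_states, a, next_states))  # gather data
            if len(data) >= num_samples:
                break
        return data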

Doing this, however, I got a memory error:

  File "/home", line 120, in get_data
    self.data = minerl.data.make(self.env, data_dir=self.data_dir)
  File "/usr/local/lib/python3.5/dist-packages/minerl/data/__init__.py", line 49, in make
    minimum_size_to_dequeue)
  File "/usr/local/lib/python3.5/dist-packages/minerl/data/data_pipeline.py", line 58, in __init__
    self.processing_pool = multiprocessing.Pool(self.number_of_workers)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()

So I wrote a loop that repeatedly creates a DataPipeline and overwrites the same variable:

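import minerl

for i in range(100):
    # "MineRLTreechop-v0" is an example environment; data_dir left at its default
    data = minerl.data.make("MineRLTreechop-v0")
    print(memory_use())  # memory_use(): the reporter's own helper (not shown)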

And found this:

2019-09-27 16:13:12 ollie-pc root[14496] INFO System memory usage: 43.3 %
...
2019-09-27 16:13:39 ollie-pc root[14496] INFO System memory usage: 63.4 %
...
2019-09-27 16:14:24 ollie-pc root[14496] INFO System memory usage: 91.3 %
...
2019-09-27 16:17:34 ollie-pc root[14496] INFO System memory usage: 99.9 %
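The post does not show the memory_use() helper. A hypothetical stand-in, assuming Linux (as the traceback paths suggest) and using only the standard library, can compute the same kind of "System memory usage" percentage as the log above by reading /proc/meminfo; it is a sketch, not the reporter's code.

```python
def percent_used(meminfo_text):
    """% of RAM in use from /proc/meminfo text: (MemTotal - MemAvailable) / MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    return 100.0 * (total - fields["MemAvailable"]) / total


def memory_use():
    """Hypothetical helper (Linux only): formats a line like the log output above."""
    with open("/proc/meminfo") as f:
        return "System memory usage: %.1f %%" % percent_used(f.read())
```

Parsing is kept in a separate function so it can be checked against sample text without touching the real /proc filesystem.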

There is clearly a memory leak in this code; I think it is due to the use of multiprocessing.
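The traceback shows each DataPipeline creating a `multiprocessing.Pool` in its `__init__`. A minimal, MineRL-free sketch of why that can add up: pools whose workers are never terminated stay alive after the variable is overwritten, while `terminate()` plus `join()` releases them. It assumes the fork start method used on the reporter's Linux machine; `leftover_workers` is a hypothetical name. For a DataPipeline, the equivalent would presumably target its `processing_pool` attribute (the name in the traceback), but whether MineRL offers a supported close method is not something this thread establishes.

```python
import multiprocessing


def leftover_workers(iterations, workers=2, cleanup=True):
    """Create one Pool per iteration (like one DataPipeline each) and count live workers."""
    ctx = multiprocessing.get_context("fork")  # same start method as in the traceback
    kept = []
    for _ in range(iterations):
        pool = ctx.Pool(workers)
        if cleanup:
            pool.terminate()  # stop this pool's worker processes
            pool.join()       # wait until they have actually exited
        else:
            kept.append(pool)  # workers stay alive, as with an un-closed pipeline
    alive = len(multiprocessing.active_children())
    for pool in kept:  # tidy up before returning
        pool.terminate()
        pool.join()
    return alive
```

Without cleanup the number of live worker processes grows with every iteration; with it, none survive.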

Closing

Let me close by saying a big thank you for organising this competition. It has pushed me to new ideas and I have learnt so much!
If you can help me solve these issues, I would greatly appreciate it. I have spent a long time trying to solve this in various ways on my end, and I think the code base needs some work, so I would be really grateful for help solving this so I can finally train the solution I have worked on!

Thanks!

notnanton · September 29, 2019, 2:27pm

Can’t you just load all the data you need into some buffer in RAM and then sample from that buffer? That’s how I do it.
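A minimal sketch of that buffer idea, using only the standard library: read a fixed number of transitions from an iterator once, keep them in memory, and sample minibatches from the stored copy instead of re-creating or re-iterating the pipeline. `ReplayBuffer` and `fill_from_iterator` are hypothetical names, not part of MineRL; the iterator would presumably be something like `sarsd_iter`.

```python
import itertools
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity in-memory store of transitions; oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from what is currently stored
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def fill_from_iterator(buffer, iterator, num_samples):
    """Pull at most num_samples items from the iterator, without reading one extra."""
    for transition in itertools.islice(iterator, num_samples):
        buffer.add(transition)
    return buffer
```

Using `itertools.islice` rather than counting inside the loop avoids consuming one item more than requested, and the bounded `deque` caps memory at `capacity` transitions.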

olliejday · September 30, 2019, 7:04am

Thanks for the idea! I’m looking at workarounds now and will be sure to try that out; I hadn’t thought of it that way.