
Memory Issue

Open MatteoTomassetti opened this issue 9 years ago • 7 comments

Hi, thank you for sharing your code publicly, but I'm having some memory issues when running it on AWS.

I'm spinning up a g2.2xlarge instance on AWS and trying to run your code on only the first 1000 lines of news.2011.en.shuffled.

Have you ever gotten an error message like this one (see below)? If so, is there a way to change the parameters to avoid it, or should I select another type of AWS instance?

Just for completeness, these are the parameters I was trying to test:

NUMBER_OF_ITERATIONS = 20000
EPOCHS_PER_ITERATION = 5
RNN = recurrent.LSTM
INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
BATCH_SIZE = 500
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal" # : Gaussian initialization scaled by fan_in (He et al., 2014)
MAX_INPUT_LEN = 40
MIN_INPUT_LEN = 3
INVERTED = True
AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN
NUMBER_OF_CHARS = 100 # 75

And this is the error I'm getting:

Iteration 1
Train on 3376 samples, validate on 376 samples
Epoch 1/5
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: Cuda error: GpuElemwise node_m71c627ae87c918771aac75471af66509_0 Add: out of memory.
    n_blocks=30 threads_per_block=256
   Call: kernel_Add_node_m71c627ae87c918771aac75471af66509_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], o0_data, local_ostr[0][0], local_ostr[0][1])


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in main_news
  File "<stdin>", line 8, in iterate_training
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/models.py", line 672, in fit
    initial_epoch=initial_epoch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 1196, in fit
    initial_epoch=initial_epoch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 891, in _fit_loop
    outs = f(ins_batch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 959, in __call__
    return self.function(*inputs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: Cuda error: GpuElemwise node_m71c627ae87c918771aac75471af66509_0 Add: out of memory.
    n_blocks=30 threads_per_block=256
   Call: kernel_Add_node_m71c627ae87c918771aac75471af66509_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], o0_data, local_ostr[0][0], local_ostr[0][1])

Apply node that caused the error: GpuElemwise{add,no_inplace}(GpuDot22.0, GpuDimShuffle{x,0}.0)
Toposort index: 207
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, row)]
Inputs shapes: [(20000, 700), (1, 700)]
Inputs strides: [(700, 1), (0, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuElemwise{add,no_inplace}.0, MakeVector{dtype='int64'}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
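For reference, those flags go in the `THEANO_FLAGS` environment variable (or in `~/.theanorc`); a sketch of the invocation, assuming the entry point is the repo's `keras_spell.py`:

```shell
# Re-run with most optimizations disabled to get a back-trace
# showing where the failing node was created.
THEANO_FLAGS='optimizer=fast_compile,exception_verbosity=high' python keras_spell.py
```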

MatteoTomassetti avatar Feb 21 '17 15:02 MatteoTomassetti

@MatteoTomassetti This is an out-of-memory issue. I believe reducing the batch size to 25-50 should solve it.
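For context, the failing add in the traceback has input shape (20000, 700), i.e. BATCH_SIZE × MAX_INPUT_LEN rows by HIDDEN_SIZE columns, so activation memory scales linearly with batch size. A rough back-of-envelope sketch (the helper is mine, not from the repo):

```python
BYTES_PER_FLOAT32 = 4

def activation_mb(batch_size, timesteps, hidden_size):
    """Approximate size in MB of one float32 activation tensor of
    shape (batch_size * timesteps, hidden_size), like the
    (20000, 700) tensor in the traceback above."""
    return batch_size * timesteps * hidden_size * BYTES_PER_FLOAT32 / 1024 ** 2

# Original settings: BATCH_SIZE=500, MAX_INPUT_LEN=40, HIDDEN_SIZE=700
print(round(activation_mb(500, 40, 700), 1))  # 53.4 MB per tensor
print(round(activation_mb(50, 40, 700), 1))   # 5.3 MB with batch size 50
```

Theano keeps many tensors of this shape alive at once (forward activations and gradients for every layer), so at batch size 500 the g2.2xlarge's 4 GB GPU fills up quickly.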

parth126 avatar Feb 27 '17 23:02 parth126

I changed the code to do iterative training with a generator. It now runs nicely on my laptop without needing to limit the amount of data! You might still need to limit the batch size to fit your specific memory availability.
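A minimal sketch of what such a generator can look like (hypothetical code, not the repo's actual implementation, which also generates the noisy inputs on the fly):

```python
import numpy as np

def batch_generator(X, y, batch_size=50):
    """Yield shuffled (inputs, targets) batches forever, so only one
    batch at a time has to fit in GPU memory."""
    n = len(X)
    while True:
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx]

# Keras (1.x-era API) then trains from the generator instead of a fixed array:
# model.fit_generator(batch_generator(X_train, y_train, batch_size=50),
#                     samples_per_epoch=len(X_train), nb_epoch=5)
```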


MajorTal avatar Feb 28 '17 04:02 MajorTal

Thanks @parth126 and @MajorTal! I was wondering, based on your experience, what average running time should I expect for one epoch when training on the entire news.2011.en.shuffled dataset? My problem is that I ran the code for just one epoch, and when I extrapolated the time it would take to reach 20,000 iterations, I was left with years of training!

MatteoTomassetti avatar Feb 28 '17 09:02 MatteoTomassetti

I just moved to news.2013.en.shuffled (much larger); I'll update the code to reflect that. It is so large that I split the epochs into mini-epochs that each cover about 1% of the data (because I save the model after each epoch, and because it is taking so long...). These mini-epochs are configured to run for about 30 minutes. After about 2 hours you already see meaningful results (about 85% accuracy). I used this AMI to train the system: https://aws.amazon.com/marketplace/pp/B06VSPXKDX on an AWS EC2 p2.xlarge instance (currently $0.90 per hour).
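As a rough sketch of that mini-epoch sizing (the corpus size here is my order-of-magnitude assumption, not a measured number):

```python
DATASET_LINES = 20_000_000    # news.2013.en.shuffled line count (assumption)
MINI_EPOCH_FRACTION = 0.01    # each Keras "epoch" covers ~1% of the data
BATCH_SIZE = 50

samples_per_mini_epoch = int(DATASET_LINES * MINI_EPOCH_FRACTION)
batches_per_mini_epoch = samples_per_mini_epoch // BATCH_SIZE
print(samples_per_mini_epoch, batches_per_mini_epoch)  # 200000 4000

# Saving after every mini-epoch then yields a checkpoint roughly every
# 30 minutes instead of once per full pass over the corpus.
```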

MajorTal avatar Feb 28 '17 12:02 MajorTal

@MatteoTomassetti @MajorTal I was running this exact code with the default news.2013.en.shuffled dataset (I changed almost nothing except updating the Keras API calls to a newer version and adapting the code to be Python 3 compatible). After almost 2 days of training (at reasonable speed, on Azure with a K80), the accuracy is stuck at about 47-48%. I also noticed that while it has been able to fix many spelling mistakes, it always repeats the last character or adds trailing periods to the prediction, which is therefore marked as wrong. Do you have any idea what could be happening? I have been looking around and could not find a good answer.

FMFluke avatar Nov 16 '17 15:11 FMFluke

If I recall correctly, the trailing periods are how I used to signal the end of the sequence, and they should be stripped off. Also, I don't remember whether the hyperparameters are optimized in any way in the latest code. I changed the data significantly so that I could open-source the code from the version we used internally.
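A sketch of that stripping step (hypothetical helper; it assumes the end-of-sequence marker is a run of '.' characters, as described above, and that prediction and target are both stripped the same way before comparing):

```python
def strip_end_marker(text, end_char="."):
    """Remove the trailing end-of-sequence padding before computing
    accuracy. Note this also removes a genuine sentence-final period,
    so the target string must be stripped identically."""
    return text.rstrip(end_char)

print(strip_end_marker("speling fixxed...."))  # speling fixxed
```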


MajorTal avatar Nov 17 '17 15:11 MajorTal

OK, but then how did you make the model exclude those periods when calculating accuracy? How exactly did you strip them off?

FMFluke avatar Nov 18 '17 03:11 FMFluke