How do I create an "extracted_images" folder?
./generate.ipynb
How do I create the folder that the variable "read_path" in the file above points to?
read_path = "extracted_images"
At run time, the following error occurs.
FileNotFoundError Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: 'extracted_images/0'
That folder is where the handwritten digits and single symbols were stored, which are used to generate new sequences. Unfortunately they got lost during a crash on my notebook and were too big to upload to this repository. You can recreate them from this dataset: https://www.kaggle.com/xainano/handwrittenmathsymbols
using a folder structure like extracted_images/0/bla.jpg, similar to the normalized folder but before normalization. Hope that helps.
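For anyone who hits the same FileNotFoundError: a minimal sketch of creating that folder structure from Python, assuming one subfolder per symbol label as in extracted_images/0/bla.jpg. The helper name `make_symbol_folders` and the label list are made up for illustration. The script only creates empty folders, so the images from the Kaggle dataset still have to be copied into them.

```python
from pathlib import Path


def make_symbol_folders(read_path, labels):
    """Create read_path/<label>/ for every label and return the folder paths."""
    root = Path(read_path)
    created = []
    for label in labels:
        folder = root / str(label)
        # parents=True also creates read_path itself; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    read_path = "extracted_images"
    # Hypothetical label list (digits only); add the symbols you actually need
    folders = make_symbol_folders(read_path, [str(d) for d in range(10)])
    print(f"created {len(folders)} folders under {read_path}/")
```

Run it from the directory that contains generate.ipynb so that the relative path in `read_path` resolves to the same place the notebook looks in.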
Can you explain a bit more about the use of the extracted images (what are those images)? What are the normalized images and the extracted images used for in the context of generate.ipynb? A flow description or anything similar would be helpful.
Hi and welcome to GitHub. Sorry that the documentation of this code is quite bad; I don't have much time to fix that :/ I got images from https://www.kaggle.com/xainano/handwrittenmathsymbols and combined them into formulas of two types. Then I used bounding boxes to get the symbols back and normalized them in a similar fashion to what I describe in my blog: https://opensourc.es/blog/tensorflow-mnist I used these normalized images to train a simple CNN, which I then used in combination with the position of each bounding box to train a seq2seq model.
Formula generation definitely has to be improved with much more variety, as even plain numbers without any formula aren't converted correctly to LaTeX.
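To illustrate the "bounding boxes to get the symbols back" step, here is a dependency-free sketch that finds one box per connected blob of ink in a binary image. This is an assumption about the general idea, not the code from the notebooks, which may use a different method or library. The function name `bounding_boxes` and the toy image are invented for this example.

```python
from collections import deque


def bounding_boxes(image):
    """Return (top, left, bottom, right) for each 4-connected blob of nonzero pixels.

    `image` is a list of equal-length rows. Boxes are inclusive and sorted left to right.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Reading order for a single-line formula: sort by the left edge
    return sorted(boxes, key=lambda box: box[1])


# Toy "formula": a vertical stroke on the left and a small square on the right
toy = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(bounding_boxes(toy))  # [(0, 1, 2, 1), (1, 4, 2, 5)]
```

Note that a symbol made of disconnected strokes, such as "=", comes out as several boxes with this approach, so a real pipeline would need to merge them.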
Thank you so much for explaining. I could understand and use the generate scripts properly. But in the seq2seq notebook I couldn't understand these snippets. Can you explain what they are used for: 1.
xo = np.load("iseq_n.npy")
yo = np.load("oseq_n.npy")
x_seq_length = len(x[0])
y_seq_length = len(y[0])- 1
print(x_seq_length, y_seq_length)
I'm not entirely sure, but I think it loads the input and output sequences that were generated earlier here: https://github.com/Wikunia/HE2LaTeX/blob/f77e54c4e17cef762704ecd30f95e6ecfbf3e597/generate_sequences.ipynb (last lines)
I then use a sequence length of 30, I think, so a formula can contain at most 30 symbols, which is also a downside of my approach.
The input sequence has several components for each symbol: its absolute and relative position as well as the probabilities for each label.
I'm not sure about the -1 at the moment; maybe it was about padding.
Hope that helps ;)
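A possible explanation for the -1, offered as a guess rather than a description of the notebook: in many seq2seq setups the target sequence is split into a decoder input and an expected output that are offset by one step, so each of them is one item shorter than the stored sequence. The sketch below shows that pattern together with padding to a fixed length. The token names, `pad_to` and `decoder_pair` are hypothetical, and the maximum length is shortened from 30 to keep the example readable.

```python
PAD, START, END = "<PAD>", "<GO>", "<EOS>"  # hypothetical token names


def pad_to(tokens, length, pad=PAD):
    """Right-pad `tokens` to exactly `length` items; reject sequences that are too long."""
    if len(tokens) > length:
        raise ValueError(f"sequence of {len(tokens)} tokens exceeds max length {length}")
    return tokens + [pad] * (length - len(tokens))


def decoder_pair(target):
    """Split one target sequence into decoder input and expected output, offset by one step."""
    return target[:-1], target[1:]


max_len = 8  # the thread mentions 30; a small value keeps the output short
y = [pad_to([START, "x", "^", "2", END], max_len)]

y_seq_length = len(y[0]) - 1
dec_in, dec_out = decoder_pair(y[0])
print(y_seq_length, len(dec_in), len(dec_out))  # 7 7 7
```

If the notebook feeds something like `y[:, :-1]` to the decoder and compares against `y[:, 1:]`, that would account for the -1; checking for those slices in the notebook would confirm or rule this out.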