./generate.ipynb

How do I create a folder corresponding to the variable "read_path" in the "a" file?

read_path = "extracted_images"

At run time, the following error occurs.

FileNotFoundError Traceback (most recent call last) in ()
      2 list_digits = []
      3 for i in range(10):
----> 4     list_digits.append(listdir(read_path+"/"+str(i)))
      5 list_plus = listdir(read_path+"/+")
      6 list_minus = listdir(read_path+"/-")

FileNotFoundError: [Errno 2] No such file or directory: 'extracted_images/0'

Feb 18 '19 12:02 ktj4820

That folder is where the handwritten digits and single symbols were stored; they are used to generate new sequences. Unfortunately they got lost during a crash on my notebook, and they were too big to upload to this repository. It's possible to recreate them from this dataset: https://www.kaggle.com/xainano/handwrittenmathsymbols

You need a folder structure like extracted_images/0/bla.jpg, similar to the normalized folder but before normalization. Hope that helps.
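As a rough sketch, the folders that generate.ipynb lists in the traceback above (the digits 0 to 9, "+" and "-") could be created like this. The function name make_symbol_folders is made up for this example, and the notebook may expect more symbol folders than these twelve.

```python
import os

def make_symbol_folders(read_path="extracted_images", symbols=None):
    """Create one subfolder per symbol below read_path and return their paths.

    Hypothetical helper: the default symbol list only covers the folders
    that show up in the traceback (0-9, "+", "-").
    """
    if symbols is None:
        symbols = [str(i) for i in range(10)] + ["+", "-"]
    created = []
    for symbol in symbols:
        folder = os.path.join(read_path, symbol)
        # exist_ok=True makes it safe to run the cell more than once
        os.makedirs(folder, exist_ok=True)
        created.append(folder)
    return created
```

Calling make_symbol_folders("extracted_images") only creates empty folders, which is enough to get past the FileNotFoundError. The images for each symbol from the dataset still have to be copied into the matching folder before any sequences can be generated.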

Feb 19 '19 11:02 Wikunia

Can you explain a bit more about the use of the extracted images (what are those images)? What is the role of the normalized images and the extracted images in the context of generate.ipynb? A flow diagram or anything similar would be helpful.

May 18 '19 20:05 QuantumPhysicsgit

Hi and welcome to GitHub. Sorry that the documentation of this code is quite bad; I don't have much time to fix that :/ I got images from https://www.kaggle.com/xainano/handwrittenmathsymbols and combined them into formulas of two types: Then I used bounding boxes to get the symbols back and normalized them in a fashion similar to the one described in my blog: https://opensourc.es/blog/tensorflow-mnist I used these normalized images to train a simple CNN, which I then used in combination with the position of each bounding box to train a seq2seq model.

Formula generation definitely has to be improved with much more variety, as even plain numbers without any formula aren't converted correctly to LaTeX.
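To give a rough idea of the bounding box and normalization step, here is a minimal NumPy sketch. It is not the code from the repository: the function names, the 28x28 canvas size and the "nonzero pixel = ink" convention are assumptions for illustration, and the real normalization probably also rescales the symbol, which is left out here.

```python
import numpy as np

def bounding_box(img):
    """Return (top, bottom, left, right) of the nonzero pixels, end-exclusive."""
    rows = np.where(np.any(img > 0, axis=1))[0]
    cols = np.where(np.any(img > 0, axis=0))[0]
    if rows.size == 0:
        raise ValueError("image contains no symbol pixels")
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def center_on_canvas(img, size=28):
    """Crop the symbol to its bounding box and center it on a size x size canvas."""
    top, bottom, left, right = bounding_box(img)
    crop = img[top:bottom, left:right]
    height, width = crop.shape
    if height > size or width > size:
        raise ValueError("symbol is larger than the canvas; rescale it first")
    canvas = np.zeros((size, size), dtype=img.dtype)
    row0 = (size - height) // 2
    col0 = (size - width) // 2
    canvas[row0:row0 + height, col0:col0 + width] = crop
    return canvas
```

The bounding box itself is worth keeping next to the normalized image, because the position of each symbol is what the seq2seq model gets as input in addition to the CNN output.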

May 19 '19 08:05 Wikunia

Thank you so much for explaining; I could understand and use the generate scripts properly. But in the seq2seq notebook I couldn't understand this snippet. Can you explain what it is for?

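import numpy as np

# Load the saved input and output sequences.
xo = np.load("iseq_n.npy")
yo = np.load("oseq_n.npy")

# x and y are not defined in this snippet; presumably they are derived
# from xo and yo elsewhere in the notebook.
x_seq_length = len(x[0])
y_seq_length = len(y[0])- 1
print(x_seq_length, y_seq_length)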

May 21 '19 19:05 QuantumPhysicsgit

I'm not entirely sure, but I think it loads the input and output sequences that were generated earlier here: https://github.com/Wikunia/HE2LaTeX/blob/f77e54c4e17cef762704ecd30f95e6ecfbf3e597/generate_sequences.ipynb (last lines)

After that I use a sequence length of 30, I think, so a formula can contain at most 30 symbols, which is also a downside of my approach. The input sequence has several components for each symbol: its absolute and relative position, as well as the probabilities for each label. I'm not sure about the -1 right now; maybe it was about padding. Hope that helps ;)
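One common reason for a -1 like this in seq2seq code, which I have not checked against the notebook, is that the decoder input drops the last token and the decoder target drops the first one, so both are one token shorter than the stored output sequence. A toy sketch with made-up shapes and token ids (1 as a start token and 0 as padding are assumptions, and split_decoder_sequences is a hypothetical helper):

```python
import numpy as np

def split_decoder_sequences(y):
    """Split output sequences into decoder input and target, shifted by one token."""
    decoder_input = y[:, :-1]   # everything except the last token
    decoder_target = y[:, 1:]   # everything except the first token
    return decoder_input, decoder_target

# Toy data, not the real iseq_n.npy / oseq_n.npy contents:
# 2 formulas, 4 symbols each, 3 features per symbol.
x = np.zeros((2, 4, 3))
# 2 output sequences padded to 6 tokens.
y = np.array([[1, 5, 6, 7, 0, 0],
              [1, 8, 9, 0, 0, 0]])

x_seq_length = len(x[0])
y_seq_length = len(y[0]) - 1
decoder_input, decoder_target = split_decoder_sequences(y)
print(x_seq_length, y_seq_length)  # prints: 4 5
```

With that split, y_seq_length matches the length of both decoder_input and decoder_target, which is the length the decoder actually sees during training.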

May 21 '19 19:05 Wikunia
