Question on going from synthetic data generation to training data
After a synthetic proof is generated, how is it used as training data? For example, if a synthetic proof has N auxiliary constructions between the premise statement s_1 and the conclusion statement s_{N+1}, would you make multiple training data entries from this proof by taking the intermediate statement after each auxiliary construction, one construction at a time? I.e., you would generate N data entries from this single N-construction proof (I also sketch my reading in code after the list below).
If each data entry follows this format:
- <s_1> <s_{N+1}> <auxiliary construction 1>
- <s_2> <s_{N+1}> <auxiliary construction 2>
- <s_3> <s_{N+1}> <auxiliary construction 3>
- ...
- <s_N> <s_{N+1}> <auxiliary construction N>, where s_{k+1} is the statement obtained after applying auxiliary construction k to <s_k>
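
To make my reading concrete, here is a minimal sketch of how I imagine one N-construction proof being split into N entries. This is my guess, not the paper's actual code; `SyntheticProof` and `make_training_entries` are hypothetical names I made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class SyntheticProof:
    premises: str             # serialized premise statement s_1
    conclusion: str           # serialized conclusion statement s_{N+1}
    constructions: list[str]  # auxiliary constructions 1..N, in order
    intermediate: list[str]   # s_2 .. s_{N+1}: statement after each construction


def make_training_entries(proof: SyntheticProof) -> list[tuple[str, str, str]]:
    """Return one (<s_k>, <s_{N+1}>, <construction k>) entry per construction."""
    states = [proof.premises] + proof.intermediate[:-1]  # s_1 .. s_N
    return [
        (state, proof.conclusion, construction)
        for state, construction in zip(states, proof.constructions)
    ]


# Example: a proof with two constructions yields two entries.
proof = SyntheticProof(
    premises="s_1",
    conclusion="s_3",
    constructions=["construction 1", "construction 2"],
    intermediate=["s_2", "s_3"],
)
print(make_training_entries(proof))
# [('s_1', 's_3', 'construction 1'), ('s_2', 's_3', 'construction 2')]
```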
I believe the paper says it generated 100M proofs, of which 9M have at least one auxiliary construction, and it later says the fine-tuning used 9M examples. How are the proofs with multiple constructions handled? I assume the transformer model only predicts one auxiliary construction at a time. I might have misunderstood this part, so please let me know if I can clarify my questions. Thanks so much for your help.
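
For context on that assumption, this is how I picture the inference-time loop, with exactly one construction emitted per model call. The callables `engine_proves` and `propose_construction` are placeholders I invented, not the released API:

```python
from typing import Callable


def solve(premises: str,
          conclusion: str,
          engine_proves: Callable[[str, str], bool],
          propose_construction: Callable[[str, str], str],
          max_constructions: int = 16) -> bool:
    """Alternate between the symbolic engine and the language model:
    if the engine cannot close the proof, ask the model for exactly one
    auxiliary construction, append it to the premises, and try again."""
    for _ in range(max_constructions):
        if engine_proves(premises, conclusion):
            return True
        premises = premises + "; " + propose_construction(premises, conclusion)
    return engine_proves(premises, conclusion)


# Dummy stand-ins just to show the call shape.
print(solve("premises", "conclusion",
            engine_proves=lambda p, c: "aux" in p,
            propose_construction=lambda p, c: "aux point X"))
```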
How do you generate the random proofs in the first place? I mean concretely, not conceptually. Is there a code snippet?
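
In case it helps to show what level of concreteness I mean, here is a toy, self-contained sketch of the pipeline as I understand it from the paper: sample random premises, forward-chain a symbolic deduction closure, then trace back from each derived fact to its minimal supporting premises. Everything here (the fact/rule encoding, `deduce_closure`, `traceback`) is invented for illustration and is not the actual implementation:

```python
import random

# Toy Horn-clause rules: frozenset of antecedent facts -> consequent fact.
RULES = [
    (frozenset({"cong A B C D", "cong C D E F"}), "cong A B E F"),
    (frozenset({"para A B C D", "para C D E F"}), "para A B E F"),
    (frozenset({"cong A B E F", "para A B E F"}), "eqratio A B E F"),
]

PREMISE_POOL = ["cong A B C D", "cong C D E F", "para A B C D", "para C D E F"]


def deduce_closure(premises):
    """Forward-chain until no new facts; record which facts derived each fact."""
    known = set(premises)
    parents = {}  # derived fact -> antecedent facts used to derive it
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= known and consequent not in known:
                known.add(consequent)
                parents[consequent] = antecedents
                changed = True
    return known, parents


def traceback(fact, parents):
    """Minimal set of premises and intermediate facts supporting `fact`."""
    if fact not in parents:  # the fact is itself a premise
        return {fact}
    needed = {fact}
    for parent in parents[fact]:
        needed |= traceback(parent, parents)
    return needed


# Sample random premises, deduce, and turn each derived fact into a record of
# (conclusion, premises actually used). In the real pipeline, premises used in
# the proof but not mentioned in the conclusion statement would become the
# auxiliary constructions.
random.seed(0)
premises = random.sample(PREMISE_POOL, k=3)
known, parents = deduce_closure(premises)
for conclusion in known - set(premises):
    support = traceback(conclusion, parents)
    used_premises = sorted(support & set(premises))
    print(conclusion, "<=", used_premises)
```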