ValeKnappich issues

8 issues by ValeKnappich

How to get section hierarchy from fulltext?

Hi, I am using GROBID to extract the full text of PDFs (`/processFulltextDocument`). It works great, except that all sections are put on the same level and there doesn't seem to...

duplicate

BigQuery dataset

Hi, first of all, great work! Is there any chance you could provide more details on the BigQuery dataset / subset? Perhaps a list of the repositories used? It would...

generator-style `run_batch`

`run_batch` is great for performance but you also give up control during the iteration. For instance, you might want to save the results to disk as soon as they come...

enhancement
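A minimal sketch of what a generator-style batch call could look like. It is not the library's API: `run_batch_iter` and `square` are hypothetical names introduced for illustration, and a thread pool is assumed as the execution backend.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batch_iter(
    fn: Callable[[T], R], inputs: Iterable[T], max_workers: int = 4
) -> Iterator[Tuple[int, R]]:
    """Hypothetical helper: yield (input_index, result) as each call finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the position of its input.
        futures = {pool.submit(fn, item): i for i, item in enumerate(inputs)}
        for future in as_completed(futures):
            yield futures[future], future.result()


def square(x: int) -> int:
    """Stand-in for an expensive per-item call."""
    return x * x


results = {}
for index, value in run_batch_iter(square, [1, 2, 3]):
    # The caller regains control here, e.g. to write each result to disk.
    results[index] = value
```

Results arrive in completion order rather than input order, which is why the index is yielded alongside each value.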

Commit Hashes of scraped repos

@ASvyatkovskiy @micheletufano First of all, thanks for this great resource. Unfortunately, the dataset does not contain commit hashes of the projects or dates when they were scraped. To calculate the...

LTeX server: "SEVERE: Could not send the HTTP request to the LanguageTool server."

**Describe the bug** Couldn't get LTeX to work on my machine. The server reports the Java errors shown below. The client seems to simply get zero complaints, thus nothing happens...

1-bug 🐛

2-unconfirmed

`model_override_args` with server

When using a server, one currently cannot use the `model_override_args`, which could be very useful, e.g. for rope scaling. This is currently the `sglang.launch_server.py`: ```py import argparse from sglang.srt.server import...

good first issue

HumanEval loaded on import

The current decontamination implementation loads the HumanEval dataset from disk upon import: https://github.com/huggingface/alignment-handbook/blob/a9b8a50/src/alignment/decontaminate.py#L53 ```py def human_eval_docstrings() -> List[str]: ds = load_dataset("openai_humaneval", split="test") docstrings = [extract_docstring(v["prompt"]) for v in ds] return docstrings...

[Feature] GEPA optimize field descriptions

### What feature would you like to see? As far as I can tell, GEPA currently only optimizes the `signature.instructions`. It might also be helpful to let GEPA find suitable...

enhancement