Wendy Mak
## Maintenance request summary

`sol.vector.mask.mask_to_poly_geojson` currently has a default `min_area` of 40. While this is useful if you are only using the pixel version or you are using a...
(the other repositories seem to have v2?).
Hi, when I am trying to generate the synthetic data, do I need to treat the target column differently? Or would a correctly tuned generator take care of generating the...
- Look through data available at https://data.world/data4democracy/far-right as data from the discursive project

Some of the tasks we might do are:

- Stem
- Tokenize
- Remove stop words
-...
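
The preprocessing tasks listed above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual pipeline: the stop-word set, the suffix-stripping rule (which is not a real Porter stemmer), the function names, and the order of the steps (tokenize, then remove stop words, then stem) are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_stem(token):
    """Strip a few common English suffixes (a crude stand-in, NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in kept]


if __name__ == "__main__":
    print(preprocess("The groups are sharing links"))
```

For real tweets, the tokenizer would likely also need to handle URLs, mentions, and hashtags, which this sketch ignores.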
Construct a word2vec model with tweets for groups of people (e.g. far right) and compare it with models trained on the overall twitterverse (e.g. http://fredericgodin.com/papers/Named%20Entity%20Recognition%20for%20Twitter%20Microposts%20using%20Distributed%20Word%20Representations.pdf) Some things to try: clustering tweets with...
Would it be possible to share how the dataset is generated (i.e., how to go from the Wikidata dump files to the current ones that I can download from the repo)? Thanks!
### Feature request

Add logic for ColBERT in the optimum engine so that it returns token embeddings

### Motivation

Since this is already supported for the torch engine, it will be...
### Feature request

There have been discussions of ColBERT-style models achieving decent performance as rerankers (e.g. https://www.answer.ai/posts/2024-09-16-rerankers.html), and it would be useful if the rerank endpoint can...
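
For context, ColBERT-style reranking typically scores a document by late interaction (often called MaxSim): each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The sketch below illustrates that scoring in plain Python. It is not the library's implementation; the function names and the toy two-dimensional embeddings are hypothetical, and cosine similarity is assumed as the token-level similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best similarity to any document token."""
    if not doc_tokens:
        return 0.0
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(query_tokens, docs):
    """Return document indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, doc) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy token embeddings, invented purely for illustration.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
    doc_b = [[1.0, 0.0]]              # covers only the first query token
    print(rerank(query, [doc_b, doc_a]))
```

Because the score needs one embedding per token rather than a single pooled vector, a rerank endpoint built this way depends on the engine exposing token-level embeddings.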
### Model description

I have a custom SentenceTransformer model that is a custom class (and also quite nested), so at the top level the modules.json file looks like ``` [...