Wendy Mak

Results 12 issues of Wendy Mak

## Maintenance request summary

The `sol.vector.mask.mask_to_poly_geojson` function currently has a default `min_area` of 40. While this is useful if you are only using the pixel version or you are using a...

Type: Maintenance
Status: Review Needed
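To make the effect of a minimum-area threshold concrete, here is a minimal sketch of area-based polygon filtering in general. It is not the library's implementation: `polygon_area` and `filter_by_min_area` are hypothetical helpers, polygons are plain coordinate lists, and the inclusive `>=` comparison is an assumption.

```python
def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula (units follow the coordinates)."""
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def filter_by_min_area(polygons, min_area=40):
    """Keep polygons whose area is at least min_area (hypothetical helper, inclusive bound assumed)."""
    return [p for p in polygons if polygon_area(p) >= min_area]


small = [(0, 0), (5, 0), (5, 5), (0, 5)]      # 5 x 5 square, area 25
large = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square, area 100
kept = filter_by_min_area([small, large])      # default threshold of 40 drops the small square
```

Because the area is measured in whatever units the coordinates use, the same numeric threshold can behave very differently for pixel coordinates and for other coordinate systems.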

(the other repositories seem to have v2?).

Hi, when I am trying to generate the synthetic data, do I need to treat the target column differently? Or would a correctly tuned generator take care of generating the...

- Look through data available at https://data.world/data4democracy/far-right as data from the discursive project

Some of the tasks we might do are:
- Stem
- Tokenize
- Remove stop words
-...

help wanted
status-in-progress
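The preprocessing tasks listed in the issue above (tokenize, remove stop words, stem) could be sketched roughly as follows. This is a toy illustration only: the stop-word set and the suffix-stripping `naive_stem` are hypothetical stand-ins, and a real pipeline would more likely use an established stemmer and stop-word list.

```python
import re

# Tiny illustrative stop-word list (assumption, not a complete list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are"}
# Suffixes tried in order by the toy stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


tokens = preprocess("The groups are posting and sharing links")
```

The toy stemmer produces truncated forms such as `shar` for `sharing`, which is typical of suffix stripping and usually acceptable when the goal is grouping word variants.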

Construct a word2vec model with tweets for groups of people (e.g. far right) and compare it with models trained on the overall twitterverse (e.g. http://fredericgodin.com/papers/Named%20Entity%20Recognition%20for%20Twitter%20Microposts%20using%20Distributed%20Word%20Representations.pdf).

Some things to try: clustering tweets with...

help wanted
status-in-progress

Would it be possible to share how the dataset is generated (i.e. how to go from the Wikidata dump files to the current ones that I can download from the repo)? Thanks!

### Feature request

Add logic for ColBERT in the optimum engine so it returns token embeddings

### Motivation

Since this is already supported for the torch engine, it will be...

### Feature request

There have been discussions of ColBERT-style models achieving decent performance as rerankers (e.g. https://www.answer.ai/posts/2024-09-16-rerankers.html), and it would be useful if the rerank endpoint can...
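For context, ColBERT-style reranking scores a document from per-token embeddings using late interaction (often called MaxSim): each query token takes its best match among the document tokens, and those maxima are summed. A minimal sketch, assuming token embeddings are already available as plain lists of floats; `maxsim_score` and `rerank` are hypothetical names and do not reflect any particular engine or endpoint.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the maximum similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(query_tokens, docs):
    """Return document indices ordered from highest to lowest MaxSim score."""
    scores = [maxsim_score(query_tokens, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
doc_b = [[1.0, 0.0]]              # matches only the first query token
order = rerank(query, [doc_b, doc_a])
```

This is why a reranker built on such models needs token-level embeddings rather than a single pooled vector per text.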

### Model description

I have a custom SentenceTransformer model that is a custom class (and also quite nested), so at the top level the modules.json file looks like ``` [...