Ar57m

26 comments of Ar57m

Before this, I tried slicing the tensors to match the smaller base model's size, but it seems very unstable even at low weights (like 0.01 or 0.005), so in this branch breaking_math I...
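The slicing idea described above can be sketched roughly like this (my reconstruction, not the actual branch code; the tensor shapes and the 0.01 blend weight are illustrative):

```python
# Crop the larger model's tensor to the smaller base tensor's shape,
# then blend it in at a very low weight.
import torch

big = torch.arange(35, dtype=torch.float32).reshape(5, 7)  # stand-in for a larger model's tensor
small = torch.ones(3, 4)                                   # stand-in for the small base tensor

t = 0.01  # low blend weight, as in the comment
cropped = big[: small.shape[0], : small.shape[1]]          # slice to match the base shape
merged = (1 - t) * small + t * cropped                     # linear blend toward the cropped tensor
print(merged.shape)  # torch.Size([3, 4])
```

Even at such low weights, the cropped values land on parameters that were never trained together, which may be why the result is unstable.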

> This is really interesting! I'll definitely have to play with it a bit.
>
> I'm a little surprised this kind of interpolation isn't more immediately harmful to the...

I did Sheared-LLaMA-2.7B-ShareGPT + SanjiWatsuki/Silicon-Maid-7B at 10% if you guys wanna test: [Aryanne/sheared-silicon10p](https://huggingface.co/Aryanne/sheared-silicon10p). If I did everything right (and didn't break anything), this should work better than before. But I'm...

here's some code that shows what bilinear/linear is doing:

```
import torch
import torch.nn.functional as F

torch.manual_seed(42)
x = torch.tensor([[1., 3.], [5., 7.], [9., 11.], [13., 15.]])
y = torch.tensor([[2.0, 4.0], [6.0, 8.0], [10., 12.]])
print("X :")
print(x)
print("\nBase...
```
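A runnable sketch of the same idea, assuming the truncated snippet goes on to resize `x` to `y`'s shape with `torch.nn.functional.interpolate` (the reshaping and mode choice here are my assumptions):

```python
# Resize one model tensor to another's shape with bilinear interpolation.
import torch
import torch.nn.functional as F

torch.manual_seed(42)
x = torch.tensor([[1., 3.], [5., 7.], [9., 11.], [13., 15.]])  # "donor" tensor, shape (4, 2)
y = torch.tensor([[2.0, 4.0], [6.0, 8.0], [10., 12.]])         # "base" tensor, shape (3, 2)

# bilinear mode expects a 4-D (N, C, H, W) input, so add batch/channel dims first
x_resized = F.interpolate(
    x.unsqueeze(0).unsqueeze(0),
    size=y.shape,
    mode="bilinear",
    align_corners=True,
).squeeze(0).squeeze(0)
print(x_resized.shape)  # torch.Size([3, 2])
```

With `align_corners=True` the first and last rows are kept exactly and the middle row is the average of the two interior rows, so the resized tensor can then be blended with `y` element-wise.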

> check

thanks for the check. This branch seems to be degrading the base model a lot (I submitted some trials using this branch and the other one, weighting_interp, to the open-llm-leaderboard,...

idk, but this doesn't seem very promising at the moment; I don't know if there's a good way of merging models with different sizes. btw @cg123, I've been messing with...

@david565656 hey, this PR is what I was trying in the breaking_math and weighting_interp branches. [here](https://huggingface.co/Aryanne/9b_plus_tie13/tree/main) I merged a 13b into a 9b, but it probably made the model invent words,...

> Hi, I am trying to merge two models following this [post](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54). Here is my config:
>
> ```
> yaml_config = """
> slices:
>   - sources:
>     -...
> ```
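For context, a complete `slices` config of the kind that post describes looks roughly like this (a sketch based on mergekit's SLERP example; the model names, layer ranges, and `t` schedule are illustrative, not the commenter's actual values):

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```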

> @cg123 I set `clone_tensors=True` in the `MergeOptions` class and still got the same error
>
> ![image](https://private-user-images.githubusercontent.com/51706966/295429380-b7a871b3-c263-40ac-926c-3ec37ff41c8d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDQ4NTY3NjAsIm5iZiI6MTcwNDg1NjQ2MCwicGF0aCI6Ii81MTcwNjk2Ni8yOTU0MjkzODAtYjdhODcxYjMtYzI2My00MGFjLTkyNmMtM2VjMzdmZjQxYzhkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDAxMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwMTEwVDAzMTQyMFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU5ZjFjOTNkY2IzNDM3MWUzZDZkNGY3N2M4OWVmN2ExZmY3Y2ExY2YyNTIwMmI0OTZiMjhhOWVlNDVkMWIzOGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.EuPQUB2aZnxTrWrYYn9h7Jz9EzBqDkMZGNFznWx4VTE)

It seems that you wrote `clone_tensor`; it's `clone_tensors`.

> I'm trying to merge fine-tuned safetensors into a model so I can save it in GGUF format. The directory contains adapter_config.json and adapter_model.safetensors, but the error message...