I can confirm that this is a big speedup for me, well done! `system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 =...
There are various optimized code paths that are only enabled for certain platforms and feature sets, and there could be differences in how those are implemented. Could you post the initial...
Could be related to #876 which was fixed in https://github.com/ggerganov/llama.cpp/commit/684da25926e5c505f725b4f10b5485b218fa1fc7
Closing as assumed fixed by https://github.com/ggerganov/llama.cpp/commit/684da25926e5c505f725b4f10b5485b218fa1fc7. Feel free to reopen if this still happens with the latest version.
Ideally all the SIMD implementations should be updated, yes, but who has a PowerPC CPU lying around? :/ I'll have a look at the AVX versions if the perplexity run...
I'm not going to complete a full perplexity run on my laptop, but from a partial run on 7B the difference looks close to 0.1 perplexity.

[-7,7] 7B perplexity:
```
[1]4.6779,[2]5.2229,[3]6.1112,[4]6.7492,[5]6.8303,[6]6.8051,[7]6.9926,[8]7.0888,[9]7.5382,[10]7.7807,[11]8.0549,[12]8.1052,[13]8.0297,[14]8.1147,[15]8.3789,[16]7.9522,[17]7.8131,[18]7.7723,[19]7.3747,[20]7.3474,[21]7.2479,[22]7.0641,[23]7.0223,[24]6.9325,[25]6.9334,[26]6.7526,[27]6.5554,[28]6.4452,[29]6.3563,[30]6.1837,[31]6.1520,[32]6.1675,[33]6.0995,[34]6.1298,[35]6.1547,[36]6.2009,[37]6.2067,[38]6.2267,[39]6.2677,[40]6.3210,[41]6.3275,[42]6.3638,[43]6.3210,[44]6.3821,[45]6.3843,[46]6.3575,[47]6.3840,[48]6.3462,[49]6.3515,[50]6.3104,[51]6.3064,[52]6.2917,[53]6.3383,[54]6.3245,[55]6.2972,[56]6.3340,[57]6.3592,[58]6.3860,[59]6.3992,[60]6.4499,[61]6.4402,[62]6.5061,[63]6.5451,[64]6.5606,[65]6.6086,[66]6.6234,[67]6.6408,[68]6.6593,[69]6.6866,[70]6.7170,[71]6.7412,[72]6.7761,[73]6.8410,[74]6.8493,[75]6.8636,[76]6.8816,[77]6.8944,[78]6.8812,[79]6.9099,[80]6.9026,[81]6.9229,[82]6.9298,[83]6.8718,[84]6.8542,[85]6.8452,[86]6.8244,[87]6.7583,[88]6.7281,[89]6.7069,[90]6.6896,[91]6.7199,[92]6.7152,[93]6.7150,[94]6.7155,[95]6.7449,[96]6.7425,[97]6.7352,[98]6.7279,[99]6.7115,[100]6.7110,[101]6.7352,[102]6.7278,[103]6.7526,[104]6.7592,[105]6.7582,[106]6.7751,[107]6.7754,[108]6.7887,[109]6.7824,[110]6.7768,[111]6.8007,[112]6.8201,[113]6.8232,[114]6.8200,[115]6.8297,[116]6.8227,[117]6.8250,[118]6.8542,[119]6.8771,[120]6.9143,[121]6.9306,[122]6.9561,[123]6.9960,[124]7.0153,[125]7.0066,[126]7.0473,[127]7.0840,[128]7.1115,[129]7.0932,[130]7.1037,[131]7.0967,[132]7.0878,[133]7.0753,[134]7.0858,[135]7.0830,[136]7.0689,[137]7.0605,[138]7.0437,[139]7.0311,[140]7.0274,[141]6.9981,[142]6.9940,[143]6.9673,[144]6.9462,[145]6.9366,[146]6.9226,[147]6.9285,[148]6.9296,[149]6.9252,[150]6.9220,[151]6.9242,[152]6.9162,[153]6.8968,[154]6.8879,[155]6.8939,[156]6.8901,[157]6.9075,[158]6.9098,[159]6.9158,[160]6.9202,[161]6.9321,[162]6.8997,[163]6.8858,[164]6.8579,[165]6.8241,[166]6.7931,[167]6.7527,[168]6.7188,[169]6.7050,[170]6.6919,[171]6.6609,[172]6.6415,[173]6.6218,[174]6.5891,[175]6.5668,[176]6.5542,[177]6.5325,[178]6.5070,[179]6.4887,[180]6.4793,[181]6.4552,[182]6.4356,[183]6.4206,[184]6.4206,[185]6.4126,[186]6.4149,[187]6.4207,[188]6.4168,[189]6.4354,[190]6.4378,[191]6.4586,[192]6.4742,[193]6.4921,[194]6.5046,[195]6.5262,[196]6.5439,[197]6.5656,[198]6.5822,[199]6.5856,[200]6.5896,[201]6.5873,[202]6.6088,[203]6.6157,[204]6.6185,[205]6.6296,[206]6.6370,[207]6.6320,[208]6.6411,[209]6.6458,[210]6.6518,[211]6.6635,[212]6.6714,[213]6.6830,[214]6.6895,[215]6.6930,[216]6.7079,[217]6.7273,[218]6.7412,[219]6.7434,[220]6.7394,[221]6.7340,[222]6.7300,[223]6.7180,[224]6.7119,[225]6.7067,[226]6.7286,[227]6.7388,[228]6.7445,[229]6.7522,[230]6.7469,[231]6.7636,[232]6.7500,[233]6.7310,[234]6.7152,[235]6.6997,[236]6.6917,[237]6.6807,[238]6.6849,[239]6.6676,[240]6.6567,[241]6.6611,[242]6.6643,[243]6.6616,[244]6.6494,[245]6.6457,[246]6.6330,[247]6.6199,[248]6.6123,[249]6.6098,[250]6.6141,[251]6.6061,[252]6.6012,[253]6.5908,[254]6.5869,[255]6.5741,[256]6.5540,[257]6.5418,[258]6.5335,[259]6.5311,[260]6.5236,[261]6.5188,[262]6.5125,[263]6.5074,[264]6.4896,[265]6.4889,[266]6.4874,[267]6.4805,
```
...
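For anyone reading the log above: each `[n]` entry appears to be a running estimate after `n` chunks, which is why the values swing early on and then settle. A minimal sketch of that kind of cumulative computation, assuming perplexity is the exponential of the mean per-token negative log-likelihood. The function name and the input format are hypothetical and not taken from the llama.cpp source:

```python
import math

def running_perplexity(chunk_nlls):
    """Yield the cumulative perplexity after each chunk.

    chunk_nlls: iterable of lists, each holding the per-token negative
    log-likelihoods (natural log) of one chunk. This input format is an
    assumption made for illustration only.
    """
    total_nll = 0.0
    total_tokens = 0
    for nlls in chunk_nlls:
        total_nll += sum(nlls)
        total_tokens += len(nlls)
        if total_tokens == 0:
            continue  # nothing evaluated yet, so no estimate to report
        # Perplexity = exp(mean negative log-likelihood over all tokens so far).
        yield math.exp(total_nll / total_tokens)

if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    chunks = [[math.log(4)] * 3, [math.log(4)] * 2]
    for i, ppl in enumerate(running_perplexity(chunks), start=1):
        print(f"[{i}]{ppl:.4f}", end=",")
```

Because the estimate is cumulative, a partial run like the one above only gives a rough comparison point against a full run.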
Pushed suggestions for vectorized implementations, but besides AVX and AVX2 they have not been tested. Will need someone to verify them for each architecture.
Very interesting experiment, @sw. It's hard to know how well these metrics correspond to perplexity, but we could test the one with the lowest RMS error and see if it improves perplexity....
@ikawrakow that is surprising to me, so far the RMSE has mapped fairly well to the performance of different quantization methods. Very interesting result! I think there is a good...
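To make the RMSE metric being discussed concrete, here is a minimal sketch that round-trips one block of weights through symmetric integer quantization and measures the error. It assumes a q4_0-like scheme where the scale is `max|x| / qmax` and values are clamped to `[-qmax, qmax]`; the helper names `quantize_block` and `rmse` are hypothetical, and this is not the ggml implementation:

```python
import math

def quantize_block(values, qmax=7):
    """Quantize and dequantize one block with a single shared scale.

    Assumption: symmetric range [-qmax, qmax] with scale = max|x| / qmax.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)  # an all-zero block is represented exactly
    scale = amax / qmax
    restored = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        restored.append(q * scale)
    return restored

def rmse(original, restored):
    """Root-mean-square error between two equally long sequences."""
    sq = sum((a - b) ** 2 for a, b in zip(original, restored))
    return math.sqrt(sq / len(original))

if __name__ == "__main__":
    block = [0.0, 0.5, -1.0, 0.25, 0.8, -0.3, 0.1, -0.65]
    print(f"RMSE: {rmse(block, quantize_block(block)):.6f}")
```

A lower RMSE from a sketch like this says the weights are reproduced more closely; as the comments above note, whether that carries over to perplexity still has to be measured.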
Updated for q4_2