oleotiger
Here is one situation: I want to collect metrics with the pcm tools while I run a benchmark, and running the benchmark is really time-consuming. If pcm tools can...
According to the wiki [PCM Column Names Decoder Ring](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-pcm-column-names-decoder-ring.html), `TIME(ticks)` here means `Number of invariant clockticks`. Since I set the time interval to 1s, there should be 3G ticks during per...
So should the value of `TIME(ticks)` be multiplied by 1 million? Or should the column name be changed to `TIME(1e6 ticks)`?
Should the value of `INST` be multiplied by 1 million as well? In my experiment, the number of instructions is far below the value I expected.
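The arithmetic behind that question can be sketched as follows. This is only an illustration: the 3 GHz invariant clock rate is the figure implied by "3G ticks" per 1s interval, the `reported_ticks` value is a made-up placeholder rather than real pcm output, and the function names are hypothetical.

```python
def expected_ticks(freq_hz: float, interval_s: float) -> float:
    """Invariant clockticks expected in one sampling interval."""
    return freq_hz * interval_s


def implied_scale(reported: float, freq_hz: float, interval_s: float) -> float:
    """Factor the reported value must be multiplied by to match the expectation."""
    return expected_ticks(freq_hz, interval_s) / reported


NOMINAL_HZ = 3e9         # assumption: ~3 GHz invariant clock ("3G ticks" per second)
INTERVAL_S = 1.0         # sampling interval set to 1s
reported_ticks = 3000.0  # hypothetical reading, NOT taken from real pcm output

print(expected_ticks(NOMINAL_HZ, INTERVAL_S))                 # ticks expected per interval
print(implied_scale(reported_ticks, NOMINAL_HZ, INTERVAL_S))  # scale factor implied by the reading
```

If the implied factor comes out close to 1e6 for `TIME(ticks)`, the same check can be repeated for `INST` against an independently measured instruction count to see whether that column uses the same unit.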
Exporting config of the parent node:
```
[json:timescaledb_instance]
    enabled = yes
    destination = localhost:14866
    remote write URL path = /write
    data source = as collected
    prefix = netdata
    update every =...
```
I reproduced it. Error log of netdata:
```
2021-01-27 10:30:15: netdata ERROR : MAIN : EXPORTING: failed to write data to 'localhost:14866'. Willing to write 3289916 bytes, wrote 2154360 bytes....
```
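The log shows a short write: fewer bytes were accepted than were offered. As a general illustration of how a sender can cope with that, here is a minimal Python sketch that keeps calling `send` until the whole buffer is written. It is not Netdata's code (Netdata is written in C), and `send_all` and `ChunkedSink` are hypothetical names; `ChunkedSink` is a stand-in for a socket that accepts only part of the data per call, sized after the numbers in the log.

```python
def send_all(sock, data: bytes) -> int:
    """Call sock.send() repeatedly until every byte of data has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        sent = sock.send(view[total:])  # may accept fewer bytes than offered
        if sent == 0:
            raise ConnectionError("connection closed before all bytes were written")
        total += sent
    return total


class ChunkedSink:
    """Hypothetical stand-in for a socket that accepts at most max_chunk bytes per send."""

    def __init__(self, max_chunk: int):
        self.max_chunk = max_chunk
        self.received = bytearray()

    def send(self, data) -> int:
        chunk = bytes(data[: self.max_chunk])
        self.received += chunk
        return len(chunk)


sink = ChunkedSink(max_chunk=2154360)  # first call accepts only part, as in the log
payload = b"x" * 3289916
written = send_all(sink, payload)
print(written == len(payload))
```

With a real blocking socket, Python's standard `socket.sendall` already performs this loop; the explicit version is shown only to make the partial-write case visible.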
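Eight cards is very expensive; we only have two A100s. We test machine performance as follows: reduce the number of Transformer layers and change tp to 2, so that the model size fits on two A100s. 1. Can't the model support CUDA unified memory? Even if performance drops, it would at least run correctly. 2. Intel's new SPR and EPR processors support AMX, and Armv9 supports SVE and SME, all of which can do efficient matrix multiplication. Some supercomputer CPUs also come with HBM, so CPUs are starting to have some advantage in large-model inference. Could a CPU version be released here?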