Influence of the binning
Dear Daniel,
Thank you for implementing WHAM; I find it very convenient. However, I have an issue with the influence of binning on the shape of the PMF. I've attached figures with PMFs obtained for the same dataset with Grossfield's program and with your WHAM program, using different numbers of bins. In the Grossfield implementation, once the number of bins exceeds 100, the PMF becomes relatively smooth and stays the same as the number of bins increases further. The two other figures show PMFs from your WHAM (SDs are not shown; these are the free energy values themselves). A relatively smooth PMF is obtained with the smallest number of bins (30), while as the number of bins increases, the PMF becomes more and more wavy. What do you think could be done about this?
Thank you, Sofya
Hi Sofya,
It's been a while since I developed this library, but as far as I remember, most of the code is very similar to Grossfield's. There shouldn't be a significant difference in the results if used on the same dataset.
I wonder if the wavy lines might be a result of insufficient samples in the individual bins. If you increase the number of bins, the bin width gets smaller, and therefore fewer samples fall into each bin. If the count per bin approaches 0, the algorithm becomes unstable. What leads me to this suspicion is that the region around the global minimum (where we can expect a higher sample count) seems less wavy overall. Could you check the number of samples in the individual bins?
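A minimal sketch of how such a check could look, written in plain Python rather than with this tool itself. The function names `bin_counts` and `sparse_bins`, the threshold of 10 samples, and the synthetic Gaussian data are assumptions for illustration only; they are not part of WHAM's interface, and the real time series would have to be loaded in place of the synthetic samples.

```python
import random


def bin_counts(samples, lo, hi, num_bins):
    """Count samples in each of num_bins equal-width bins on [lo, hi).

    Samples outside [lo, hi) are ignored.
    """
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        if lo <= x < hi:
            # Clamp the index to guard against floating-point rounding
            # right at the upper edge.
            idx = min(int((x - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


def sparse_bins(counts, min_count=10):
    """Return the indices of bins holding fewer than min_count samples.

    The threshold of 10 is an arbitrary illustrative choice.
    """
    return [i for i, c in enumerate(counts) if c < min_count]


if __name__ == "__main__":
    # Synthetic stand-in for one umbrella window (hypothetical data).
    rng = random.Random(0)
    samples = [rng.gauss(0.5, 0.05) for _ in range(30000)]

    for num_bins in (30, 100, 1000):
        counts = bin_counts(samples, 0.0, 1.0, num_bins)
        low = sparse_bins(counts)
        print(f"{num_bins} bins: min count {min(counts)}, "
              f"{len(low)} bins below threshold")
```

Running the same count on the combined histogram of all windows, for each bin number that was tried, would show whether the wavy regions coincide with sparsely populated bins.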
Also, from the plots it's not entirely clear to me what I am looking at:
- Am I correct that the "b100" stands for 100 bins, "b500" for 500 bins etc? Or are these different data sets and the different bin count is only between the plots? Can you please share more details on that and ideally also list all settings you used for WHAM?
- Which of the three plots are from Grossfield's tool and which were made with this tool? I can spot wavy lines in all three of them for at least one "b" number. The bottom one seems to be the least wavy, except for b50, which again has ups and downs.
- Any idea why b60 in the middle plot looks so different?
Cheers, Daniel
Hi, thank you for the answer.
- Yes, indeed, b in the legend is the number of bins. The dataset is the same. I'm attaching the whole folder with the dataset and scripts. wham-test.tar.gz The total number of data points per window is 30'000, which should be sufficient for any binning.
- The first two plots are PMFs obtained with your tool, and the third is from Grossfield's. For Grossfield's tool, the typical behavior is that with small bin numbers the PMFs are wavy, and as the number of bins increases, the PMF becomes smoother, with no significant change in its shape on further increase. The third plot illustrates this.
- I wonder about that too. It could be some mistake in how I ran WHAM, but I double-checked, and that run is consistent with the others.
Best regards, Sofya
Hi,
I played around with your dataset and can reproduce what you observed. I get the same results for your settings.
I also looked a bit into the raw data and can't find anything wrong with it. The histograms look well defined to me, and there are enough data points, with no obvious areas lacking proper overlap:
Individual histograms from all timeseries:
Combined histograms for bin counts 10, 30, 100, 1000:
Also, trying these things didn't solve the issue:
- leaving out some of the dataset
- changing min/max lambda values
- skipping the first 600 datapoints of each series
- using an older version of WHAM
This really puzzles me, since I never experienced something like this with other data sets.