Data from the "TrainingData" link is inconsistent with genTrainingData.py

Open issue, opened by mengyuest.

Hi, I am checking the score lookup table. Based on genTrainingData.py, X should be grid data normalized to zero mean and unit range in every column, with shape (32×41×41 = 53792, 3). However, I downloaded the data from the "Training data" link in README.md and inspected the first 32 rows.

I ran:

```python
import numpy as np
import h5py

# Load the input array X from the downloaded training file
with h5py.File('HCAS_rect_TrainingData_v6_pra0_tau00.h5', 'r') as f:
    X_train = np.array(f['X'])

print(X_train[:32])
```

It prints:

```
[[ 2.48775044e-03 -1.25392984e-17 -5.00000000e-01]
 [ 2.26453615e-03 -1.25666343e-17 -5.00000000e-01]
 [ 2.04132186e-03 -1.25939702e-17 -5.00000000e-01]
 [ 1.81810758e-03 -1.26213060e-17 -5.00000000e-01]
 [ 1.59489329e-03 -1.26486419e-17 -5.00000000e-01]
 [ 1.14846472e-03 -1.27033136e-17 -5.00000000e-01]
 [ 7.02036150e-04 -1.27579854e-17 -5.00000000e-01]
 [-1.90820993e-04 -1.28673288e-17 -5.00000000e-01]
 [-1.08367814e-03 -1.29766723e-17 -5.00000000e-01]
 [-1.97653528e-03 -1.30860157e-17 -5.00000000e-01]
 [-2.06582099e-03 -1.30969501e-17 -5.00000000e-01]
 [-4.20867814e-03 -1.33593744e-17 -5.00000000e-01]
 [-6.44082099e-03 -1.36327331e-17 -5.00000000e-01]
 [-1.09051067e-02 -1.41794504e-17 -5.00000000e-01]
 [-1.53693924e-02 -1.47261677e-17 -5.00000000e-01]
 [-2.42979639e-02 -1.58196023e-17 -5.00000000e-01]
 [-3.32265353e-02 -1.69130370e-17 -5.00000000e-01]
 [-4.21551067e-02 -1.80064716e-17 -5.00000000e-01]
 [-6.00122496e-02 -2.01933409e-17 -5.00000000e-01]
 [-7.78693924e-02 -2.23802102e-17 -5.00000000e-01]
 [-9.57265353e-02 -2.45670795e-17 -5.00000000e-01]
 [-1.13583678e-01 -2.67539488e-17 -5.00000000e-01]
 [-1.31440821e-01 -2.89408181e-17 -5.00000000e-01]
 [-1.49297964e-01 -3.11276873e-17 -5.00000000e-01]
 [-1.67155107e-01 -3.33145566e-17 -5.00000000e-01]
 [-1.85012250e-01 -3.55014259e-17 -5.00000000e-01]
 [-2.20726535e-01 -3.98751645e-17 -5.00000000e-01]
 [-2.65369392e-01 -4.53423377e-17 -5.00000000e-01]
 [-3.10012250e-01 -5.08095109e-17 -5.00000000e-01]
 [-3.54655107e-01 -5.62766841e-17 -5.00000000e-01]
 [-4.26083678e-01 -6.50241612e-17 -5.00000000e-01]
 [-4.97512250e-01 -7.37716384e-17 -5.00000000e-01]]
```

Notice that the first column does not appear to have a unit range, and the second column is essentially zero (on the order of 1e-17), which seems odd. If I instead run the process from genTrainingData.py to generate X myself:

```python
import numpy as np

# Grid definition copied from genTrainingData.py
ranges = np.array([0.0, 25.0, 50.0, 75.0, 100.0, 150.0, 200.0, 300.0, 400.0, 500.0, 510.0,
                   750.0, 1000.0, 1500.0, 2000.0, 3000.0, 4000.0, 5000.0, 7000.0, 9000.0,
                   11000.0, 13000.0, 15000.0, 17000.0, 19000.0, 21000.0, 25000.0, 30000.0,
                   35000.0, 40000.0, 48000.0, 56000.0])
thetas = np.linspace(-np.pi, np.pi, 41)
psis = np.linspace(-np.pi, np.pi, 41)

# Rows ordered with range varying fastest, then theta, then psi
X = np.array([[r, t, p] for p in psis for t in thetas for r in ranges])

# Normalize each column to zero mean and unit range
means = np.mean(X, axis=0)
min_inputs = np.min(X, axis=0)
max_inputs = np.max(X, axis=0)
rnges = max_inputs - min_inputs
rnges = np.where(rnges == 0.0, 1.0, rnges)  # guard against constant columns
X = (X - means) / rnges
print(X[:32])
```

I get:

```
[[-0.20399554 -0.5        -0.5       ]
 [-0.20354911 -0.5        -0.5       ]
 [-0.20310268 -0.5        -0.5       ]
 [-0.20265625 -0.5        -0.5       ]
 [-0.20220982 -0.5        -0.5       ]
 [-0.20131696 -0.5        -0.5       ]
 [-0.20042411 -0.5        -0.5       ]
 [-0.19863839 -0.5        -0.5       ]
 [-0.19685268 -0.5        -0.5       ]
 [-0.19506696 -0.5        -0.5       ]
 [-0.19488839 -0.5        -0.5       ]
 [-0.19060268 -0.5        -0.5       ]
 [-0.18613839 -0.5        -0.5       ]
 [-0.17720982 -0.5        -0.5       ]
 [-0.16828125 -0.5        -0.5       ]
 [-0.15042411 -0.5        -0.5       ]
 [-0.13256696 -0.5        -0.5       ]
 [-0.11470982 -0.5        -0.5       ]
 [-0.07899554 -0.5        -0.5       ]
 [-0.04328125 -0.5        -0.5       ]
 [-0.00756696 -0.5        -0.5       ]
 [ 0.02814732 -0.5        -0.5       ]
 [ 0.06386161 -0.5        -0.5       ]
 [ 0.09957589 -0.5        -0.5       ]
 [ 0.13529018 -0.5        -0.5       ]
 [ 0.17100446 -0.5        -0.5       ]
 [ 0.24243304 -0.5        -0.5       ]
 [ 0.33171875 -0.5        -0.5       ]
 [ 0.42100446 -0.5        -0.5       ]
 [ 0.51029018 -0.5        -0.5       ]
 [ 0.65314732 -0.5        -0.5       ]
 [ 0.79600446 -0.5        -0.5       ]]
```
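Because only the first 32 rows of each array are printed, whole-column statistics give a more direct test of the zero-mean, unit-range expectation. A minimal sketch follows; the helper names `column_summary` and `looks_normalized` are illustrative and not part of the repository.

```python
import numpy as np


def column_summary(X):
    """Per-column mean, min, max and span (max - min) of a 2-D array."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    return {"mean": X.mean(axis=0), "min": mins, "max": maxs, "span": maxs - mins}


def looks_normalized(X, tol=1e-9):
    """True if every column has mean ~0 and span ~1 (within tol)."""
    s = column_summary(X)
    zero_mean = np.all(np.abs(s["mean"]) < tol)
    unit_span = np.all(np.abs(s["span"] - 1.0) < tol)
    return bool(zero_mean and unit_span)


# Tiny demo: raw columns fail the check, mean/range-normalized columns pass it
demo = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
Z = (demo - demo.mean(axis=0)) / (demo.max(axis=0) - demo.min(axis=0))
print(looks_normalized(demo), looks_normalized(Z))  # False True
```

Calling `column_summary(X_train)` on the array loaded from the h5 file, and `column_summary(X)` on the regenerated one, would show whether each column really has zero mean and unit span over all rows rather than just the first 32.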

Only the last column matches between the two outputs; the other columns differ by far more than numerical error can explain. Could you please explain why this happens, and whether it is safe to replace the X in the data with the one I generated here? Thanks for your response.
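One possible explanation, offered as an unconfirmed assumption: the file name `HCAS_rect_TrainingData_v6_pra0_tau00.h5` contains `rect`, so the first two inputs may be stored in rectangular coordinates (x = r*cos(theta), y = r*sin(theta)) rather than as (range, theta). The sketch below tests that assumption on the printed rows only. It converts the same polar grid, applies the same mean/range normalization as the snippet above, and compares the result against the h5 values. Under this assumption, x at theta = -pi is simply -r, and x spans [-56000, 56000]. The second column is only floating-point noise, because `np.sin(-np.pi)` is about -1.2e-16 rather than exactly 0. The names `polar`, `rect`, and `normalize` are hypothetical.

```python
import numpy as np

# Grid values copied from genTrainingData.py
ranges = np.array([0.0, 25.0, 50.0, 75.0, 100.0, 150.0, 200.0, 300.0, 400.0, 500.0, 510.0,
                   750.0, 1000.0, 1500.0, 2000.0, 3000.0, 4000.0, 5000.0, 7000.0, 9000.0,
                   11000.0, 13000.0, 15000.0, 17000.0, 19000.0, 21000.0, 25000.0, 30000.0,
                   35000.0, 40000.0, 48000.0, 56000.0])
thetas = np.linspace(-np.pi, np.pi, 41)
psis = np.linspace(-np.pi, np.pi, 41)

# Polar grid in the same row order: range fastest, then theta, then psi
polar = np.array([[r, t, p] for p in psis for t in thetas for r in ranges])

# ASSUMPTION (unconfirmed): the "rect" file stores x = r*cos(theta), y = r*sin(theta)
rect = np.column_stack([
    polar[:, 0] * np.cos(polar[:, 1]),
    polar[:, 0] * np.sin(polar[:, 1]),
    polar[:, 2],
])


def normalize(X):
    """Per-column zero-mean, unit-range normalization, mirroring the snippet above."""
    means = X.mean(axis=0)
    spans = X.max(axis=0) - X.min(axis=0)
    spans = np.where(spans == 0.0, 1.0, spans)
    return (X - means) / spans


X_rect = normalize(rect)

# Compare a few first-column entries against the rows printed from the h5 file
h5_rows = [0, 1, 10, 31]
h5_first_col = np.array([2.48775044e-03, 2.26453615e-03, -2.06582099e-03, -4.97512250e-01])
print(np.allclose(X_rect[h5_rows, 0], h5_first_col, atol=1e-9))
```

This is only a consistency check on the 32 printed rows. It does not show how the rest of the file was generated.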

Nov 21 '24 18:11 mengyuest