
Drain.py modified (Bug Fix)

kvlksgjp1 opened this issue 5 years ago (status: open, 0 comments)

When I used Drain.py to parse my database log (a Hadoop log), I ran into the following error:

Traceback (most recent call last):
  File "./run_method.py", line 55, in <module>
    parser.parse(os.path.basename(setting['log_file']))
  File "../logparser/Drain/Drain.py", line 285, in parse
    self.outputResult(logCluL)
  File "../logparser/Drain/Drain.py", line 216, in outputResult
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4877, in apply
    ignore_failures=ignore_failures)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4990, in _apply_standard
    result = self._constructor(data=results, index=index)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
    construction_error(len(arrays), arrays[0].shape, axes, e)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4604, in construction_error
    raise e
ValueError: could not broadcast input array from shape (3) into shape (4)

I traced the cause of this bug and found a way to fix it.

In Drain.py, line 215 (inside outputResult), the original code is:

if self.keep_para:
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
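A possible reading of the traceback, inferred rather than confirmed in this issue: `get_parameter_list` returns one list per log line, and the pandas version shown (installed under python2.7) tries to assemble those per-row lists into a DataFrame inside `_apply_standard`. That appears to fail when lines carry different numbers of parameters (3 versus 4 here). The sketch below uses a toy DataFrame and a hypothetical `get_parameter_list` stand-in (not Drain's real method) to show two ways of storing one list per row without that expansion. The `result_type` argument of `DataFrame.apply` requires pandas 0.23 or later.

```python
import pandas as pd


# Hypothetical stand-in for Drain's get_parameter_list: it returns a list
# whose length varies from row to row, like extracted log parameters.
def get_parameter_list(row):
    return row["Content"].split()[1:]


df_log = pd.DataFrame({"Content": ["open a b", "close x", "read p q r"]})

# Option 1: build the lists row by row and assign them as a single column.
# Assigning a plain list keeps df_log's own index, so no alignment is involved.
df_log["ParameterList"] = [get_parameter_list(row) for _, row in df_log.iterrows()]

# Option 2 (pandas >= 0.23): tell apply() to return a Series of lists
# instead of trying to expand the list results into columns.
df_log["ParameterList2"] = df_log.apply(get_parameter_list, axis=1, result_type="reduce")

print(df_log["ParameterList"].tolist())  # [['a', 'b'], ['x'], ['p', 'q', 'r']]
```

Option 1 is essentially the loop in the fix below, written as a comprehension; option 2 keeps the original one-line `apply` call and only adds the keyword argument.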

I replaced the code at line 215 with:

if self.keep_para:
    parameter_list = []
    for _, row in self.df_log.iterrows():
        parameter_list.append(self.get_parameter_list(row))
    # Reuse df_log's index so each list is concatenated onto its own row
    p_series = pd.Series(parameter_list, name='ParameterList', index=self.df_log.index)
    self.df_log = pd.concat([self.df_log, p_series], axis=1)

Both versions are meant to build the same ParameterList column, but my version does not raise the error.
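One caveat with the `pd.concat(..., axis=1)` approach: concat aligns rows by index label, so a Series built from a plain list lines up with `df_log` only when `df_log` has the default 0..n-1 index, or when the Series is given `df_log.index` explicitly. A toy example, not taken from Drain, with an index that could result from filtering or slicing:

```python
import pandas as pd

# Toy frame with a non-default index (e.g. after filtering or slicing).
df = pd.DataFrame({"Content": ["open a", "read x y"]}, index=[10, 11])
params = [["a"], ["x", "y"]]

# The Series gets labels 0 and 1, which do not match 10 and 11, so the
# outer join in concat produces 4 rows padded with NaN.
misaligned = pd.concat([df, pd.Series(params, name="ParameterList")], axis=1)

# Reusing df.index keeps each list on the row it was computed from.
aligned = pd.concat(
    [df, pd.Series(params, name="ParameterList", index=df.index)], axis=1
)

print(len(misaligned), len(aligned))  # 4 2
```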

Posted by kvlksgjp1, Apr 06 '20 20:04