logparser
Drain.py modified (Bug Fix)
When I used 'Drain.py' on my database logs (Hadoop logs), I hit the following error:
Traceback (most recent call last):
  File "./run_method.py", line 55, in <module>
    parser.parse(os.path.basename(setting['log_file']))
  File "../logparser/Drain/Drain.py", line 285, in parse
    self.outputResult(logCluL)
  File "../logparser/Drain/Drain.py", line 216, in outputResult
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4877, in apply
    ignore_failures=ignore_failures)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4990, in _apply_standard
    result = self._constructor(data=results, index=index)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
    construction_error(len(arrays), arrays[0].shape, axes, e)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4604, in construction_error
    raise e
ValueError: could not broadcast input array from shape (3) into shape (4)
I traced the cause of this bug and found a way to fix it.
The failing code is in Drain.py, at line 215:
if self.keep_para:
    self.df_log["ParameterList"] = self.df_log.apply(self.get_parameter_list, axis=1)
I changed it to this:
if self.keep_para:
    parameter_list = []
    for i, row in self.df_log.iterrows():
        parameter_list.append(self.get_parameter_list(row))
    p_series = pd.Series(parameter_list, name='ParameterList')
    self.df_log = pd.concat([self.df_log, p_series], axis=1)
These two snippets are meant to do exactly the same thing, but my version doesn't trigger the error.
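For anyone who wants to try the workaround outside of Drain.py, here is a minimal standalone sketch of the same idea. I have not confirmed the root cause; my guess is that older pandas versions try to expand list results from apply(axis=1) into columns, which fails when the lists have different lengths. The function extract_parameters is a hypothetical stand-in for get_parameter_list, and the sample rows and column names are made up for illustration. The one difference from my patch is that the Series reuses the DataFrame's own index, so the rows stay aligned even if the index is not 0..n-1.

```python
import pandas as pd


def extract_parameters(row):
    # Hypothetical stand-in for Drain's get_parameter_list:
    # returns a list whose length varies from row to row.
    return row["Content"].split()[1:]


# Made-up sample data with a non-default index to show that alignment holds.
df_log = pd.DataFrame(
    {
        "LineId": [1, 2, 3],
        "Content": ["open a b c", "close x", "ping"],
        "EventId": ["E1", "E2", "E3"],
        "EventTemplate": ["open <*> <*> <*>", "close <*>", "ping"],
    },
    index=[10, 20, 30],
)

# Build the column row by row in plain Python, so pandas does not have
# to infer a result shape from the returned lists.
parameter_lists = [extract_parameters(row) for _, row in df_log.iterrows()]

# Reuse df_log's index so each list lands on the row it came from.
df_log["ParameterList"] = pd.Series(parameter_lists, index=df_log.index)

print(df_log[["LineId", "ParameterList"]])
```

Assigning the Series directly also avoids rebuilding the whole DataFrame with pd.concat, though either approach should give the same column when the index is the default one.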