Vikas Gupta

Results 34 comments of Vikas Gupta

Fixed via commit 548c8384, pull request #541.

If one of the columns is renamed to source, running ./scripts/zingg.sh --phase findTrainingData --conf examples/febrl/config.json --zinggDir /tmp/z_temp leads to org.apache.spark.sql.AnalysisException: Reference 'z_source' is ambiguous, could be: z_source, z_source. at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:377)...

Renamed z_source to z_zsource in pull request #574 (commits 3bbceb5b, 19d02b4e, ea18f3fa).
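A minimal sketch of why such a rename can remove the ambiguity. It assumes the clash arises because a z_ prefix is applied to input column names, so that a user column called source ends up with the same name as the internal z_source column. That mechanism is an assumption made for illustration, not something stated in the comment, and find_prefix_collisions is a hypothetical helper, not part of Zingg.

```python
def find_prefix_collisions(user_columns, internal_columns, prefix="z_"):
    """Return user columns whose prefixed name equals an internal column name.

    Hypothetical helper: the prefixing rule is an assumption for this sketch.
    """
    internal = set(internal_columns)
    return [col for col in user_columns if prefix + col in internal]


# Hypothetical column lists, for illustration only.
user_columns = ["fname", "lname", "source"]

# Before the rename: the internal column is z_source, so a user column
# called "source" maps onto the same name once prefixed.
print(find_prefix_collisions(user_columns, ["z_zid", "z_source"]))

# After the rename to z_zsource, the prefixed user column no longer clashes.
print(find_prefix_collisions(user_columns, ["z_zid", "z_zsource"]))
```

Under these assumptions the first call reports source as colliding and the second reports nothing, which matches the direction of the fix described above.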

I was able to run it using the following steps: go to the folder /zingg/docker/mac (which contains the Dockerfile) and run docker image build -t zingg/vikas . => the docker image zingg/vikas will get formed with...

Tried the following: docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --run examples/febrl/FebrlExample.py => the error did not occur (by default FebrlExample.py ran trainMatch).

I also ran the phases one by one by modifying FebrlExample.py (after deleting models/100); the issue was not reproduced.

I tried a combination, i.e. findTrainingData, label and train using the JSON config, and match using FebrlExample.py. This was done after deleting models/100. The issue was not reproduced.

Finally reproduced using the following: docker run -v /tmp:/tmp -it zingg/vikas bash ./scripts/zingg.sh --phase match --conf examples/febrl/config.json . Available: z_z_zid, z_zid, fname, lname, stNo, add1, add2, city, areacode, state, dob, ssn,...

In FebrlExample.py all fields are fuzzy, while in config.json stNo and areacode are exact. After changing these two to fuzzy, it worked.
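A mismatch like this can be spotted mechanically before running the match phase. The sketch below compares match types from a JSON config against those used in a Python configuration. The key names (fieldDefinition, fieldName, matchType) and the sample values are assumptions made for illustration, and the inline data is a hypothetical fragment, not the contents of the real examples/febrl/config.json.

```python
import json

# Hypothetical config fragment; the key names are assumed for this sketch.
CONFIG_JSON = """
{
  "fieldDefinition": [
    {"fieldName": "fname",    "matchType": "fuzzy"},
    {"fieldName": "stNo",     "matchType": "exact"},
    {"fieldName": "areacode", "matchType": "exact"}
  ]
}
"""

# Hypothetical match types as they would be set in the Python example.
PYTHON_MATCH_TYPES = {"fname": "fuzzy", "stNo": "fuzzy", "areacode": "fuzzy"}


def match_types_from_json(text):
    """Map each field name to its lower-cased match type."""
    config = json.loads(text)
    return {f["fieldName"]: f["matchType"].lower()
            for f in config["fieldDefinition"]}


def diff_match_types(json_types, python_types):
    """Return {field: (json_type, python_type)} for fields that disagree."""
    return {
        name: (json_types[name], python_types[name])
        for name in json_types
        if name in python_types and json_types[name] != python_types[name]
    }


differences = diff_match_types(match_types_from_json(CONFIG_JSON),
                               PYTHON_MATCH_TYPES)
for field, (json_type, python_type) in differences.items():
    print(f"{field}: json={json_type}, python={python_type}")
```

With the sample data, only stNo and areacode are reported, mirroring the two fields named above. A model trained with one set of match types and applied with another is the situation this check is meant to flag.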

Fixed in commit 687cef2, pull request #543. Generated the model again and changed exact to fuzzy in the JSON where there was a difference.