safebrowsing 503 error, API outage?

Hey, we started receiving a bunch of 503 responses this afternoon. Is the API experiencing any outages?

thanks!

Your status code documentation says that 503 means the service is unavailable. Is this potentially something on my end, or is this an expected issue on the API side?

Feb 05 '19 02:02 bigbizze

Hey @bigbizze -- are you still experiencing this? We are not aware of any existing service outages. Please let me know and I'll continue to dig in.

Feb 07 '19 19:02 alexwoz

Hey @alexwoz, sorry for the delayed response. I am still getting the same error as I was then.

I am getting 503 responses peppered in with 200 ones, which is why I initially thought it had to be something on my end.

Here are some examples of the POST requests I'm sending to the Safe Browsing API, with the response status code printed below each JSON block (it's too long for GitHub): https://pastebin.com/iZhnKgE5

Let me know if there's any other attribute of the response that could help you, or anything else in general that could help!

Feb 10 '19 00:02 bigbizze

Both URL and hash lookups are limited to 500 entries per API call. It looks like you're way over that. I'm not sure why it is returning 503 though; it should return a 4xx error of some sort.

https://developers.google.com/safe-browsing/v4/lookup-api#checking-urls
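
For illustration, here is a minimal sketch of one way to stay under the 500-entry limit mentioned above by splitting a long URL list into batches before building each request. The helper names (`batched`, `build_threat_entries`) and the constant are hypothetical and not part of the API; only the `{"url": ...}` entry shape comes from the Lookup API documentation.

```python
from typing import Iterator, List

# Limit quoted above for URL and hash lookups; the name is hypothetical.
MAX_ENTRIES_PER_CALL = 500


def batched(urls: List[str], size: int = MAX_ENTRIES_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_threat_entries(urls: List[str]) -> List[dict]:
    """Wrap each URL in the {"url": ...} shape used for threatEntries."""
    return [{"url": u} for u in urls]
```

Each batch would then go into its own request body as `threatInfo["threatEntries"]`, one POST per batch.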

It also looks like something weird is happening with your entry types/platforms/etc with a bunch of duplication. I don't think there is any issue with that, but might as well fix them anyways.

Feb 12 '19 03:02 colonelxc

Ah, I didn't see that there were multiple requests in that paste, so that might not be the problem. I'd try removing the duplication in your threat types/etc, and see if that helps.

Feb 12 '19 03:02 colonelxc

I am certainly open to trying that, but first I'm hoping you can clear up something that, from my reading, wasn't particularly clear in the documentation...

I was under the impression that the entries in threats["threatLists"] were directories or paths to separate lists or "bins" of different threats, and that each combination of threat type, threat entry type, etc. was unique to a specific list, which is why I was using this function:

import json
import requests

# Note: `headers` and `querystring` are defined elsewhere in my script (not shown).
def reload_lists(p_load):
    # Fetch every threat list the API currently offers.
    resp = requests.get("https://safebrowsing.googleapis.com/v4/threatLists", headers=headers, params=querystring)
    resp.raise_for_status()  # fail early on a non-2xx response
    threats = resp.json()
    p_load["client"]["clientId"] = 'malicioussiteschoolproj'
    p_load["client"]["clientVersion"] = '1.0'
    # Reset the lookup fields before refilling them.
    p_load["threatInfo"]["threatTypes"] = list()
    p_load["threatInfo"]["platformTypes"] = list()
    p_load["threatInfo"]["threatEntryTypes"] = list()
    p_load["threatInfo"]["threatEntries"] = list()
    # One value is appended per list, so the same value repeats across lists.
    for items in threats["threatLists"]:
        p_load["threatInfo"]["threatTypes"].append(items["threatType"])
        p_load["threatInfo"]["platformTypes"].append(items["platformType"])
        p_load["threatInfo"]["threatEntryTypes"].append(items["threatEntryType"])
    # Save the request body so it can be reused later.
    with open("url_req.json", "w") as f:
        json.dump(p_load, f)
    return p_load

to ensure that I could make use of all of the Safe Browsing threat lists possible. It's also why there's so much duplication.

If there's a more efficient way of referencing all of the threat lists, I'd certainly be interested in using that.

Feb 12 '19 03:02 bigbizze

You're right that we generally treat them as 3-part tuples. However, it's a bit different in the API when looking up specific values (either a full hash lookup, or the URL lookups, as you are doing). There, it is just a list of the threatTypes, etc. that you are interested in. Internally it will create every combination that you may be interested in.

See this example: https://developers.google.com/safe-browsing/v4/lookup-api#example-threatmatchesfind. In the example, there are two threat types (malware, social engineering), and one of each of the other types. The server will pass back any results for url/windows/malware and url/windows/social_engineering.
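
To make that concrete, here is a rough sketch of how the request fields could be built from a threatLists response without duplicates, plus a helper that enumerates the combinations such a request would cover. The function names are hypothetical, and `expanded_combinations` only illustrates the behavior described above; it is not the server's actual implementation.

```python
from itertools import product


def unique_in_order(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


def build_threat_info(threat_lists, urls):
    """Build a threatInfo dict where each type appears only once."""
    return {
        "threatTypes": unique_in_order(t["threatType"] for t in threat_lists),
        "platformTypes": unique_in_order(t["platformType"] for t in threat_lists),
        "threatEntryTypes": unique_in_order(t["threatEntryType"] for t in threat_lists),
        "threatEntries": [{"url": u} for u in urls],
    }


def expanded_combinations(threat_info):
    """List every (entry type, platform, threat type) combination.

    Illustration only: mirrors the combination behavior described above.
    """
    return list(product(
        threat_info["threatEntryTypes"],
        threat_info["platformTypes"],
        threat_info["threatTypes"],
    ))
```

With the two threat types from the linked example plus one platform and one entry type, `expanded_combinations` gives two tuples, matching url/windows/malware and url/windows/social_engineering.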

I agree that it is a bit odd.

Feb 14 '19 21:02 colonelxc

Thank you, it appears to be working well now!

My only concern is that I initially did it this way with a list of known malicious URLs and it returned almost no matches, whereas loading every combination returned a large number of malicious matches. Is this something I should be concerned about?

Mar 21 '19 18:03 bigbizze