Identify addresses with significantly more 311 requests
Overview
This can be very useful information for NCs and city agencies: we can identify addresses or small areas that could benefit from more signage, increased community assistance, or other actions.
This was actually one of the original goals of 311 Data (see Use Case Feasibility Report).
[Update 10/12/22] In progress HERE:
- EDA on Neighborhood Councils and block-by-block
- Geospatial analysis and folium maps with City of LA Neighborhood Council boundaries and Census LA block boundaries
- Clustering analysis notebook on data set sample
Action Items
- [ ] Figure out ways to implement this in existing dashboard
- [x] Create script to output a csv specific to each NC with block IDs merged with NC request API csv
See repo HERE
- [ ] Update folium layered map with type filters
- [ ] Update streamlit app
- [ ] Add a per-capita variable to the cluster analysis, and potentially a classification model that predicts hot-spot targets from selected variables (the variables to consider are still being formulated)
At least at the NC level, we have a visualization of the total number of requests over the years; see the bottom of the dashboard here. I can take a stab at using a clustering algorithm to further identify smaller regions.
Thanks Josh! Yes, ideally, I think we'd want to get as granular as address-level, and then one notch above that, block-level. I think an individual NC would like to see if, e.g., 50% of their NC's 311 requests are coming from a single address.
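A minimal sketch of that address-level check, assuming a pandas dataframe with hypothetical `NCName` and `Address` columns (the actual 311 export may name these differently):

```python
import pandas as pd

def share_from_top_address(df):
    """Sketch: for each NC, what share of its 311 requests comes from its single
    busiest address?  'NCName' and 'Address' are assumed column names."""
    counts = df.groupby(["NCName", "Address"]).size().rename("requests").reset_index()
    counts["share_of_nc"] = counts["requests"] / counts.groupby("NCName")["requests"].transform("sum")
    # One row per NC: its busiest address and that address's share of all requests
    return counts.sort_values("share_of_nc", ascending=False).groupby("NCName").head(1)
```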
Power BI Demo:

Next Steps:
- Decide on the specific granularity
- Generate a list of the top 10 addresses (or other chosen granularity) for each NC (see the sketch after this list)
- Think of how to implement this as a feature into existing dashboard
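As a minimal sketch of what generating that top-10 list could look like, assuming an aggregated dataframe like the one produced by the hotspot function further down this thread (one row per NC and rounded lat/long cell, with an `SRNumber` request count):

```python
def top_hotspots_per_nc(agg_df, n=10):
    """Sketch: from an aggregated dataframe with one row per NC x location cell
    (or address) and an 'SRNumber' request count, keep the n busiest cells per NC."""
    return (
        agg_df.sort_values(["NCName", "SRNumber"], ascending=[True, False])
              .groupby("NCName")
              .head(n)
    )
```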
Apparently we have an API endpoint that can produce "hotspots", see #1034. I'm not sure if this is helpful, or changes how we do things, but it's worth looking into.
The API uses a clustering algorithm to identify hotspots. It is definitely useful if we want to implement this as a future feature; however, it's not that useful for analysis purposes.
Wrote a quick function that rounds the longitude/latitude pair to 2 decimal places and counts the number of requests in each neighborhood council. We can use this function on the 311 requests available for every year since 2016. I can compute some basic metrics like year-over-year and quarter-over-quarter comparisons of the number of requests, but I'll focus on bulky items, homeless encampments, and graffiti.
See function below:
```python
def generate_hotspot_dataframe(df):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe and aggregates by
    the longitude and latitude of 311 requests, rounded to 2 decimal places,
    for each neighborhood council.

    Args:
        df: raw LA 311 requests for any year.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print("* Rounding requests Long/Lat to 2 Decimal Places")
    df['lat_2dp'] = df['Latitude'].round(decimals=2)
    df['long_2dp'] = df['Longitude'].round(decimals=2)

    print("* Aggregating dataframes")
    final_df = (
        df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
          .count()
          .sort_values(['NCName', 'SRNumber'])
          .reset_index(drop=True)
    )
    return final_df
```
I'm not sure if two decimal places is small enough--1 degree of latitude/longitude is 69 miles, so two decimal places would be 0.69 miles, which is quite considerable. We can fine tune the number of decimal places as necessary.
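For reference, a quick back-of-the-envelope check of what each choice of decimal places works out to on the ground (assuming roughly 69 miles per degree of latitude and LA at about 34°N; east-west cells shrink by the cosine of the latitude):

```python
import math

MILES_PER_DEG_LAT = 69.0      # approximate
LA_LATITUDE = 34.05           # rough latitude of Los Angeles

for dp in (2, 3, 4):
    cell_ns = MILES_PER_DEG_LAT * 10 ** -dp                      # north-south extent
    cell_ew = cell_ns * math.cos(math.radians(LA_LATITUDE))      # east-west extent
    print(f"{dp} decimal places: ~{cell_ns:.2f} mi N-S x ~{cell_ew:.2f} mi E-W")

# 2 decimal places: ~0.69 mi N-S x ~0.57 mi E-W
# 3 decimal places: ~0.07 mi N-S x ~0.06 mi E-W   (roughly a city block)
# 4 decimal places: ~0.01 mi N-S x ~0.01 mi E-W
```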
Those target request types look good to me! I would also add illegal dumping and animal remains. Both are issues that might be concentrated in certain areas, and could be addressed with additional signage.
Thanks for the review!
```python
def generate_hotspot_dataframe(df, dp, req_type):
    """Generates the hotspots of each NC by the number of 311 requests.

    This function takes in a raw LA 311 requests dataframe, filters it by the
    "req_type" request type, and aggregates by the longitude and latitude of
    311 requests rounded to 'dp' decimal places for each neighborhood council.

    Args:
        df: a pandas dataframe with raw LA 311 requests for any year.
        dp: an integer for the number of decimal places to round the lat/long to.
        req_type: a string value of the request type to filter the dataframe by.

    Returns:
        An aggregated 311 request dataframe that contains the count of 311
        requests per long/lat pair in each neighborhood council.
    """
    print("* Filtering dataframe by " + req_type)
    df = df[df['RequestType'] == req_type].copy()  # copy avoids SettingWithCopyWarning

    print("* Rounding requests Long/Lat to " + str(dp) + " Decimal Places")
    df['lat_2dp'] = df['Latitude'].round(decimals=dp)
    df['long_2dp'] = df['Longitude'].round(decimals=dp)

    print("* Aggregating dataframes")
    final_df = (
        df.groupby(['NCName', 'lat_2dp', 'long_2dp'], as_index=False)['SRNumber']
          .count()
          .sort_values(['NCName', 'SRNumber'])
          .reset_index(drop=True)
    )
    return final_df


req_type_lst = ['Graffiti Removal', 'Bulky Items', 'Homeless Encampment',
                'Dead Animal Removal', 'Illegal Dumping Pickup']

for r in req_type_lst:
    final_df = generate_hotspot_dataframe(df, 2, r)
    final_df.to_csv("311_2020_Hotspot_" + r + ".csv")
```
Really rough function that generates a corresponding dataframe for each request type. Still using 2 decimal places right now, but that can be fine-tuned now. Next step is to figure out a way to present this, or just send the list as is.
Hey Josh and Nich. I started digging in a little to familiarize myself with the 311 data around locations and request types, and I'll bring questions from this initial exploration to the project meeting. I think you could do some clustering on past data to predict the types of requests in the different granular areas and help allocate resources, but I need to figure out how to make API calls that collect enough historical data, and also how to create new features for granular location. The API call I used only returns up to 1,000 records, which was a question I was going to bring to the project call.
Here's where I'm storing all my code. https://github.com/ajmachado42/Hack-for-LA-311-Data
Hey Dri, thanks for taking a look at this! To get all the requests for a certain date range, you can use this tool. Feel free to reach out to @priyakalyan if you have any questions about using it.
Re: the clustering: not sure if you saw this already, but we already have one implementation that does this. Please take a look and see if it looks useful to you.
Btw, if you're blocked on anything, feel free to reach out to us on Slack or write out your questions here on GitHub. It can be a pain to write them out, but we want to help our teammates to be productive throughout the week!
Thanks Nich! I'll definitely use this API code and take a look at the clustering!
I made some pretty decent headway on the EDA and identifying hot spots by neighborhood council and address in this notebook.
I'm still figuring out how to break LA into small hot-spot chunks and then map the data points there, but I started going down a geopandas rabbit hole, so the research is taking a little longer than I thought it would.
Some points for tomorrow's meeting (09/28/22):
- Size of each area to look at (each degree of lat/lon is about 69 miles; we could break it into 100ths, i.e. ~0.69 miles each)
- Should "hot spots" only include addresses that have multiple requests? A lot of requests are one-offs for bulky item pickups. When you break it down, graffiti becomes the number one offender for repeat requests.
- The API maxes out at 20,000 requests -- 09/17/22-09/23/22 was the only date range I was able to pull
@ajmachado42 Thanks so much for the comprehensive update! The notebook is very clear and comprehensive.
- I like the idea of breaking them into 100ths. I initially used 2 decimal places for each lat/lon, but I figured it would not be granular enough. It would be great to see the distribution of counts after you break them down into 0.69-mile cells. If there are too many "hot spot" blocks, we can use a larger block.
- As per our discussion during our meeting, I think the >=2 requests rule makes sense. At the same time, I'd suggest checking LA's weekly/monthly/yearly NC request count average and treating that as the decision rule (a rough sketch of that filter is below). Ultimately, we want something actionable that makes an impact; if there aren't that many requests, we can't really do much about them, as they're likely due to random chance / one-offs.
- Hmm, is that the case with the get_request_tool? I'd just use the 2021 LA 311 dataset and download it as a csv instead. I can take a look at the API.
Once again, thanks so much for your hard work - Let me know what you think!
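A minimal sketch of that decision rule, assuming the aggregated hotspot dataframe from the function above (with its `SRNumber` count column):

```python
def filter_hotspots(agg_df, min_requests=None):
    """Sketch of the decision rule above: keep only lat/long cells whose request
    count clears a threshold.  If no threshold is given, fall back to the mean
    count per cell, so one-off requests drop out."""
    threshold = min_requests if min_requests is not None else agg_df["SRNumber"].mean()
    return agg_df[agg_df["SRNumber"] >= threshold]


# e.g. the ">=2 requests" rule discussed above
hotspots = filter_hotspots(final_df, min_requests=2)
```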
@joshuayhwu Thank you, Josh! I'll work on this this week.
Anupriya shared some Census resources for mapping files that break LA into the official city blocks, and I think she and Nich fixed the API bug after the meeting. I'm going to be visiting family in Florida this week but will have time to update my notebook with the full-year data set and start doing some geospatial analysis as well.
Geospatial Analysis
- Folium choropleth maps with geospatial analysis on Neighborhood Council level and block-by-block level (taken from Census data, thanks Anupriya!)
- Some data was lost as the raw data was broken down to more granular levels; this might have been due to inconsistencies between locations across the boundary datasets.
- I did inner joins for areas that fall within the larger area (Neighborhood Council > block > request location points); a rough sketch of this join is below this list.
- Can there be multiple entries for one request? Duplicate requestId's are appearing in the raw data set and traveling downstream to the block data set.
- I saved a csv of the request data I have (10/01/2021 - 10/01/2022) merged with the block IDs and can do an EDA on it this next week
- The geospatial notebook wouldn't render on my browser in GitHub, probably because it has a lot of data in it, but it should be able to be downloaded. Going to work on getting it up on Streamlit so it's presentable.
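Here's a rough sketch of the spatial join mentioned in the list above, assuming geopandas and hypothetical file names (the actual boundary files are the NC and Census block sources mentioned earlier). It also shows one possible rule for dropping the duplicate `SRNumber` rows that boundary points can create:

```python
import geopandas as gpd
import pandas as pd

# Hypothetical file names -- substitute the actual NC / Census block boundary
# files and the 311 request export used in the notebook.
blocks = gpd.read_file("census_blocks.geojson").to_crs("EPSG:4326")
requests = pd.read_csv("311_requests.csv")

# Turn raw request rows into point geometries
points = gpd.GeoDataFrame(
    requests,
    geometry=gpd.points_from_xy(requests["Longitude"], requests["Latitude"]),
    crs="EPSG:4326",
)

# Inner join: keep only requests whose point falls inside a Census block
joined = gpd.sjoin(points, blocks, how="inner", predicate="within")

# A point sitting exactly on a block boundary can match more than one block,
# which is one way duplicate SRNumbers travel downstream; keep the first match.
joined = joined.drop_duplicates(subset="SRNumber")
```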
Clustering
- Updated with a DBSCAN cluster analysis (a minimal sketch of the approach is below the repo link). Ran on a smaller sample of the full dataset; not final, but the code is there, as well as some preliminary conclusions.
- Could put the results into a classification model to predict clusters, but this would need more processing power than I have locally.
https://github.com/ajmachado42/Hack-for-LA-311-Data/tree/master/I-1279
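For reference, a minimal sketch of running DBSCAN on raw request coordinates with scikit-learn; the file name and parameters are placeholders, and `eps` is in degrees here since the inputs are unprojected lat/long, so it would need tuning (or a projected CRS):

```python
import pandas as pd
from sklearn.cluster import DBSCAN

requests = pd.read_csv("311_requests.csv")            # hypothetical file name
coords = requests[["Latitude", "Longitude"]].dropna()

# eps is in degrees because the inputs are raw lat/long; ~0.001 degrees is on
# the order of a city block in LA.  min_samples is how many nearby requests it
# takes to form a cluster.  Both need tuning against the real data.
db = DBSCAN(eps=0.001, min_samples=10).fit(coords)

requests.loc[coords.index, "cluster"] = db.labels_     # label -1 means noise
print(requests["cluster"].value_counts().head())
```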
@ajmachado42 Thanks so much for the comprehensive updates - really appreciate the documentation on the notebooks!
Geospatial Analysis:
- I can't open the geospatial notebook after downloading the raw file as it takes up a lot of memory. Do you mind breaking it down into smaller sections?
- I don't think there can be multiple entries for the same request. If duplicates appear downstream, it's likely because a request's coordinates sit on a boundary, causing multiple combinations to appear. Perhaps decide on a rule to remove duplicates after joining?
Clustering:
- I like the sampling idea. From the preliminary analysis, it seems we'd only be able to get two main clusters, which doesn't generate direct insights. Also, the aggregate DBSCAN would be biased by the population of each NC (more densely populated NCs have more requests). Perhaps we could take a stratified sample at the NC level and then run DBSCAN (rough sketch below this list)?
- Unfortunately, extra processing power is too expensive for this project.
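A rough sketch of the stratified-sample-then-DBSCAN idea above; the fraction and DBSCAN parameters are placeholders:

```python
from sklearn.cluster import DBSCAN

def stratified_sample_then_cluster(df, frac=0.1, eps=0.001, min_samples=5, seed=42):
    """Sketch of the suggestion above: sample the same fraction of requests from
    every NC (so densely populated councils don't dominate), then run DBSCAN on
    the sample.  frac / eps / min_samples are placeholders that need tuning."""
    sample = df.groupby("NCName", group_keys=False).sample(frac=frac, random_state=seed)
    coords = sample[["Latitude", "Longitude"]].dropna()
    sample = sample.loc[coords.index].copy()
    sample["cluster"] = DBSCAN(eps=eps, min_samples=min_samples).fit(coords).labels_
    return sample
```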
@joshuayhwu I updated the visualization notebook so it's broken up more. Github still won't render the folium maps though.
This is my Drive link for it which has all the datasets, etc. Let me know if that works! (I was able to create a layered map by type in the nc_only notebook.) https://drive.google.com/drive/folders/1njMKXLcs6CSgcZ_Gs9Fwxr6Iq2Wro45m?usp=sharing
Noted about clustering. Once I finish getting the maps and block data set to a good spot then I'll shift to focusing on the cluster analysis more.
@ajmachado42 thanks for breaking it up! Notebook looks good and I really appreciate the comments!
I can take a look at the app and see how to render it if that's your only blocker. Otherwise, happy to check in on other blockers. Let me know which area you want most help with. Thanks for your hard work this week!
- Updated the block data set and solved the duplicates issue.
- Researching a good data set for population density to get a per-capita figure for each NC and type of request (see the sketch after this list).
- Working on a function to output a csv specific to each NC.
- Haven't had time to work on Streamlit but should have time this week!
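A minimal sketch of what that per-capita variable could look like, assuming two hypothetical inputs: a `request_counts` table with `NCName`, `RequestType`, and `count`, and an `nc_population` table with `NCName` and `population`:

```python
def add_per_capita(request_counts, nc_population):
    """Sketch of the per-capita variable.  Hypothetical inputs:
    request_counts -- one row per (NCName, RequestType) with a 'count' column
    nc_population  -- one row per NCName with a 'population' column
    """
    merged = request_counts.merge(nc_population, on="NCName", how="left")
    # Requests per 1,000 residents, so large and small NCs are comparable
    merged["requests_per_1000"] = 1000 * merged["count"] / merged["population"]
    return merged
```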
Hey @ajmachado42 and @joshuayhwu, Do you have an update for us on this issue?
Please update:
- Progress:
- Blockers:
- Availability:
- ETA:
Thanks!
Hey @mc759
Progress:
- Completed EDA on NC level data
- Completed function for spatial joins for block level IDs per 311 request
- Built a few folium maps to display a year's worth of geospatial data analysis on block level and NC level
Blockers:
- Having issues rendering folium maps in Streamlit, so I may pivot to Tableau as a final dashboard instead (one possible workaround is sketched below). I'm not sure if a dashboard is needed or if the code is more useful to have as a resource.
Availability:
- Feel free to reach out on Slack. Schedule is kind of all over the place right now.
ETA:
- Depends on the direction the final presentation needs to go in. Probably no more than a day of work left though.
-Adriana (sent from mobile)
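On the folium-in-Streamlit blocker above: one option, if it helps, is the streamlit-folium package. A minimal sketch (the map here is a placeholder; in practice it would be the layered folium map from the notebook):

```python
# pip install streamlit-folium
import folium
import streamlit as st
from streamlit_folium import st_folium

st.title("311 Hotspots by Neighborhood Council")

# Placeholder map -- in practice, the layered folium map built in the notebook
m = folium.Map(location=[34.05, -118.25], zoom_start=10)

# Render the folium map inside the Streamlit app
st_folium(m, width=700, height=500)
```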
Moving this one to closed after discussing with Josh. Lots of templates for analyses (statistical and geospatial) and a mini program to generate a report that adds Census block IDs to each request based on the address of the request. Feel free to reach out to me if you need anything!