Regex match to newline characters on windows10
Hi Anthony,
Thank you for the wonderful book.
I'm working through Ch. 13 on regular expressions on Windows10 using postgreSQL10 in pgAdmin4
I was having trouble getting the regular expressions to work for example in code listing 13-7.
I believe that the issue is related to the way new line characters are handled on windows.
This also may be related to the following issues (I am using a clean version of the imported csv file from the crime data):
https://github.com/anthonydb/practical-sql/issues/4 and https://github.com/anthonydb/practical-sql/issues/10
I was able to solve this issue with this SO answer: https://stackoverflow.com/a/20056634. Apparently windows may match newlines to \r\n
Here is my sql code for the crime time and the output, where crime_type_orig is the original from the book and the other crime_type2 and crime_type3 are based on the above SO answer:
select
regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') as crime_type_orig,
-- See https://stackoverflow.com/a/20056634
regexp_match(original_text, '\r\n(?:\w+ \w+|\w+)\r\n(.*):') as crime_type2,
-- Based on https://stackoverflow.com/a/20056634
regexp_match(original_text, '(?:\r\n|\r|\n)(?:\w+ \w+|\w+)(?:\r\n|\r|\n)(.*):') as crime_type3
from crime_reports;
Here is the output from pgAdmin

Hi, @ChandlerLutz! Thanks for sharing this and glad you got it working.
Did you experience a similar issue with any of the other regexes, or just the one for the crime_type?
I tried just now with a new install of PostgreSQL 11 and pgAdmin 4.1 on a on a Windows 7 laptop and a fresh download of the data from this repo, and the book's code worked as expected. So, I'm a bit puzzled as to what's occurring on your side. It sounds to me like you did everything as expected, including a clean download of the file.
Please let me know your thoughts, and we'll leave this issue open in case others run into the same.
Hi Anthony,
So, I essentially ran in to the problem every time that a regex pattern contain the newline character \n. It might be some setting on my Windows10 machine (If I figure it out, I'll let you know). In the end I just replaced every \n with the negated capture group (?:\r\n|\r|\n). This worked well and gave me a little more regex practice!