Is there a way to extract only the second or third (for example) occurrence of a pattern in the string?
This is an excellent package; it gets a lot of work done with minimal coding. It is educational as well (as the author described): it teaches the basics of regex in R.
Here is a small example to illustrate my question:
d = "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020"
rm_date(d, extract=TRUE)
When I run this code, it extracts all 3 dates in the string:
[1] "09/29/1980" "02/11/2020" "2/14/2020"
Is there a way to extract only the second or the third?
Thank you.
Is the second date always preceded by "continued till", and is the third always preceded by "last noted"?
Hello, thanks for the great package and the quick reply.
Unfortunately, the text before the dates is not constant (I am working with a group of about 1000 reports). I initially tried to match the different surrounding patterns, using this for example:
ex_between(d, c("STR1", "STR4"), c("STR2", "STR5"))
But there were too many variations in the text before and after the dates. The dates in the reports are, however, always in the same order, which is why I was checking whether there is a way to select a particular occurrence.
OK, if you know that you always want the second or third date, I'd still extract everything and then use sapply with `[` to grab the specific element from each entry of the list returned by ex_date, as seen below:
d <- c(
"started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020",
"started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020",
"started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020"
)
sapply(qdapRegex::ex_date(d), `[`, 2)
## [1] "02/11/2020" "02/11/2020" "02/11/2020"
sapply(qdapRegex::ex_date(d), `[`, 3)
## [1] "2/14/2020" "2/14/2020" "2/14/2020"
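If it helps to see the same idea without the package, here is a minimal base R sketch. Note that nth_match is a hypothetical helper name and date_pat is a simplified date pattern I made up for illustration; it is not the regex that qdapRegex uses. It also shows what happens when a string has fewer than n matches: indexing past the end yields NA.

```r
# Hypothetical helper (not part of qdapRegex): return the n-th regex match
# in each string, or NA when a string has fewer than n matches.
nth_match <- function(x, pattern, n) {
  # gregexpr finds every match; regmatches extracts them as a list of vectors
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Indexing past the end of a character vector gives NA_character_
  vapply(hits, function(m) m[n], character(1))
}

# Simplified date pattern (an assumption for this sketch only)
date_pat <- "\\d{1,2}/\\d{1,2}/\\d{4}"

d2 <- c(
  "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020",
  "started on 09/29/1980"
)
nth_match(d2, date_pat, 2)
## [1] "02/11/2020" NA
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector.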
I appreciate your help. This works perfectly. Sorry for the late reply.