qdapRegex

Is there a way to extract only the second or third (for example) occurrence of a pattern in the string?

Open bes827 opened this issue 5 years ago • 4 comments

This is an excellent package; I can get a lot of work done with minimal coding. It is educational as well (as the author described): it teaches the basics of regex in R.

This is a small example to illustrate my question:

d = "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020"   
rm_date(d, extract=TRUE)

When I run this code, it extracts all three dates in the string:

[1] "09/29/1980" "02/11/2020" "2/14/2020"

Is there a way to extract only the second or the third?

Thank you.

bes827 avatar Jul 28 '20 01:07 bes827

Does the second date always have "continued till" right before it, and likewise does the third always have "last noted" right before it?

trinker avatar Jul 28 '20 14:07 trinker

Hello, thanks for the great package and the quick reply.

Unfortunately, the text before the dates is not constant (I am working on a group of about 1,000 reports). I initially tried to fit different patterns, for example:

ex_between(d, c("STR1", "STR4"), c("STR2", "STR5"))

But there were too many variations in the text before and after the dates. The dates in the reports are always in the same order, though, which is why I was asking whether there is a way to select a particular occurrence.

bes827 avatar Jul 29 '20 03:07 bes827

OK, if you know that you always want the second or third date, I'd still extract everything and then use sapply with `[` to grab the specific element from each entry of the list that ex_date returns, as seen below:

d <- c(
    "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020",
    "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020",
    "started on 09/29/1980, continued till 02/11/2020, last noted 2/14/2020" 
)
   
sapply(qdapRegex::ex_date(d), `[`, 2)
## [1] "02/11/2020" "02/11/2020" "02/11/2020"


sapply(qdapRegex::ex_date(d), `[`, 3)
## [1] "2/14/2020" "2/14/2020" "2/14/2020"
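
If some reports might contain fewer dates than expected, or none at all, a stricter variant of the same idea is sketched below. It is only an illustration: nth_match is a hypothetical helper, not part of qdapRegex, and the extracted list is a hand-written stand-in for what ex_date would return, assuming entries without a match come back as NA. Using vapply instead of sapply guarantees a character vector as the result, even for empty input.

```r
# Hypothetical helper (not part of qdapRegex): take the n-th element of every
# list entry and return NA_character_ when an entry has fewer than n elements.
nth_match <- function(matches, n) {
    vapply(matches, function(x) {
        if (length(x) >= n) as.character(x[n]) else NA_character_
    }, character(1), USE.NAMES = FALSE)
}

# Stand-in for the list output of ex_date (assumed shape: one vector per input
# string, NA where nothing was found)
extracted <- list(
    c("09/29/1980", "02/11/2020", "2/14/2020"),
    c("01/05/1999"),
    NA
)

# Second date of the first entry; NA for the two entries that have no second date
second_dates <- nth_match(extracted, 2)

# First date of the first two entries; NA for the entry with no match
first_dates <- nth_match(extracted, 1)
```

With real data the call would presumably look like nth_match(qdapRegex::ex_date(d), 2).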

trinker avatar Jul 29 '20 15:07 trinker

I appreciate your help. This works perfectly. Sorry for the late reply.

bes827 avatar Aug 07 '20 21:08 bes827