This is a <sentence> with several words <demarcated> by <brackets>.
New line. Here is a <second> sentence. Glahluahlgghladaljdlhgl<jad>gdhl
Into a vector with these elements:
sentence
demarcated
brackets
second
jad
Can somebody kindly walk me through EXACTLY what to do to make this work? Like, what regex function precisely will treat < and > as delimiters around n substrings that I can push_back into a vector? I promise I’m not doing homework.
So I haven’t done C++ in a long while, but I can come up with regex for you. If the C++ regex syntax is a little different, fixing that will be up to you. Remember that some of these characters might have special meanings in C++ strings, which means you might need to escape them.
/<([^>]*)>/gm
An online regex debugger will describe each of the parts, but basically:
< - matches the starting bracket
() - a capture group (what we use to get only the string inside the brackets)
[^>]* - match any number of characters (0 to infinity) that are not the end bracket
> - matches end bracket
/gm - makes the regex global (so it returns all matches instead of just the first) and multiline
And a note: probably best not to attempt this with regex if you want to support nested brackets.
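For the C++ side, here is a minimal sketch of how that pattern might be used with std::regex. The function name extract_bracketed is made up for illustration. std::regex has no /g or /m flags, so the sketch assumes std::sregex_iterator to walk over every match, and since [^>]* also matches newlines, no multiline handling is needed. This particular pattern contains no backslashes, so it needs no extra escaping inside a C++ string literal.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the text inside every <...> pair in `text`.
std::vector<std::string> extract_bracketed(const std::string& text) {
    // Same pattern as above, minus the /.../gm wrapper.
    // ECMAScript is the default grammar for std::regex.
    static const std::regex pattern("<([^>]*)>");

    std::vector<std::string> words;
    // Iterating over all matches plays the role of the "global" flag.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        // Capture group 1 is the text between the brackets.
        words.push_back((*it)[1].str());
    }
    return words;
}
```

Calling extract_bracketed on the two sample lines from the question should give a vector holding sentence, demarcated, brackets, second and jad, in that order.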
It’s C++ so may as well do something horrific. I would loop through the string searching for ‘<’ and putting the char pointer to that character plus one in the vector. Then I would loop again on the original string to substitute the ‘>’ characters with ‘\0’ terminators lol. If it’s a const pointer then just const_cast it away.
I wouldn’t trust a regex on poorly formatted input. I think it’d be more resilient to find the delimiters like Broco said, then check them (even number found, alternating < and >), then extract the words with ranges.
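A minimal sketch of that find-then-check idea, with the checking and extracting done in one scan. The name extract_checked and the bool-plus-output-parameter shape are illustrative choices, not anything from the posts above. It rejects input where < and > do not strictly alternate.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fills `out` with the bracketed words and returns true,
// or returns false (leaving `out` empty) if the brackets are malformed:
// a '<' inside an open pair, a '>' with no opener, or an unclosed '<'.
bool extract_checked(const std::string& text, std::vector<std::string>& out) {
    out.clear();
    std::vector<std::string> words;
    std::size_t open = std::string::npos;  // index of the pending '<', if any

    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '<') {
            if (open != std::string::npos) return false;  // nested or doubled '<'
            open = i;
        } else if (text[i] == '>') {
            if (open == std::string::npos) return false;  // '>' without a '<'
            words.push_back(text.substr(open + 1, i - open - 1));
            open = std::string::npos;
        }
    }
    if (open != std::string::npos) return false;  // unclosed '<'

    out.swap(words);
    return true;
}
```

Because nothing is written to `out` until the whole string has been validated, the caller never sees a half-parsed result.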
If you’re sure the string is always well-formed, put it in an istringstream and read it out a word at a time.
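One way to read that suggestion is to use std::getline with a custom delimiter instead of operator>>, so the stream splits on the brackets rather than on whitespace. That reading is my assumption, and extract_with_stream is a made-up name. As the post says, this is only sensible for well-formed input: an unclosed < at the end would silently yield everything up to the end of the string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: assumes every '<' has a matching '>' and no nesting.
std::vector<std::string> extract_with_stream(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string skipped;
    std::string word;

    // Discard everything up to and including the next '<',
    // then read the word up to (and consume) the next '>'.
    while (std::getline(in, skipped, '<') && std::getline(in, word, '>')) {
        words.push_back(word);
    }
    return words;
}
```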
Yeah, if there’s an easy way to step through a string and identify a pattern without a regex, I always prefer that.
Regexes are sort of a last resort for “well, the pattern is complicated and not easy to parse” cases, and even then, you probably want to do a few sanity checks before applying one.
It’s very easy to write a poorly performing regex, and they’re hard to reason about, maintain, and update.
We have an evolving in-house doc where devs highlight hilarious comments other devs made in the code and while I’m sure the entire exercise is beyond/beneath cliche I really enjoy it
All those things are true, but regexes (regexi) are still a valuable tool to always have in the back of your head when you gotta match string patterns.
Just saw someone posting about this cheatsheet site. Seems decent?
Ad hoc loop-and-index-based C++ string parsing is even harder to reason about and maintain than regexes, in my experience. I used to have to write a lot of it before regexes were added to the standard library.