C++ shame corner (little help please)

I want to make a string like this:

This is a <sentence> with several words <demarcated> by <brackets>.
New line. Here is a <second> sentence. Glahluahlgghladaljdlhgl<jad>gdhl

Into a vector with these elements:

sentence
demarcated
brackets
second
jad

Can somebody kindly walk me through EXACTLY what to do to make this work? Like, what regex function precisely will see < and > as delimiters around n substrings that I can push_back into a vector? I promise I’m not doing homework.

So I haven’t done C++ in a long while, but I can come up with a regex for you. If the C++ regex syntax is a little different, fixing that will be up to you. Remember that some of these characters might have special meanings in C++ strings, which means you might need to escape them.

/<([^>]*)>/gm

An online regex debugger will describe each of the parts, but basically:

< - matches the starting bracket
() - a capture group (what we use to get only the string inside the brackets)
[^>]* - match any number of characters (0 to infinity) that are not the end bracket
> - matches end bracket
/gm - makes the regex global (so it returns all matches instead of just the first) and multiline

And a note: probably best not to attempt this with regex if you want to support nested brackets.
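
For the C++ side, here is a minimal sketch of how that pattern might be applied with std::regex (C++11 or later). The function name extract_bracketed is made up for illustration. std::regex does not take /gm-style suffixes: iterating with std::sregex_iterator gives the "all matches" behaviour, and since the pattern has no ^ or $ and [^>] also matches newlines, multi-line input should not need any extra option.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text between each '<' and the next '>'
// (one level only, no nesting).
std::vector<std::string> extract_bracketed(const std::string& text) {
    // A raw string literal avoids escaping anything for the C++ compiler.
    static const std::regex pattern(R"(<([^>]*)>)");
    std::vector<std::string> words;
    // sregex_iterator visits every non-overlapping match in order.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        // Capture group 1 is the text inside the brackets.
        words.push_back((*it)[1].str());
    }
    return words;
}
```

Called on the example string from the first post, this should return sentence, demarcated, brackets, second, jad in that order.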

No, I only need one level, that should be perfect. Thanks very much. This helps a lot.

It’s C++ so may as well do something horrific. I would loop through the string searching for ‘<’ and putting the char pointer to that character plus one in the vector. Then I would loop again on the original string to substitute the ‘>’ characters with NUL characters lol. If it’s a const pointer then just const_cast it away.

not really but I’ve seen worse code in production

I wouldn’t trust a regex on poorly formatted input. I think it’d be more resilient to find the delimiters like Broco said, check them (even number found, alternating < and >), and then extract the words with ranges.
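
A rough sketch of that idea, assuming C++17 for std::optional. The name extract_checked is hypothetical, and this version folds the check and the extraction into a single pass rather than collecting delimiter positions first, which is one possible reading of the suggestion and not the only one.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the bracketed words, or std::nullopt if the
// brackets are malformed (a '>' with no open '<', a second '<' before the
// closing '>', or a '<' that is never closed).
std::optional<std::vector<std::string>> extract_checked(const std::string& text) {
    std::vector<std::string> words;
    std::size_t open = std::string::npos;  // index of the currently open '<', if any
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '<') {
            if (open != std::string::npos) return std::nullopt;  // '<' while one is already open
            open = i;
        } else if (text[i] == '>') {
            if (open == std::string::npos) return std::nullopt;  // '>' with nothing open
            words.push_back(text.substr(open + 1, i - open - 1));
            open = std::string::npos;
        }
    }
    if (open != std::string::npos) return std::nullopt;  // unclosed '<' at end of input
    return words;
}
```

Returning std::nullopt on bad input lets the caller decide whether to reject the string or fall back to something else, instead of silently producing partial results.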

If you’re sure the string is always well-formed, put it in an istringstream and read it out a word at a time.
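
One way this might look is below. It is a variation on the suggestion: instead of whitespace-separated words it uses std::getline with '<' and '>' as the delimiters, since the example input has brackets in the middle of a run of letters. The name extract_with_stream is made up, and the sketch assumes well-formed input; an unclosed '<' at the end would still yield a trailing "word".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: assumes every '<' has a matching '>' and no nesting.
std::vector<std::string> extract_with_stream(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string skipped;  // text before the next '<', thrown away
    std::string word;     // text between '<' and the next '>'
    // Discard everything up to the next '<', then read up to the next '>'.
    while (std::getline(in, skipped, '<') && std::getline(in, word, '>')) {
        words.push_back(word);
    }
    return words;
}
```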

Yeah, if there’s an easy way to step through a string and identify a pattern without a regex, I always prefer that.

Regexes are sort of a last resort “well, the pattern is complicated and not easy to parse”, and even then, you probably want to do a few sanity checks before applying one.

It’s very easy to write a poorly performing regex, and they’re hard to reason about, maintain, and update.

I love regex

We have an evolving in-house doc where devs highlight hilarious comments other devs made in the code, and while I’m sure the entire exercise is beyond/beneath cliche, I really enjoy it.

All those things are true, but regexes (regexi) are still a valuable tool to always have in the back of your head when you gotta match string patterns.

Just saw someone posting about this cheatsheet site. Seems decent?

Ad hoc loop-and-index-based C++ string parsing is even harder to reason about and maintain than regexes, in my experience. I used to have to write a lot of it before regexes were added to the standard library.

I hate regex but walking through a string character-by-character is just asking for trouble.

As always: programming is very complicated and if you think there’s some best way to solve any problem you’ve already fucked up!
