Don't match a character if it is between quotation marks (a.k.a. the programming string pattern)

Question

I have been assigned to write a compiler for the BASIC programming language. In BASIC, statements are separated by newlines or by a : mark. For example, both of the following code samples are valid.
Model #1

 10 PRINT "Hello World 1" : PRINT "Hello World 2"

Model #2

 10 PRINT "Hello World 1"
 20 PRINT "Hello World 2"

You can test these here.
The first thing I need to do before parsing the code in my compiler is to split it into separate statements.
I have already split the code into lines, but I am stuck finding a regex that splits a line containing several statements.
The following code sample should be split into 2 PRINT statements.

 10 PRINT "Hello World 1" : PRINT "Hello World 2"

But do NOT match this:
The following code sample is a single standalone command.

 10 PRINT "Hello World 1" ": PRINT Hello World 2"

Question

Is there a regex pattern that DOES match the first code sample above, where the : is outside a pair of ", but does NOT match the second?

Can anyone help me out here?
Anything would help. 🙂

basic boost c++ compiler-design regex

Other solutions

One way to avoid this problem is to match the content inside quotes before trying to match the colon:

"(?>[^\\"]++|\\{2}|\\.)*"|:

You can add capture groups to find out which branch of the alternation matched.

That said, a good tool for this kind of task is probably lex / yacc.
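
The capture-group idea can be sketched in C++ as follows. This is only an illustration under stated assumptions: it uses std::regex, whose default ECMAScript grammar does not support the atomic group and possessive quantifier in the pattern above, so it substitutes the simpler pattern "[^"]*"|(:). That pattern assumes BASIC strings never contain an escaped quotation mark. The function name split_statements is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits one BASIC line at every ':' outside quotes.
// Simplified pattern (an assumption): a quoted string with no escapes,
// or a colon captured in group 1.
std::vector<std::string> split_statements(const std::string& line) {
    static const std::regex pattern(R"re("[^"]*"|(:))re");
    std::vector<std::string> parts;
    std::size_t start = 0;

    for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Group 1 is unset when the quoted-string branch matched:
        // the string is consumed and any ':' inside it is ignored.
        if (!m[1].matched) {
            continue;
        }
        const std::size_t pos = static_cast<std::size_t>(m.position(1));
        parts.push_back(line.substr(start, pos - start));
        start = pos + 1;  // resume right after the ':'
    }
    parts.push_back(line.substr(start));  // the remainder of the line
    return parts;
}
```

Each returned part still carries its surrounding whitespace, so a caller would probably trim it. A line with an unterminated quotation mark falls back to splitting at every colon, which may or may not be the behavior you want.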

Thanks to @Mauren, I managed to do what I wanted.
Here is my code (maybe it will help someone later):
Note that the content of the source file is held in char* buffer and the resulting lines are stored in vector<string> source_codes.

    #include <cstddef>
    #include <cstring>
    #include <string>
    #include <vector>
    #include <boost/algorithm/string/trim.hpp>

    /* Split the file content in `buffer` into lines, then split each line at
     * every `:` that lies outside a quoted string; store all in `source_codes`. */
    void split_source(const char* buffer, std::vector<std::string>& source_codes)
    {
        /* compute the length once instead of on every iteration */
        const std::size_t length = std::strlen(buffer);
        /* the line currently being collected */
        std::string token;

        /* run one step past the end so a last line without '\n' is flushed too */
        for (std::size_t top = 0; top <= length; top++)
        {
            /* still inside the current line: collect the char and keep seeking */
            if (top < length && buffer[top] != '\n')
            {
                token += buffer[top];
                continue;
            }

            /* if we reach here we have collected the whole line; normalize it */
            boost::algorithm::trim(token);

            /* true while we are between a pair of quotation marks */
            bool in_quotes = false;

            for (std::size_t index = 0; index < token.length(); index++)
            {
                const char ch = token[index];

                /* Entering or leaving a string. Note that BASIC prints a
                 * quotation mark with `CHR$(34)`, so there is no `\"` to worry about. */
                if (ch == '"')
                {
                    in_quotes = !in_quotes;
                    continue;
                }

                /* a `:` outside quotes (other than a leading, kept one) ends a statement */
                if (ch == ':' && !in_quotes && index > 0)
                {
                    /* everything before the `:` is the first sub-token */
                    std::string subtoken(token.substr(0, index));
                    boost::algorithm::trim(subtoken);
                    source_codes.push_back(subtoken);

                    /* Note: we keep the `:` intentionally. Every code line in BASIC
                     * should start with a number, so a line starting with `:` is
                     * meant to execute together with the previous numbered statement. */
                    token = token.substr(index);
                    boost::algorithm::trim(token);

                    /* restart on the new token; the increment skips the kept `:` */
                    index = 0;
                }
            }

            /* add whatever remains of the line, unless it is empty */
            if (!token.empty())
                source_codes.push_back(token);

            /* clear the token buffer for the next line */
            token.clear();
        }
        /* the source code is now split into statements and saved in the vector */
    }

Accepted Answer

I believe your best option is to tokenize your source code with a device such as a loop, instead of trying to tokenize it with regular expressions.

In pseudocode:

    string lexeme;
    token t;

    for char in string
        if char fits current token
            lexeme = lexeme + char;
        else
            t.lexeme = lexeme;
            t.type = type;
            lexeme = null;
        end if
        // other treatments here
    end for

You can see a real implementation of this device in this source code, more specifically at line 86.
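
As a rough illustration of the loop-based approach, here is a minimal C++ sketch. It is not the implementation linked above. The token categories, the Token struct and the tokenize function are names invented for this example, and the lexical rules are simplified assumptions about BASIC (strings have no escape sequences, words may contain $).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories, chosen for this sketch only.
enum class TokenType { Number, Word, String, Colon, Other };

struct Token {
    TokenType type;
    std::string lexeme;
};

// Walks the line once; each branch consumes the chars that fit one token.
std::vector<Token> tokenize(const std::string& line) {
    auto is_digit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };
    auto is_alpha = [](char ch) { return std::isalpha(static_cast<unsigned char>(ch)) != 0; };
    auto is_space = [](char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; };

    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        if (is_space(line[i])) { ++i; continue; }  // whitespace separates tokens

        const std::size_t start = i;
        TokenType type = TokenType::Other;
        if (line[i] == '"') {
            // String literal: consume up to and including the closing quote,
            // so a ':' inside the string never becomes a separator.
            ++i;
            while (i < line.size() && line[i] != '"') ++i;
            if (i < line.size()) ++i;
            type = TokenType::String;
        } else if (is_digit(line[i])) {
            while (i < line.size() && is_digit(line[i])) ++i;
            type = TokenType::Number;
        } else if (is_alpha(line[i])) {
            while (i < line.size() &&
                   (is_alpha(line[i]) || is_digit(line[i]) || line[i] == '$')) ++i;
            type = TokenType::Word;
        } else if (line[i] == ':') {
            ++i;
            type = TokenType::Colon;  // statement separator outside any string
        } else {
            ++i;  // any other single character
        }
        tokens.push_back({type, line.substr(start, i - start)});
    }
    return tokens;
}
```

With tokens like these, splitting a line into statements reduces to cutting the token list at every Colon token, and the same pass also gives the parser its input.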
