JavaScript: RegExp to replace matching parentheses in a nested structure

Question


How can I replace a set of matching opening/closing parentheses when the first opening parenthesis follows the keyword array? Can regular expressions help with this type of problem?

To be more specific, I would like to solve this using either JavaScript or PHP.

// input
$data = array(
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)
);

// desired output
$data = [
'id' => nextId(),
'profile' => [
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
]
];


javascript lexical-analysis parsing php regex

Solution

Other solutions

How about the following (using the .NET regex engine):

resultString = Regex.Replace(subjectString,
@"\barray\(            # Match 'array('
(                      # Capture in group 1:
(?>                   # Start an atomic group:
(?:                  # Either match
(?!\barray\(|[()])  # only if we're not before another array or parens
.                   # any character
)+                   # once or more
|                     # or
\( (?<Depth>)        # match '(' (and increase the nesting counter)
|                     # or
\) (?<-Depth>)       # match ')' (and decrease the nesting counter).
)*                    # Repeat as needed.
(?(Depth)(?!))        # Assert that the nesting counter is at zero.
)                      # End of capturing group.
\)                     # Then match ')'.",
"[$1]", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

This regex matches array(...) where ... may contain anything except another array(...), so it matches only the most deeply nested occurrences. It allows other nested (and correctly balanced) parentheses inside the ..., but it does not check whether they are semantic parentheses or whether they sit inside strings or comments.

In other words, something like

array(
'name' => 'Hugo ((( Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)

would not be matched correctly.

You need to apply this regex iteratively until it no longer changes its input; for your example, two iterations are enough.
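The "repeat until nothing changes" step can be sketched in JavaScript. This is not a port of the .NET pattern: JavaScript regexes have no balancing groups, so the pattern below assumes that plain parentheses inside an innermost array(...) nest at most one level deep. The names INNERMOST_ARRAY and convertArrays are made up for this illustration, and the same caveat about strings and comments applies.

```javascript
// Assumption: plain parentheses inside an innermost array(...) nest at most
// one level deep, because this regex cannot count nesting depth.
const INNERMOST_ARRAY =
  /\barray\(((?:(?!\barray\()[^()]|\([^()]*\))*)\)/g;

// Hypothetical helper: re-run the replacement until a pass changes nothing.
function convertArrays(source) {
  let previous;
  do {
    previous = source;
    // Each pass rewrites only array(...) groups that contain no other array(.
    source = source.replace(INNERMOST_ARRAY, "[$1]");
  } while (source !== previous);
  return source;
}
```

On the sample from the question, the first pass rewrites the inner 'profile' array and the second pass rewrites the outer one; a final pass finds nothing left to change and ends the loop.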


Accepted Answer

Tim Pietzcker gave the .NET counting version.
It has the same elements as the PCRE (PHP) version below.

All the caveats are the same. In particular, the non-array parentheses must
be balanced, because they use the same closing parenthesis as the array delimiters.
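Given that caveat, it may be worth checking that the parentheses balance before attempting any replacement. A minimal sketch in JavaScript follows; the function name parensBalanced is hypothetical, and like the regexes it does not treat parentheses inside strings or comments specially.

```javascript
// Hypothetical pre-check: true when every ')' closes an earlier '('
// and no '(' is left open at the end of the text.
function parensBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")" && --depth < 0) {
      return false; // a ')' appeared before its '('
    }
  }
  return depth === 0;
}
```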

All of the text must be parsed (or should be).
The outer groups 1, 2, 3 and 4 let you extract the parts:
CONTENT
CORE-1 array()
CORE-2 any ()
EXCEPTIONS

Each match gives you exactly one of these outer parts; they are mutually exclusive.

The trick is to define a PHP function parse( core ) that parses the CORE.
Inside that function is a while (regex.search( core )) { .. } loop.

Each time the CORE-1 or CORE-2 group matches, call parse( core ) again,
passing it the contents of that core's group.

Inside the loop, just take off the content and assign it to the hash.

Obviously, the group 1 construct that calls (?&content) should be replaced
with constructs that capture the hash-like variable data.

At a detailed scale this can get very tedious.
Usually you have to account for every single character to parse
the whole thing properly.
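The parse( core ) idea can also be sketched in JavaScript, which the question asks about. JavaScript regexes have no recursion, so this is not a port of the PCRE pattern: a depth counter (the hypothetical helper findClose) locates the matching closing parenthesis, and the recursion lives in the parse function itself. Instead of filling a hash, this sketch rebuilds the text with array( ... ) rewritten as [ ... ]. Like the regexes, it does not treat parentheses inside strings or comments specially.

```javascript
// Hypothetical helper: index of the ')' closing the '(' at openIdx, or -1.
function findClose(text, openIdx) {
  let depth = 0;
  for (let i = openIdx; i < text.length; i++) {
    if (text[i] === "(") depth++;
    else if (text[i] === ")" && --depth === 0) return i;
  }
  return -1;
}

// Rebuilds `core`, turning array( ... ) into [ ... ] at every nesting level.
function parse(core) {
  const token = /\barray\s*\(|[()]/g; // 'array(' or a lone parenthesis
  let out = "";
  let pos = 0;
  while (pos < core.length) {
    token.lastIndex = pos;
    const m = token.exec(core);
    if (m === null) {
      out += core.slice(pos); // CONTENT: nothing but plain text remains
      break;
    }
    out += core.slice(pos, m.index); // CONTENT before the token
    const openIdx = m.index + m[0].length - 1; // position of the '(' itself
    const closeIdx = m[0] === ")" ? -1 : findClose(core, openIdx);
    if (closeIdx === -1) {
      out += m[0]; // EXCEPTION: unbalanced token, copied through unchanged
      pos = m.index + m[0].length;
      continue;
    }
    // CORE-1 (array) or CORE-2 (plain parentheses): recurse into the inside.
    const inner = parse(core.slice(openIdx + 1, closeIdx));
    out += m[0] === "(" ? "(" + inner + ")" : "[" + inner + "]";
    pos = closeIdx + 1;
  }
  return out;
}
```

Calling parse on the input from the question returns the desired output in one call, because the recursion handles every nesting level without repeated passes.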

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))

Expanded

 # 1:  CONTENT
# 2:  CORE-1
# 3:  CORE-2
# 4:  EXCEPTIONS

(?is)

(?:
(                                  # (1), Take off   CONTENT
(?&content)
)
|                                   # OR -----------------------------
(?>                                # Start 'array('
\b array \s* \(
)
(                                  # (2), Take off   'array( CORE-1 )'
(?= . )
(?&core)
|
)
\)                                 # End ')'
|                                   # OR -----------------------------
\(                                 # Start '('
(                                  # (3), Take off   '( any CORE-2 )'
(?= . )
(?&core)
|
)
\)                                 # End ')'
|                                   # OR -----------------------------
(                                  # (4), Take off   Unbalanced or Exceptions
\b array \s* \(
|  [()]
)
)

# Subroutines
# ---------------

(?(DEFINE)

# core
(?<core>
(?>
(?&content)
|
(?> \b array \s* \( )
# recurse core of  array()
(?:
(?= . )
(?&core)
|
)
\)
|
\(
# recurse core of any  ()
(?:
(?= . )
(?&core)
|
)
\)
)+
)

# content
(?<content>
(?>
(?!
\b array \s* \(
|  [()]
)
.
)+
)
)

Output

 **  Grp 0           -  ( pos 0 , len 11 )
some_var =
**  Grp 1           -  ( pos 0 , len 11 )
some_var =
**  Grp 2           -  NULL
**  Grp 3           -  NULL
**  Grp 4 [core]    -  NULL
**  Grp 5 [content] -  NULL

-----------------------

**  Grp 0           -  ( pos 11 , len 153 )
array(
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)
)
**  Grp 1           -  NULL
**  Grp 2           -  ( pos 17 , len 146 )

'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)

**  Grp 3           -  NULL
**  Grp 4 [core]    -  NULL
**  Grp 5 [content] -  NULL

-------------------------------------

**  Grp 0           -  ( pos 164 , len 3 )
;

**  Grp 1           -  ( pos 164 , len 3 )
;

**  Grp 2           -  NULL
**  Grp 3           -  NULL
**  Grp 4 [core]    -  NULL
**  Grp 5 [content] -  NULL

A previous incarnation, written for something else, to give an idea of the usage:

 # Perl code:
#
#     use strict;
#     use warnings;
#
#     use Data::Dumper;
#
#     $/ = undef;
#     my $content = <DATA>;
#
#     # Set the error mode on/off here ..
#     my $BailOnError = 1;
#     my $IsError = 0;
#
#     my $href = {};
#
#     ParseCore( $href, $content );
#
#     #print Dumper($href);
#
#     print "\n\n";
#     print "\nBase======================\n";
#     print $href->{content};
#     print "\nFirst======================\n";
#     print $href->{first}->{content};
#     print "\nSecond======================\n";
#     print $href->{first}->{second}->{content};
#     print "\nThird======================\n";
#     print $href->{first}->{second}->{third}->{content};
#     print "\nFourth======================\n";
#     print $href->{first}->{second}->{third}->{fourth}->{content};
#     print "\nFifth======================\n";
#     print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
#     print "\nSix======================\n";
#     print $href->{six}->{content};
#     print "\nSeven======================\n";
#     print $href->{six}->{seven}->{content};
#     print "\nEight======================\n";
#     print $href->{six}->{seven}->{eight}->{content};
#
#     exit;
#
#
#     sub ParseCore
#     {
#         my ($aref, $core) = @_;
#         my ($k, $v);
#         while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
#         {
#            if (defined $1)
#            {
#              # CONTENT
#                $aref->{content} .= $1;
#            }
#            elsif (defined $2)
#            {
#              # CORE
#                $k = $2; $v = $3;
#                $aref->{$k} = {};
#      #         $aref->{$k}->{content} = $v;
#      #         $aref->{$k}->{match} = $&;
#
#                my $curraref = $aref->{$k};
#                my $ret = ParseCore($aref->{$k}, $v);
#                if ( $BailOnError && $IsError ) {
#                    last;
#                }
#                if (defined $ret) {
#                    $curraref->{'#next'} = $ret;
#                }
#            }
#            else
#            {
#              # ERRORS
#                print "Unbalanced '$4' at position = ", $-[0];
#                $IsError = 1;
#
#                # Decide to continue here ..
#                # If BailOnError is set, just unwind recursion.
#                # -------------------------------------------------
#                if ( $BailOnError ) {
#                   last;
#                }
#            }
#         }
#         return $k;
#     }
#
#     #================================================
#     __DATA__
#     some html content here top base
#     <!--block:first-->
#         <table border="1" style="color:red;">
#         <tr class="lines">
#             <td align="left" valign="<--valign-->">
#         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
#         <!--hello--> <--again--><!--world-->
#         some html content here 1 top
#         <!--block:second-->
#             some html content here 2 top
#             <!--block:third-->
#                 some html content here 3 top
#                 <!--block:fourth-->
#                     some html content here 4 top
#                     <!--block:fifth-->
#                         some html content here 5a
#                         some html content here 5b
#                     <!--endblock-->
#                 <!--endblock-->
#                 some html content here 3a
#                 some html content here 3b
#             <!--endblock-->
#             some html content here 2 bottom
#         <!--endblock-->
#         some html content here 1 bottom
#     <!--endblock-->
#     some html content here1-5 bottom base
#
#     some html content here 6-8 top base
#     <!--block:six-->
#         some html content here 6 top
#         <!--block:seven-->
#             some html content here 7 top
#             <!--block:eight-->
#                 some html content here 8a
#                 some html content here 8b
#             <!--endblock-->
#             some html content here 7 bottom
#         <!--endblock-->
#         some html content here 6 bottom
#     <!--endblock-->
#     some html content here 6-8 bottom base
#
# Output >>
#
#     Base======================
#     some html content here top base
#
#     some html content here1-5 bottom base
#
#     some html content here 6-8 top base
#
#     some html content here 6-8 bottom base
#
#     First======================
#
#         <table border="1" style="color:red;">
#         <tr class="lines">
#             <td align="left" valign="<--valign-->">
#         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
#         <!--hello--> <--again--><!--world-->
#         some html content here 1 top
#
#         some html content here 1 bottom
#
#     Second======================
#
#             some html content here 2 top
#
#             some html content here 2 bottom
#
#     Third======================
#
#                 some html content here 3 top
#
#                 some html content here 3a
#                 some html content here 3b
#
#     Fourth======================
#
#                     some html content here 4 top
#
#
#     Fifth======================
#
#                         some html content here 5a
#                         some html content here 5b
#
#     Six======================
#
#         some html content here 6 top
#
#         some html content here 6 bottom
#
#     Seven======================
#
#             some html content here 7 top
#
#             some html content here 7 bottom
#
#     Eight======================
#
#                 some html content here 8a
#                 some html content here 8b
#
