Yet another regex: extracting images from wiki markup breaks when the description contains markup

I'm trying to extract image information from a wiki. I have a regex that works, but it fails when the description also contains markup.

The image markup format:

//[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
//[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
//[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]

{{main articles | [[Christian anarchism]] and [[Anarchism and religion]]}}

Here are my attempts: https://regex101.com/r/pD6nF8/1

I'm trying something like:

// \[\[Image:(.*?)\|(.*?)\|(.*?)\|(.*?)\|\[*(.*?)\|*(.*?)\]*
$re = "/\\[\\[Image:(.*?)\\|(.*?)\\|(.*?)\\|(.*?)\\|\\[*(.*?)\\|*(.*?)\\]*/i";

It should find 14 matches for this test, but so far I only get 11; or, when I do get 14, I also get some noise such as ]] or just fragments of the description…

How can I add an optional case for something like [[(.*?)]] inside the last part?


Solution

You can define the nested parts up front with the (?(DEFINE)...) syntax:

$pattern = '~
# definitions
(?(DEFINE)
(?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
(?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+             )
)
# main pattern
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) ]]
~ix';

Demo
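PHP's PCRE engine supports (?(DEFINE)...) and \g<name> subroutine calls, but Python's standard re module has no equivalent. For readers working outside PHP, here is a minimal Python sketch that swaps the recursive regex for a hand-written bracket-depth scanner doing the same job as the nested and part definitions: [[ raises the depth, ]] lowers it, and | only separates fields at depth 0. The function names split_top_level and find_images are made up for this illustration, and the sketch assumes every [[ in the input is eventually closed by ]].

```python
def split_top_level(body: str) -> list[str]:
    """Split a link body on '|' only at bracket depth 0.

    '[[' raises the depth and ']]' lowers it, so pipes inside nested
    links such as [[Leo Tolstoy|Leo Tolstoy]] stay in their field.
    """
    fields, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair == "[[":
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    fields.append("".join(current))
    return fields


def find_images(text: str) -> list[list[str]]:
    """Return the '|'-separated fields of every [[Image:...]] in text."""
    results = []
    start = text.find("[[Image:")
    while start != -1:
        depth, i = 0, start
        # Walk forward until the brackets opened at `start` are balanced.
        while i < len(text):
            if text.startswith("[[", i):
                depth += 1
                i += 2
            elif text.startswith("]]", i):
                depth -= 1
                i += 2
                if depth == 0:
                    break
            else:
                i += 1
        if depth != 0:
            break  # unbalanced markup: stop instead of guessing
        body = text[start + len("[[Image:"):i - 2]
        results.append(split_top_level(body))
        start = text.find("[[Image:", i)
    return results


sample = (
    "[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]\n"
    "[[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]\n"
)
for fields in find_images(sample):
    print(fields)
```

Unlike the fixed five-part pattern, this returns however many fields each link has, so optional parameters need no special case.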

Obviously, you can be more precise. If you already know that the fourth part is the size, you can replace it like this:

\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\d+ px) \| (\g<part>) ]]

You can also make a part optional if needed (for example, the alignment parameter, which can be omitted):

\[\[ Image: (\g<part>) \| (\g<part>) (?:\| (\g<part>) )? \| (\d+ px) \| (\g<part>) ]]

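Or you can say that all the parameters are optional and should appear at most once, but in that case you need to be precise about each one (the pattern itself does not enforce the "at most once" part: a repeated named group simply keeps its last value):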

~
(?(DEFINE)
(?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
(?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+               )
)

\[\[Image: (?<name> [^]|]* )
(?:
\|
(?: (?<align>       left|right|center ) |
(?<type>        thumb             ) |
(?<size>        \d+[a-z]{0,3}     ) |
(?<description> \g<part>          )
)
)*
]]
~ix

Demo
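The any-order variant can be mirrored in Python without recursion. The sketch below (hypothetical helper names, Python's re module only) splits the link body on top-level pipes and then tests each parameter against the same alternatives as the align, type and size groups, falling back to description. As with the regex, a parameter that appears twice just overwrites the earlier value. It expects the text between [[Image: and the closing ]].

```python
import re

# Hypothetical classifiers mirroring the named groups in the pattern above.
ALIGN = re.compile(r"left|right|center", re.IGNORECASE)
TYPE = re.compile(r"thumb", re.IGNORECASE)
SIZE = re.compile(r"\d+[a-z]{0,3}", re.IGNORECASE)


def split_top_level(body: str) -> list[str]:
    """Split on '|' only outside nested [[...]] links."""
    fields, current, depth, i = [], "", 0, 0
    while i < len(body):
        if body.startswith("[[", i):
            depth, current, i = depth + 1, current + "[[", i + 2
        elif body.startswith("]]", i) and depth > 0:
            depth, current, i = depth - 1, current + "]]", i + 2
        elif body[i] == "|" and depth == 0:
            fields.append(current)
            current, i = "", i + 1
        else:
            current, i = current + body[i], i + 1
    fields.append(current)
    return fields


def parse_image(body: str) -> dict[str, str]:
    """Classify the parameters of an Image link body, in any order."""
    name, *params = split_top_level(body)
    result = {"name": name}
    for param in params:
        # Same priority as the regex alternation: align, type, size, description.
        if ALIGN.fullmatch(param):
            result["align"] = param
        elif TYPE.fullmatch(param):
            result["type"] = param
        elif SIZE.fullmatch(param):
            result["size"] = param
        else:
            result["description"] = param
    return result


print(parse_image("JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence"))
```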


Other solutions

This multi-line regex uses the following flags: ignore whitespace (x), global (g) and case-insensitive (i).

[[]{2}Image:
([^|]*\.(?:jpe?g|svg))[|]
([^|]*)[|]
((?:[[]{2}[^\]]*\]\]|[^|[])*)[|]
(?:((?:[[]{2}[^\]]*\]\]|[^|[])*)[|])?
((?:[[]{2}[^\]]*\]\]|(?:(?!\]|\|).))*)
(?:[|]|\]\])


This regex will do the following:

  • Find the [[image:....]] substrings in your sample text
  • Require the image file name to end with .jpg, .jpeg or .svg. You can remove this behavior by deleting the \.(?:jpe?g|svg) construct.
  • Parse the various |-delimited fields
  • Work around the difficult edge cases in the last few fields, which may contain additional markup

Live Demo

https://regex101.com/r/kI2wE5/2
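For anyone who wants to try this pattern outside regex101, here is a sketch of a Python port. re.VERBOSE stands in for the ignore-whitespace flag, re.IGNORECASE for case-insensitive, and finditer() for the global flag. The only textual change is writing [[] as \[, because Python warns about a character set that starts with [ ("possible nested set"). PHP reports an unmatched optional group as an empty string while Python returns None, so the sketch normalizes it to "" to line up with the sample matches.

```python
import re

# Python port of the pattern above (a sketch; groups numbered as in the answer).
IMAGE_RE = re.compile(
    r"""
    \[{2}Image:
    ([^|]*\.(?:jpe?g|svg))\|                  # 1: file name
    ([^|]*)\|                                 # 2: first option
    ((?:\[{2}[^\]]*\]\]|[^|\[])*)\|           # 3: second option
    (?:((?:\[{2}[^\]]*\]\]|[^|\[])*)\|)?      # 4: optional third option
    ((?:\[{2}[^\]]*\]\]|(?:(?!\]|\|).))*)     # 5: description
    (?:\||\]\])                               # closing | or ]]
    """,
    re.VERBOSE | re.IGNORECASE,
)

sample = (
    "[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]\n"
    "[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]\n"
    "[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]\n"
)

# Replace None (unmatched optional group) with "" like PHP's output.
matches = [tuple(g or "" for g in m.groups()) for m in IMAGE_RE.finditer(sample)]
for groups in matches:
    print(groups)
```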

Sample text

I took the liberty of pulling out all 14 matches, but the live demo still has your original text.

[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]

Sample matches

[0][0] = [[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[0][1] = WilliamGodwin.jpg
[0][2] = thumb
[0][3] = right
[0][4] = 150px
[0][5] = William Godwin

[1][0] = [[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[1][1] = Pierre_Joseph_Proudhon.jpg
[1][2] = 110px
[1][3] = thumb
[1][4] = left
[1][5] = Pierre Joseph Proudhon

[2][0] = [[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[2][1] = BenjaminTucker.jpg
[2][2] = thumb
[2][3] = 150px
[2][4] = left
[2][5] = [[Benjamin Tucker]]

[3][0] = [[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[3][1] = Bakuninfull.jpg
[3][2] = thumb
[3][3] = 150px
[3][4] = right
[3][5] = [[Bakunin|Mikhail Bakunin 1814-1876]]

[4][0] = [[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[4][1] = PeterKropotkin.jpg
[4][2] = thumb
[4][3] = 150px
[4][4] = right
[4][5] = Peter Kropotkin

[5][0] = [[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[5][1] = JohannMost.jpg
[5][2] = left
[5][3] = 150px
[5][4] = thumb
[5][5] = [[Johann Most]] was an outspoken advocate of violence

[6][0] = [[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[6][1] = Flag of Anarcho syndicalism.svg
[6][2] = thumb
[6][3] = 175px
[6][4] =
[6][5] = The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.

[7][0] = [[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[7][1] = CNT_tu_votar_y_ellos_deciden.jpg
[7][2] = thumb
[7][3] = 175px
[7][4] =
[7][5] = CNT propaganda from April 2004.  Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.

[8][0] = [[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[8][1] = CNT-armoured-car-factory.jpg
[8][2] = right
[8][3] = thumb
[8][4] = 270px
[8][5] = [[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.

[9][0] = [[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[9][1] = LeoTolstoy.jpg
[9][2] = thumb
[9][3] = 150px
[9][4] =
[9][5] = [[Leo Tolstoy|Leo Tolstoy]] 1828-1910

[10][0] = [[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[10][1] = Goldman-4.jpg
[10][2] = thumb
[10][3] = left
[10][4] = 150px
[10][5] = [[Emma Goldman]]

[11][0] = [[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[11][1] = Murray Rothbard Smile.JPG
[11][2] = thumb
[11][3] = left
[11][4] = 150px
[11][5] = [[Murray Rothbard]] (1926-1995)

[12][0] = [[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[12][1] = Hakim Bey.jpeg
[12][2] = thumb
[12][3] = right
[12][4] =
[12][5] = [[Hakim Bey]]

[13][0] = [[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]
[13][1] = Noam_chomsky.jpg
[13][2] = thumb
[13][3] = 150px
[13][4] = right
[13][5] =  [[Noam Chomsky]] (1928–)
Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
[[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
Image:                   'Image:'
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
[^|]*                    any character except: '|' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
\.                       '.'
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
jp                       'jp'
----------------------------------------------------------------------
e?                       'e' (optional (matching the most
                           amount possible))
----------------------------------------------------------------------
g                        'g'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
svg                      'svg'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
[|]                      any character of: '|'
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
[^|]*                    any character except: '|' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
[|]                      any character of: '|'
----------------------------------------------------------------------
(                        group and capture to \3:
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more
                           times (matching the most amount
                           possible)):
----------------------------------------------------------------------
[[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]*                   any character except: '\]' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
[^|[]                    any character except: '|', '['
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
)                        end of \3
----------------------------------------------------------------------
[|]                      any character of: '|'
----------------------------------------------------------------------
(?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
(                        group and capture to \4:
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more
                           times (matching the most amount
                           possible)):
----------------------------------------------------------------------
[[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]*                   any character except: '\]' (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
[^|[]                    any character except: '|', '['
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
)                        end of \4
----------------------------------------------------------------------
[|]                      any character of: '|'
----------------------------------------------------------------------
)?                       end of grouping
----------------------------------------------------------------------
(                        group and capture to \5:
----------------------------------------------------------------------
(?:                      group, but do not capture (0 or more
                           times (matching the most amount
                           possible)):
----------------------------------------------------------------------
[[]{2}                   any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]*                   any character except: '\]' (0 or more
                           times (matching the most amount
                           possible))
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
\|                       '|'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
.                        any character except \n
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
)*                       end of grouping
----------------------------------------------------------------------
)                        end of \5
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
[|]                      any character of: '|'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

OK, if I understand correctly, you only want the images with their style options, without the description.

So I think this might work for you:

\[\[Image:.*?(?:jpg|svg)[^\s]+(?=\|)

Then just append ]] to your matches.
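A minimal Python sketch of this approach, using the alternation (?:jpg|svg) for the extension check (a character class such as [jpg|svg] would match only a single character from that set). The helper name image_without_description is hypothetical.

```python
import re
from typing import Optional

# Extension check as an alternation, then [^\s]+ up to the first whitespace,
# backtracking to the last '|' it consumed (the lookahead needs a '|').
STYLE_RE = re.compile(r"\[\[Image:.*?(?:jpg|svg)[^\s]+(?=\|)")


def image_without_description(line: str) -> Optional[str]:
    """Return '[[Image:name|options]]' with the description dropped,
    or None if the pattern does not match."""
    m = STYLE_RE.search(line)
    if m is None:
        return None
    # The match stops before the final '|', so ']]' is re-appended.
    return m.group(0) + "]]"


print(image_without_description("[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]"))
```

Because the cut depends on the first whitespace, a description that has a pipe before its first space, such as [[Tolstoy|Leo]], moves the cut into the description.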


What if you just match them with this regex: \[\[Image\:(.*)\]\] and then split each result on |? I'm not sure it's a good idea, but there's no harm in trying.
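A quick Python sketch of this idea shows where it needs care. A plain str.split("|") also splits inside nested links like [[Leo Tolstoy|Leo Tolstoy]], so the sketch compares it with a depth-aware split (split_top_level is a hypothetical helper, not part of any library).

```python
import re

# Pattern from the answer: everything between "[[Image:" and the last "]]".
IMAGE_RE = re.compile(r"\[\[Image\:(.*)\]\]")


def split_top_level(body: str) -> list[str]:
    """Split on '|' only when not inside a nested [[...]] link."""
    fields, current, depth, i = [], "", 0, 0
    while i < len(body):
        if body.startswith("[[", i):
            depth, current, i = depth + 1, current + "[[", i + 2
        elif body.startswith("]]", i) and depth > 0:
            depth, current, i = depth - 1, current + "]]", i + 2
        elif body[i] == "|" and depth == 0:
            fields.append(current)
            current, i = "", i + 1
        else:
            current, i = current + body[i], i + 1
    fields.append(current)
    return fields


line = "[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]"
body = IMAGE_RE.search(line).group(1)

naive = body.split("|")          # breaks the nested link into two fields
careful = split_top_level(body)  # keeps [[Leo Tolstoy|Leo Tolstoy]] whole
print(naive)
print(careful)
```

Because .* is greedy, two images on the same line would also be captured as one match; with one image per line, as in the sample, that does not come up.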
