I am trying to extract image information from a wiki. I have a regular expression that works, but it fails when the description itself contains markup.
The image markup looks like this:
//[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
//[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
//[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004. Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
{{main articles | [[Christian anarchism]] and [[Anarchism and religion]]}}
Here are my attempts: https://regex101.com/r/pD6nF8/1
I am trying something like:
// \[\[Image:(.*?)\|(.*?)\|(.*?)\|(.*?)\|\[*(.*?)\|*(.*?)\]*
$re = "/\\[\\[Image:(.*?)\\|(.*?)\\|(.*?)\\|(.*?)\\|\\[*(.*?)\\|*(.*?)\\]*/i";
It should find 14 matches for this test, but so far I get 11, or, when I do get 14, I also get noise such as ]] or only fragments of the description.
How can I allow an optional case such as [[(.*?)]] inside the last part?
You can define the nested parts beforehand, using this syntax:
$pattern = '~
# definitions
(?(DEFINE)
(?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
(?<part> [^][|]*+ (?: \g<nested> [^][|]* )*+ )
)
# main pattern
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) ]]
~ix';
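As a rough illustration of how this pattern could be applied, here is a minimal sketch using preg_match_all. The $text sample and the 'file', 'params' and 'description' keys are assumptions made for the example. The nested definition here recurses directly through \g<nested>. Note that the two named groups inside the DEFINE block occupy group numbers 1 and 2, so the five fields land in groups 3 to 7.

```php
<?php
// Sketch only: $text is a small sample built from lines in the question.
$text = <<<'WIKI'
[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
WIKI;

$pattern = '~
# definitions
(?(DEFINE)
  (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
  (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+ )
)
# main pattern: file name followed by exactly four more fields
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) \| (\g<part>) ]]
~ix';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$images = [];
foreach ($matches as $m) {
    // Groups 1 and 2 belong to the DEFINE block, so the fields start at 3.
    $images[] = [
        'file'        => $m[3],
        'params'      => [$m[4], $m[5], $m[6]],
        'description' => $m[7],
    ];
}

print_r($images);
```

The third sample line has only four fields, so this fixed five-field pattern skips it, which is the reason for making some parts optional.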
Obviously, you can be more precise. If you already know that the fourth part is a size, you can replace it:
\[\[ Image: (\g<part>) \| (\g<part>) \| (\g<part>) \| (\d+ px) \| (\g<part>) ]]
You can also make a part optional if needed (for example the alignment parameter, which may be omitted):
\[\[ Image: (\g<part>) \| (\g<part>) (?:\| (\g<part>) )? \| (\d+ px) \| (\g<part>) ]]
Or you can say that all the parameters are optional and may each occur only once, but in that case you need to be precise about what each one looks like:
~
(?(DEFINE)
(?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
(?<part> [^][|]*+ (?: \g<nested> [^][|]* )*+ )
)
\[\[Image: (?<name> [^]|]* )
(?:
\|
(?: (?<align> left|right|center ) |
(?<type> thumb ) |
(?<size> \d+[a-z]{0,3} ) |
(?<description> \g<part> )
)
)*
]]
~ix
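To show how the named groups might be read back, here is a hedged sketch. The helper name parseImages and the two-line $text sample are hypothetical, introduced only for this example. A parameter that does not appear in a given image simply comes back as an empty string.

```php
<?php
$pattern = '~
(?(DEFINE)
  (?<nested> \[\[ [^][]*+ (?: \g<nested> [^][]* )*+ ]] )
  (?<part>   [^][|]*+ (?: \g<nested> [^][|]* )*+ )
)
\[\[Image: (?<name> [^]|]* )
(?:
  \|
  (?: (?<align> left|right|center ) |
      (?<type> thumb ) |
      (?<size> \d+[a-z]{0,3} ) |
      (?<description> \g<part> )
  )
)*
]]
~ix';

// Hypothetical helper: returns one associative array per image found.
function parseImages(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    $rows = [];
    foreach ($matches as $m) {
        $row = [];
        foreach (['name', 'align', 'type', 'size', 'description'] as $key) {
            // Groups that did not take part in the match default to ''.
            $row[$key] = $m[$key] ?? '';
        }
        $rows[] = $row;
    }
    return $rows;
}

// Sample lines taken from the question's test data.
$text = "[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]\n"
      . "[[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]";

$rows = parseImages($pattern, $text);
print_r($rows);
```

Because the alternatives are tried in order, a description is only captured when the field is not an alignment keyword, thumb, or a size.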
This multi-line regular expression uses the following flags: ignore whitespace, global, and case-insensitive.
[[]{2}Image:
([^|]*\.(?:jpe?g|svg))[|]
([^|]*)[|]
((?:[[]{2}[^\]]*\]\]|[^|[])*)[|]
(?:((?:[[]{2}[^\]]*\]\]|[^|[])*)[|])?
((?:[[]{2}[^\]]*\]\]|(?:(?!\]|\|).))*)
(?:[|]|\]\])
This regular expression will do the following:
Find the [[image:....]] substrings in your sample text.
Require the image name to end in .jpg, .jpeg, or .svg. You can remove this behavior by removing the \.(?:jpe?g|svg) construct.
Capture the | delimited fields.
Live Demo
https://regex101.com/r/kI2wE5/2
Sample text
I took the liberty of pulling out all 14 matches, but the live demo still has your original text.
[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004. Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]
Sample matches
[0][0] = [[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[0][1] = WilliamGodwin.jpg
[0][2] = thumb
[0][3] = right
[0][4] = 150px
[0][5] = William Godwin
[1][0] = [[Image:Pierre_Joseph_Proudhon.jpg|110px|thumb|left|Pierre Joseph Proudhon]]
[1][1] = Pierre_Joseph_Proudhon.jpg
[1][2] = 110px
[1][3] = thumb
[1][4] = left
[1][5] = Pierre Joseph Proudhon
[2][0] = [[Image:BenjaminTucker.jpg|thumb|150px|left|[[Benjamin Tucker]]]]
[2][1] = BenjaminTucker.jpg
[2][2] = thumb
[2][3] = 150px
[2][4] = left
[2][5] = [[Benjamin Tucker]]
[3][0] = [[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]
[3][1] = Bakuninfull.jpg
[3][2] = thumb
[3][3] = 150px
[3][4] = right
[3][5] = [[Bakunin|Mikhail Bakunin 1814-1876]]
[4][0] = [[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]
[4][1] = PeterKropotkin.jpg
[4][2] = thumb
[4][3] = 150px
[4][4] = right
[4][5] = Peter Kropotkin
[5][0] = [[Image:JohannMost.jpg|left|150px|thumb|[[Johann Most]] was an outspoken advocate of violence]]
[5][1] = JohannMost.jpg
[5][2] = left
[5][3] = 150px
[5][4] = thumb
[5][5] = [[Johann Most]] was an outspoken advocate of violence
[6][0] = [[Image:Flag of Anarcho syndicalism.svg|thumb|175px|The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.]]
[6][1] = Flag of Anarcho syndicalism.svg
[6][2] = thumb
[6][3] = 175px
[6][4] =
[6][5] = The red-and-black flag, coming from the experience of anarchists in the labour movement, is particularly associated with anarcho-syndicalism.
[7][0] = [[Image:CNT_tu_votar_y_ellos_deciden.jpg|thumb|175px|CNT propaganda from April 2004. Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.]]
[7][1] = CNT_tu_votar_y_ellos_deciden.jpg
[7][2] = thumb
[7][3] = 175px
[7][4] =
[7][5] = CNT propaganda from April 2004. Reads: Don't let the politicians rule our lives/ You vote and they decide/ Don't allow it/ Unity, Action, Self-management.
[8][0] = [[Image:CNT-armoured-car-factory.jpg|right|thumb|270px|[[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.]]
[8][1] = CNT-armoured-car-factory.jpg
[8][2] = right
[8][3] = thumb
[8][4] = 270px
[8][5] = [[Spain]], [[1936]]. Members of the [[CNT]] construct [[armoured car]]s to fight against the [[fascist]]s in one of the [[collectivisation|collectivised]] factories.
[9][0] = [[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[9][1] = LeoTolstoy.jpg
[9][2] = thumb
[9][3] = 150px
[9][4] =
[9][5] = [[Leo Tolstoy|Leo Tolstoy]] 1828-1910
[10][0] = [[Image:Goldman-4.jpg|thumb|left|150px|[[Emma Goldman]]]]
[10][1] = Goldman-4.jpg
[10][2] = thumb
[10][3] = left
[10][4] = 150px
[10][5] = [[Emma Goldman]]
[11][0] = [[Image:Murray Rothbard Smile.JPG|thumb|left|150px|[[Murray Rothbard]] (1926-1995)]]
[11][1] = Murray Rothbard Smile.JPG
[11][2] = thumb
[11][3] = left
[11][4] = 150px
[11][5] = [[Murray Rothbard]] (1926-1995)
[12][0] = [[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
[12][1] = Hakim Bey.jpeg
[12][2] = thumb
[12][3] = right
[12][4] =
[12][5] = [[Hakim Bey]]
[13][0] = [[Image:Noam_chomsky.jpg|thumb|150px|right| [[Noam Chomsky]] (1928–)]]
[13][1] = Noam_chomsky.jpg
[13][2] = thumb
[13][3] = 150px
[13][4] = right
[13][5] = [[Noam Chomsky]] (1928–)
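A listing in this shape could be produced in PHP along the following lines. This is only a sketch: the three-line $text is a reduced sample chosen for the example, the "global" flag is expressed by calling preg_match_all, and the ~ delimiter is an arbitrary choice.

```php
<?php
// The pattern from this answer, with the x (ignore whitespace) and i flags.
$pattern = '~
[[]{2}Image:
([^|]*\.(?:jpe?g|svg))[|]
([^|]*)[|]
((?:[[]{2}[^\]]*\]\]|[^|[])*)[|]
(?:((?:[[]{2}[^\]]*\]\]|[^|[])*)[|])?
((?:[[]{2}[^\]]*\]\]|(?:(?!\]|\|).))*)
(?:[|]|\]\])
~ix';

// Reduced sample: three of the fourteen lines above.
$text = <<<'WIKI'
[[Image:WilliamGodwin.jpg|thumb|right|150px|William Godwin]]
[[Image:LeoTolstoy.jpg|thumb|150px|[[Leo Tolstoy|Leo Tolstoy]] 1828-1910]]
[[Image:Hakim Bey.jpeg|thumb|right|[[Hakim Bey]]]]
WIKI;

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $i => $m) {
    foreach ($m as $j => $value) {
        // Group 4 is optional and prints as an empty string when absent.
        printf("[%d][%d] = %s\n", $i, $j, $value);
    }
}
```

The optional fourth group is allowed to run across line breaks while it looks for a closing |, so on text where other markup follows an image it may be worth checking the captures rather than trusting them blindly.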
NODE EXPLANATION
----------------------------------------------------------------------
[[]{2} any character of: '[' (2 times)
----------------------------------------------------------------------
Image: 'Image:'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
jp 'jp'
----------------------------------------------------------------------
e? 'e' (optional (matching the most
amount possible))
----------------------------------------------------------------------
g 'g'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
svg 'svg'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[|] any character of: '|'
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
[|] any character of: '|'
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
[[]{2} any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^|[] any character except: '|', '['
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
[|] any character of: '|'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
[[]{2} any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^|[] any character except: '|', '['
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
[|] any character of: '|'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
[[]{2} any character of: '[' (2 times)
----------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[|] any character of: '|'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
OK, if I understood correctly, you only want the images with their style parameters, without the description.
So I think this might work for you:
\[\[Image:.*?\.(?:jpe?g|svg)[^\s]+(?=\|)
Then just append ]] to your matches.
What if you simply match them with this regular expression: \[\[Image\:(.*)\]\]
and then split each result on |?
I don't know whether it's a good idea, but there is no harm in trying.
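One way to try the match-then-split idea is sketched below. A plain explode('|', ...) would also cut through links such as [[Bakunin|Mikhail Bakunin 1814-1876]], so this sketch swaps in a small bracket-depth counter instead. The function name splitTopLevel and the $text sample are hypothetical, and the sketch assumes one image per line with nothing after its closing ]].

```php
<?php
// Hypothetical helper: split on | only when not inside a nested [[...]] link.
function splitTopLevel(string $body): array
{
    $fields = [];
    $current = '';
    $depth = 0;
    $length = strlen($body);
    for ($i = 0; $i < $length; $i++) {
        $pair = substr($body, $i, 2);
        if ($pair === '[[') {
            $depth++;
            $current .= $pair;
            $i++; // skip the second bracket
        } elseif ($pair === ']]' && $depth > 0) {
            $depth--;
            $current .= $pair;
            $i++;
        } elseif ($body[$i] === '|' && $depth === 0) {
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $body[$i];
        }
    }
    $fields[] = $current;
    return $fields;
}

// Sample lines taken from the question's test data.
$text = "[[Image:Bakuninfull.jpg|thumb|150px|right|[[Bakunin|Mikhail Bakunin 1814-1876]]]]\n"
      . "[[Image:PeterKropotkin.jpg|thumb|150px|right|Peter Kropotkin]]";

// Greedy (.*) stays on one line and backs off to the last ]] on that line.
preg_match_all('~\[\[Image:(.*)\]\]~i', $text, $matches);
$images = array_map('splitTopLevel', $matches[1]);

print_r($images);
```

This keeps the regular expression trivial and moves the nesting logic into ordinary code, at the cost of relying on the line layout described above.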