Linux: C++ cuts off character(s) when reading lines from a file

Question


I know this has to do with the differences between end-of-line markers on Windows and Linux, but I don't know how to fix it.

I looked at the post
"Getting std::ifstream to handle LF, CR, and CRLF?"
but when I used a simplified version of the code from that post (I used direct reads instead of buffered reads, knowing there would be a performance hit, because I want to keep it simple for now), it did not solve my problem, so I am hoping for some guidance here. I tested my modified version of that code, and it successfully found and replaced the characters, as well as a tab that I temporarily used as a test case, so the logic works, but I still have the problem.

I know I am missing something very simple here, and I will probably feel very stupid once someone helps me figure it out, so I would rather not admit my stupidity publicly, but I have been working on this for a week and cannot solve it, so I am asking for help.

I am new to C++, so please be gentle in your answers if I am doing something really noobish here 🙂

I have the following single-file program, which I wrote as a prototype of what I want to do. It is a simple example, but I need to get it working before I can go any further. This is NOT a homework problem; I really need to solve it in order to build an application.

The program (shown below):

compiles without errors or warnings and runs without errors on a CentOS machine;
cross-compiles without errors or warnings using mingw32 on the CentOS box and runs without errors on Windows;
produces the correct (expected) output on both Linux and Windows when I use an input text file created on Linux;
does NOT produce the correct (expected) output when I use an input text file created on Windows.

So yes, it has something to do with the different file formats on Linux and Windows, and it probably has to do with the newline codes, but I have tried to account for that and it does not work.

To make it more complicated, I found that the old Mac newline characters are different again:

Linux = \n
Windows = \r\n
Mac = \r


Please help! …


I would like to:

read in a text file
perform some validation checks on the content (not done here; that comes next)
output a report to another text file

so I need to examine the file, determine which newline characters it uses, and handle them accordingly.

Any suggestions?

My current (simplified) code (no checks yet):

[code]

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>
#include <iomanip>   // std::setw
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    bool failure_flag = false;

    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    if ( ! rc_input_file_holder.is_open() )
    {
        std::cout << "Error - Could not open the input file" << std::endl;
        failure_flag = true;
    }
    else
    {
        std::ofstream rc_output_file_holder;
        rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

        if ( ! rc_output_file_holder.is_open() )
        {
            std::cout << "Error - Could not open or create the output file" << std::endl;
            failure_flag = true;
        }
        else
        {
            std::streampos char_num = 0;

            long int line_num = 0;
            long int starting_char_pos = 0;

            std::string file_line = "";
            while ( std::getline( rc_input_file_holder , file_line ) )
            {
                line_num = line_num + 1;
                // Length without the null terminator, so the last real
                // character sits at index file_line_length - 1
                long int file_line_length = file_line.length();
                long int char_num = 0;
                for ( char_num = 0 ; char_num < file_line_length ; char_num++ )
                {
                    if ( file_line[ char_num ] == '\r' )
                    {
                        if ( char_num == file_line_length - 1 )
                        {
                            file_line[ char_num ] = '\n';
                        }
                        else
                        {
                            if ( file_line[ char_num + 1 ] == '\n' )
                            {
                                file_line[ char_num ] = ' ';
                            }
                            else
                            {
                                file_line[ char_num ] = ' ';
                            }
                        }
                    }
                }

                int field_display_width = 4;
                std::cout << "Line " << std::setw( field_display_width ) << line_num <<
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos <<
                    ", contains " << file_line << "." << std::endl;

                starting_char_pos = rc_input_file_holder.tellg();

                rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
            }

            rc_input_file_holder.close();
            rc_output_file_holder.close();
            delete [] RC_INPUT_FILE_NAME;
            delete [] RC_OUTPUT_FILE_NAME;
        }
    }

    if ( failure_flag )
    {
        return EXIT_FAILURE;
    }
    else
    {
        return EXIT_SUCCESS;
    }
}

[/code]

The same code with lots of comments (for my own benefit, as a learning exercise):

[code]

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, std::getline

/*
 * The main function, from which all else is accessed
 */
int main(int argc, char** argv)
{
    /*
     * Program to:
     *  1) read from a text file
     *  2) do some validation checks on the content of that text file
     *  3) output a report to another text file
     */

    // Set the filenames to be used in this file-handling program
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    // Note that when the filenames are used in the .open statements below
    //   they have to be in a cstring format, not a string format
    //   so the conversion is done here once
    // Use the Capitalized form of the file name to indicate the converted value
    //   (remember, variable names are case-sensitive in C/C++ so NAME is different than name)
    // This conversion could be done 3 ways:
    // - done each time the cstring is needed:
    //          file_holder_name.open( string_file_name.c_str() )
    // - done once and referred to each time
    //     simple method:
    //          const char * converted_file_name = string_file_name.c_str()
    //     explicit method (2-step):
    //          char * converted_file_name = new char[ string_file_name.length() + 1 ];
    //          strcpy( converted_file_name, string_file_name.c_str() );
    // This program uses the explicit method to do it once for each filename
    // because by doing so, the char array created has variable length
    // and you do not risk buffer overflow
    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    // This will be set to true if either the input or output file cannot be opened
    bool failure_flag = false;

    // Open the input file
    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    // Validate that the input file was properly opened/created
    // If not, set failure flag
    if ( ! rc_input_file_holder.is_open() )
    {
        // Could not open the input file; set failure flag to true
        std::cout << "Error - Could not open the input file" << std::endl;
        failure_flag = true;
    }
    else
    {
        // Open the output file
        // Create one if none previously existed
        // Erase the contents if it already existed
        std::ofstream rc_output_file_holder;
        rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

        // Validate that the output file was properly opened/created
        // If not, set failure flag
        if ( ! rc_output_file_holder.is_open() )
        {
            // Could not open the output file; set failure flag to true
            std::cout << "Error - Could not open or create the output file" << std::endl;
            failure_flag = true;
        }
        else
        {
            // Get the current position where the character pointer is at
            // Get it before the getline is executed so it gives you where the current line starts
            std::streampos char_num = 0;

            // Initialize the line_number and starting character position to 0
            long int line_num = 0;
            long int starting_char_pos = 0;

            std::string file_line = "";
            while ( std::getline( rc_input_file_holder , file_line ) )
            {
                // Set the line number counter to the current line (first line is Line 1, not 0)
                line_num = line_num + 1;

                // Check if the new line designator uses the standard for:
                //   - linux (\n)
                //   - Windows (\r\n)
                //   - Old Mac (\r)
                // Convert any non-linux new line designator to linux new line designator (\n)
                // Use the length without the null terminator, so that the last
                //   real character sits at index file_line_length - 1
                long int file_line_length = file_line.length();
                long int char_num = 0;
                for ( char_num = 0 ; char_num < file_line_length ; char_num++ )
                {
                    // If a \r character is found, decide what to do with it
                    if ( file_line[ char_num ] == '\r' )
                    {
                        // If the \r char is the last line character (before the null terminator)
                        //   the file uses the old Mac format to indicate new line
                        //   so replace the \r with \n
                        if ( char_num == file_line_length - 1 )
                        {
                            file_line[ char_num ] = '\n';
                        }
                        else
                        // If the \r char is NOT the last line character (before the null terminator)
                        {
                            // If the next character is a \n, the file uses the Windows format to indicate new line
                            //   so replace the \r with space
                            if ( file_line[ char_num + 1 ] == '\n' )
                            {
                                file_line[ char_num ] = ' ';
                            }
                            // If the next char is NOT a \n (and the pointer is NOT at the last line character)
                            //   then for some reason, there is a \r in the interior of the string
                            // At this point, I do not know why this would be
                            //   but I don't want it left there, so replace it with a space
                            // Yes, I know this is the same as the above action,
                            //   but I left it separate to allow for future flexibility
                            else
                            {
                                file_line[ char_num ] = ' ';
                            }
                        }
                    }
                }

                // Output the contents of the line just fetched
                // This is done in this prototype file as a placeholder
                // In the real program, this is where the validation check(s) for the line would occur
                //   and would likely be done in a function or class
                // The setw() function requires #include <iomanip>
                int field_display_width = 4;
                std::cout << "Line " << std::setw( field_display_width ) << line_num <<
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos <<
                    ", contains " << file_line << "." << std::endl;

                // Reset the character pointer to the end of this line => start of next line
                starting_char_pos = rc_input_file_holder.tellg();

                // Output the (edited) contents of the line just fetched
                // This is done in this prototype file as a placeholder
                // In the real program, this is where the results of the validation checks would be recorded
                // You could put this in an if statement and record nothing if the line was valid
                rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
            }

            // Clean up by:
            //  - closing the files that were opened (input and output)
            //  - deleting the character arrays created
            rc_input_file_holder.close();
            rc_output_file_holder.close();
            delete [] RC_INPUT_FILE_NAME;
            delete [] RC_OUTPUT_FILE_NAME;
        }
    }

    // Check to see if all operations have successfully completed
    // If so, exit this program with success indicated
    // If not, exit this program with failure indicated
    if ( failure_flag )
    {
        return EXIT_FAILURE;
    }
    else
    {
        return EXIT_SUCCESS;
    }
}

[/code]

I have all the necessary includes, and there are no errors or warnings when compiling for Linux or cross-compiling for Windows.

The input file I am using contains just 5 lines of (silly) text:

A new beginning
just in case
the file was corrupted
and the darn program was working fine ...
at least it was on linux

and the output on Linux is as expected:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   16, contains just in case.
Line    3, starting at character position   29, contains the file was corrupted.
Line    4, starting at character position   52, contains and the darn program was working fine ....
Line    5, starting at character position   94, contains at least it was on linux.

The output on Windows is the same when I import the text file created on Linux, but when I use Notepad to manually recreate the same file on Windows, the output is

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   20, contains t in case.
Line    3, starting at character position   33, contains e file was corrupted.
Line    4, starting at character position   56, contains nd the darn program was working fine ....
Line    5, starting at character position   98, contains at least it was on linux.

Note the differences in the starting character position for lines 2, 3, 4 and 5
Note the missing characters at the start of lines 2, 3 and 4

3 characters are missing from line 2
2 characters are missing from line 3
1 character is missing from line 4
0 characters are missing from line 5

Any ideas are welcome …


Tags: c++, eol, linux, windows


Accepted Answer

See the resolution at

"cross-compiler is outdated"

The mingw cross-compiler installed via apt-get install was outdated and obsolete. Once I manually installed an up-to-date cross-compiler and updated the settings to suppress some error messages, everything worked fine.
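
To confirm which toolchain actually built a given binary, the compiler's predefined macros can be reported at run time. This is only a sketch: compiler_description is a hypothetical helper name, and it relies on the __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__ and __MINGW32__ macros that GCC-based and MinGW toolchains predefine.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, introduced only for this sketch: describe the
// compiler that built this translation unit using predefined macros.
std::string compiler_description()
{
    std::ostringstream out;
#if defined(__GNUC__)
    out << "GCC-compatible " << __GNUC__ << "." << __GNUC_MINOR__
        << "." << __GNUC_PATCHLEVEL__;
#else
    out << "unknown compiler";
#endif
#if defined(__MINGW32__)
    out << " (MinGW)";
#endif
    return out.str();
}
```

Printing the result once at program start, for example with `std::cout << compiler_description() << std::endl;`, shows whether the rebuilt executable really came from the updated cross-compiler.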
