6 Commits

Author SHA1 Message Date
ado
50de5b3a5a Add const where fitting, make splitter class members private, add #pragma once to ssp.hpp 2024-03-14 04:40:07 +01:00
red0124
f4bca3915f Add [[nodiscard]] where fitting, update unit tests (#49) 2024-03-14 04:28:33 +01:00
ado
809939d0e2 Update parser error messages, fix parser tests 2024-03-13 19:39:07 +01:00
ado
b9f4afdd5f Add header and raw_header methods, update header usage methods error handling, write new and update existing unit tests 2024-03-13 17:15:31 +01:00
red0124
69875c238e Resolve clang-tidy warnings (#48)
* Resolve clang-tidy warnings, update single_header_generator.py

* Update single header test, resolve additional clang-tidy warnings
2024-03-12 18:31:24 +01:00
red0124
457defadaa Bugfix/odr violations (#47)
* Make common non-member functions inline, remove unreachable line from get_line_buffer

* [skip ci] Fix namespace comments
2024-03-12 10:22:10 +01:00
9 changed files with 210 additions and 343 deletions

View File

@@ -2,7 +2,7 @@ cmake_minimum_required(VERSION 3.14)
project(
ssp
VERSION 1.8.0
VERSION 1.7.2
DESCRIPTION "csv parser"
HOMEPAGE_URL "https://github.com/red0124/ssp"
LANGUAGES CXX

View File

@@ -74,7 +74,7 @@ Bill (Heath) Gates 65 3.3
# Single header
The library can be used with a single header file **`ssp.hpp`**, but it suffers a significant performance loss when converting floating point values since the **`fast_float`** library is not present within the file.
The library can be used with a single header file **`ssp.hpp`**, but it suffers a slight performance loss when converting floating point values since the **`fast_float`** library is not present within the file.
# Installation
@@ -92,7 +92,7 @@ The library supports [CMake](#Cmake) and [meson](#Meson) build systems
## Headers
The parser can be told to use only certain columns by parsing the header. This can be done with the **`use_fields`** method. It accepts any number of string-like arguments or even an **`std::vector<std::string>`** with the field names. If any of the fields are not found within the header or if any fields are defined multiple times it will result in an error.
The parser can be told to use only certain columns by parsing the header. This can be done by using the **`use_fields`** method. It accepts any number of string-like arguments or even an **`std::vector<std::string>`** with the field names. If any of the fields are not found within the header or if any fields are defined multiple times it will result in an error.
```shell
$ cat students_with_header.csv
Id,Age,Grade
@@ -116,7 +116,7 @@ James Bailey 2.5
Brian S. Wolfe 1.9
Bill (Heath) Gates 3.3
```
The header can be ignored using the **`ss::ignore_header`** [setup](#Setup) option or by calling the **`ignore_next`** method after the parser has been constructed. If the header has been ignored calling any method related to header usage will result in a compilation error.
The header can be ignored using the **`ss::ignore_header`** [setup](#Setup) option or by calling the **`ignore_next`** method after the parser has been constructed.
```cpp
ss::parser<ss::ignore_header> p{file_name};
```
@@ -124,10 +124,10 @@ The fields with which the parser works with can be modified at any given time. T
```cpp
// ...
ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
p.use_fields("Grade");
p.use_fields("Id", "Grade");
const auto& grade = p.get_next<std::string>();
std::cout << grade << std::endl;
const auto& [id, grade] = p.get_next<std::string, float>();
std::cout << id << ' ' << grade << std::endl;
if (p.field_exists("Id")) {
p.use_fields("Grade", "Id");
@@ -139,32 +139,10 @@ The fields with which the parser works with can be modified at any given time. T
```
```shell
$ ./a.out
2.5
1.9 Brian S. Wolfe
3.3 Bill (Heath) Gates
James Bailey 2.5
40 Brian S. Wolfe
65 Bill (Heath) Gates
```
The header is parsed with the same rules as other rows, the only difference is that **`multiline`** will be disabled when parsing the header. To get the data that is
present in the header as a **`std::vector<std::string>`**, the **`header`** method can be used, and to get the header line before it has been parsed, the **`raw_header`** method can be used:
```cpp
// ...
ss::parser<ss::throw_on_error> p{"students_with_header.csv"};
std::cout << p.raw_header() << std::endl;
for (const auto& field: p.header()) {
std::cout << "> " << field << std::endl;
}
// ...
```
```shell
$ ./a.out
Id,Age,Grade
> Id
> Age
> Grade
```
Methods related to headers can also fail, the error handling of these is done in the same way as for other methods.
## Conversions
An alternate loop to the example above would look like:
```cpp

View File

@@ -531,6 +531,10 @@ private:
[[nodiscard]] bool strict_split(header_splitter& splitter,
std::string& header) {
if (header.empty()) {
return false;
}
if constexpr (throw_on_error) {
try {
splitter.split(header.data(), reader_.delim_);
@@ -558,6 +562,11 @@ private:
for (const auto& [begin, end] : splitter.get_split_data()) {
std::string field{begin, end};
if (field.empty()) {
handle_error_duplicate_header_field(field);
header_.clear();
return;
}
if (std::find(header_.begin(), header_.end(), field) !=
header_.end()) {
handle_error_duplicate_header_field(field);

View File

@@ -6,7 +6,7 @@ project(
'cpp_std=c++17',
'buildtype=debugoptimized',
'wrap_mode=forcefallback'],
version: '1.8.0',
version: '1.7.2',
meson_version:'>=0.54.0')
fast_float_dep = dependency('fast_float')

View File

@@ -2805,6 +2805,10 @@ private:
[[nodiscard]] bool strict_split(header_splitter& splitter,
std::string& header) {
if (header.empty()) {
return false;
}
if constexpr (throw_on_error) {
try {
splitter.split(header.data(), reader_.delim_);
@@ -2832,6 +2836,11 @@ private:
for (const auto& [begin, end] : splitter.get_split_data()) {
std::string field{begin, end};
if (field.empty()) {
handle_error_duplicate_header_field(field);
header_.clear();
return;
}
if (std::find(header_.begin(), header_.end(), field) !=
header_.end()) {
handle_error_duplicate_header_field(field);

View File

@@ -33,11 +33,10 @@ set(DOCTEST "${FETCHCONTENT_BASE_DIR}/doctest-src")
enable_testing()
foreach(name IN ITEMS test_splitter test_parser1_1 test_parser1_2
test_parser1_3 test_parser1_4 test_parser1_5
test_converter test_extractions test_parser2_1
test_parser2_2 test_parser2_3 test_parser2_4
test_parser2_5 test_parser2_6
test_extractions_without_fast_float)
test_parser1_3 test_parser1_4 test_converter
test_extractions test_parser2_1 test_parser2_2
test_parser2_3 test_parser2_4 test_parser2_5
test_parser2_6 test_extractions_without_fast_float)
add_executable("${name}" "${name}.cpp")
target_link_libraries("${name}" PRIVATE ssp::ssp fast_float
doctest::doctest)

View File

@@ -6,7 +6,6 @@ tests = [
'parser1_2',
'parser1_3',
'parser1_4',
'parser1_5',
'splitter',
'converter',
'extractions',

View File

@@ -178,8 +178,7 @@ void test_invalid_fields(const std::vector<std::string>& lines,
auto check_header = [&lines](auto& p) {
if (lines.empty()) {
CHECK_EQ(p.header().size(), 1);
CHECK_EQ(p.header().at(0), "");
CHECK(p.header().empty());
CHECK_EQ(merge_header(p.header(), ","), p.raw_header());
} else {
CHECK_EQ(lines[0], merge_header(p.header()));
@@ -264,7 +263,7 @@ void test_invalid_fields(const std::vector<std::string>& lines,
}
}
TEST_CASE_TEMPLATE("test invalid header fields usage", T,
TEST_CASE_TEMPLATE("test invalid fheader fields usage", T,
ParserOptionCombinations) {
test_invalid_fields<T>({}, {});
@@ -398,3 +397,178 @@ TEST_CASE_TEMPLATE("test invalid rows with header", T,
CHECK_EQ(merge_header(p.header()), p.raw_header());
}
}
TEST_CASE_TEMPLATE("test invalid header", T, ParserOptionCombinations) {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"invalid_header"};
// Empty header
{
std::ofstream out{f.name};
out << "" << std::endl;
out << "1" << std::endl;
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK(p.header().empty());
CHECK_EQ(merge_header(p.header()), p.raw_header());
CHECK(p.valid());
}
// Unterminated quote in header
{
std::ofstream out{f.name};
out << "\"Int" << std::endl;
out << "1" << std::endl;
}
{
auto [p, _] =
make_parser<buffer_mode, ErrorMode, ss::quote<'"'>>(f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "\"Int");
}
{
auto [p, _] =
make_parser<buffer_mode, ErrorMode, ss::quote<'"'>, ss::multiline>(
f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "\"Int");
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode, ss::quote<'"'>,
ss::escape<'\\'>, ss::multiline>(f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "\"Int");
}
// Unterminated escape in header
{
std::ofstream out{f.name};
out << "Int\\" << std::endl;
out << "1" << std::endl;
}
{
auto [p, _] =
make_parser<buffer_mode, ErrorMode, ss::escape<'\\'>>(f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "Int\\");
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode, ss::escape<'\\'>,
ss::multiline>(f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "Int\\");
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode, ss::escape<'\\'>,
ss::quote<'"'>, ss::multiline>(f.name);
auto command = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command);
CHECK_EQ(p.raw_header(), "Int\\");
}
}
template <typename T>
void test_ignore_empty(const std::vector<X>& data) {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"ignore_empty"};
make_and_write(f.name, data);
std::vector<X> expected;
for (const auto& d : data) {
if (d.s != X::empty) {
expected.push_back(d);
}
}
{
auto [p, _] =
make_parser<buffer_mode, ErrorMode, ss::ignore_empty>(f.name, ",");
std::vector<X> i;
for (const auto& a : p.template iterate<X>()) {
i.push_back(a);
}
CHECK_EQ(i, expected);
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name, ",");
std::vector<X> i;
size_t n = 0;
while (!p.eof()) {
try {
++n;
const auto& a = p.template get_next<X>();
if (data.at(n - 1).s == X::empty) {
CHECK_FALSE(p.valid());
continue;
}
i.push_back(a);
} catch (...) {
CHECK_EQ(data.at(n - 1).s, X::empty);
}
}
CHECK_EQ(i, expected);
}
}
TEST_CASE_TEMPLATE("test various cases with empty lines", T,
ParserOptionCombinations) {
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {5, 6, X::empty}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {5, 6, X::empty}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>({{1, 2, X::empty},
{3, 4, X::empty},
{9, 10, X::empty},
{11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, X::empty}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, X::empty}, {9, 10, X::empty}, {11, 12, "w"}});
test_ignore_empty<T>({{11, 12, X::empty}});
test_ignore_empty<T>({});
}

View File

@@ -1,301 +0,0 @@
#include "test_parser1.hpp"
TEST_CASE_TEMPLATE("test empty fields header", T, ParserOptionCombinations) {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"empty_fields_header"};
// Empty header
{
std::ofstream out{f.name};
out << "" << std::endl;
out << "1" << std::endl;
}
{
std::vector<std::string> expected_header = {""};
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK_EQ_ARRAY(expected_header, p.header());
CHECK_EQ("", p.raw_header());
CHECK(p.valid());
}
// All empty header fields
{
std::ofstream out{f.name};
out << ",," << std::endl;
out << "1,2,3" << std::endl;
}
{
std::vector<std::string> expected_header = {"", "", ""};
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK_EQ_ARRAY(expected_header, p.header());
CHECK_EQ(",,", p.raw_header());
CHECK(p.valid());
auto command1 = [&p = p] { std::ignore = p.field_exists("Int"); };
expect_error_on_command(p, command1);
auto command2 = [&p = p] { p.use_fields("Int"); };
expect_error_on_command(p, command2);
}
// One empty field
const std::vector<std::string> valid_fields = {"Int0", "Int1", ""};
using svec = std::vector<std::string>;
const std::vector<std::vector<std::string>> valid_field_combinations =
{svec{"Int0"},
svec{"Int1"},
svec{""},
svec{"", "Int0"},
svec{"Int0", "Int1"},
svec{"Int1", ""},
svec{"Int0", "", "Int1"},
svec{"", "Int1", "Int0"}};
// Last header field empty
{
std::ofstream out{f.name};
out << "Int0,Int1," << std::endl;
out << "1,2,3" << std::endl;
}
{
std::vector<std::string> expected_header = {"Int0", "Int1", ""};
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK_EQ_ARRAY(expected_header, p.header());
CHECK_EQ("Int0,Int1,", p.raw_header());
CHECK(p.valid());
for (const auto& field : valid_fields) {
CHECK(p.field_exists(field));
CHECK(p.valid());
}
for (const auto& fields : valid_field_combinations) {
p.use_fields(fields);
CHECK(p.valid());
}
}
// First header field empty
{
std::ofstream out{f.name};
out << ",Int0,Int1" << std::endl;
out << "1,2,3" << std::endl;
}
{
std::vector<std::string> expected_header = {"", "Int0", "Int1"};
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK_EQ_ARRAY(expected_header, p.header());
CHECK_EQ(",Int0,Int1", p.raw_header());
CHECK(p.valid());
for (const auto& field : valid_fields) {
CHECK(p.field_exists(field));
CHECK(p.valid());
}
for (const auto& fields : valid_field_combinations) {
p.use_fields(fields);
CHECK(p.valid());
}
}
// Middle header field empty
{
std::ofstream out{f.name};
out << "Int0,,Int1" << std::endl;
out << "1,2,3" << std::endl;
}
{
std::vector<std::string> expected_header = {"Int0", "", "Int1"};
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name);
CHECK_EQ_ARRAY(expected_header, p.header());
CHECK_EQ("Int0,,Int1", p.raw_header());
CHECK(p.valid());
for (const auto& field : valid_fields) {
CHECK(p.field_exists(field));
CHECK(p.valid());
}
for (const auto& fields : valid_field_combinations) {
p.use_fields(fields);
CHECK(p.valid());
}
}
}
template <typename T, typename... Ts>
void test_unterminated_quote_header() {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"unterminated_quote_header"};
{
std::ofstream out{f.name};
out << "\"Int" << std::endl;
out << "1" << std::endl;
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode, Ts...>(f.name);
auto command0 = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command0);
CHECK_EQ(p.raw_header(), "\"Int");
auto command1 = [&p = p] { std::ignore = p.field_exists("Int"); };
expect_error_on_command(p, command1);
auto command2 = [&p = p] { p.use_fields("Int"); };
expect_error_on_command(p, command2);
}
}
TEST_CASE_TEMPLATE("test unterminated quote header", T,
ParserOptionCombinations) {
using quote = ss::quote<'"'>;
using escape = ss::escape<'\\'>;
test_unterminated_quote_header<T, quote>();
test_unterminated_quote_header<T, quote, ss::multiline>();
test_unterminated_quote_header<T, quote, escape>();
test_unterminated_quote_header<T, quote, escape, ss::multiline>();
}
template <typename T, typename... Ts>
void test_unterminated_escape_header() {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"unterminated_escape_header"};
// Unterminated escape in header
{
std::ofstream out{f.name};
out << "Int\\" << std::endl;
out << "1" << std::endl;
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode, Ts...>(f.name);
auto command0 = [&p = p] { std::ignore = p.header(); };
expect_error_on_command(p, command0);
CHECK_EQ(p.raw_header(), "Int\\");
auto command1 = [&p = p] { std::ignore = p.field_exists("Int"); };
expect_error_on_command(p, command1);
auto command2 = [&p = p] { p.use_fields("Int"); };
expect_error_on_command(p, command2);
}
}
TEST_CASE_TEMPLATE("test unterminated escape header", T,
ParserOptionCombinations) {
using quote = ss::quote<'"'>;
using escape = ss::escape<'\\'>;
test_unterminated_escape_header<T, escape>();
test_unterminated_escape_header<T, escape, ss::multiline>();
test_unterminated_escape_header<T, escape, quote>();
test_unterminated_escape_header<T, escape, quote, ss::multiline>();
}
template <typename T>
void test_ignore_empty(const std::vector<X>& data) {
constexpr auto buffer_mode = T::BufferMode::value;
using ErrorMode = typename T::ErrorMode;
unique_file_name f{"ignore_empty"};
make_and_write(f.name, data);
std::vector<X> expected;
for (const auto& d : data) {
if (d.s != X::empty) {
expected.push_back(d);
}
}
{
auto [p, _] =
make_parser<buffer_mode, ErrorMode, ss::ignore_empty>(f.name, ",");
std::vector<X> i;
for (const auto& a : p.template iterate<X>()) {
i.push_back(a);
}
CHECK_EQ(i, expected);
}
{
auto [p, _] = make_parser<buffer_mode, ErrorMode>(f.name, ",");
std::vector<X> i;
size_t n = 0;
while (!p.eof()) {
try {
++n;
const auto& a = p.template get_next<X>();
if (data.at(n - 1).s == X::empty) {
CHECK_FALSE(p.valid());
continue;
}
i.push_back(a);
} catch (...) {
CHECK_EQ(data.at(n - 1).s, X::empty);
}
}
CHECK_EQ(i, expected);
}
}
TEST_CASE_TEMPLATE("test various cases with empty lines", T,
ParserOptionCombinations) {
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {5, 6, X::empty}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {5, 6, X::empty}, {9, 10, "v"}, {11, 12, "w"}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, "v"}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, "y"}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, "y"}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>({{1, 2, X::empty},
{3, 4, X::empty},
{9, 10, X::empty},
{11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, "x"}, {3, 4, X::empty}, {9, 10, X::empty}, {11, 12, X::empty}});
test_ignore_empty<T>(
{{1, 2, X::empty}, {3, 4, X::empty}, {9, 10, X::empty}, {11, 12, "w"}});
test_ignore_empty<T>({{11, 12, X::empty}});
test_ignore_empty<T>({});
}