Robinson, Paul via llvm-dev
2020-Jun-15 14:33 UTC
[llvm-dev] FileCheck: using numeric variable defined on same line with caveats
Before addressing the CHECK-NOT case, I’m still unclear about the DAG case.
What should the first DAG line match? The regex matching would first attempt to
match “10 12” but the expression evaluation would fail; so the DAG candidate
wouldn’t match; does this mean the DAG matching does not continue searching, and
the test fails? Or would we restart the search... where? With “0 12” (skipping
only one character from the previous fail)? In that case it would ultimately
match “12 13” from the first line. Or would it skip the entire previous
candidate, and start searching at “ 13”? In which case it would ultimately
match “10 11” on the second line.
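The three restart policies asked about above can be compared with a small
simulation. This is only a sketch in Python, not FileCheck's implementation:
find_pair and the policy names "none", "next_char" and "skip_candidate" are
hypothetical, and Python's re module stands in for FileCheck's regex engine.

```python
import re

# Regex that "[[#VAR:]] [[#VAR+1]]" is described as expanding to in this thread.
PATTERN = re.compile(r"([0-9]+) ([0-9]+)")


def find_pair(text, restart):
    """Return the first text accepted under the given restart policy.

    restart is one of (hypothetical names):
      "none"           - give up after the first failed numeric check
      "next_char"      - retry one character after the failed candidate's start
      "skip_candidate" - retry just after the failed candidate's end
    """
    pos = 0
    while True:
        m = PATTERN.search(text, pos)
        if m is None:
            return None
        if int(m.group(2)) == int(m.group(1)) + 1:
            return m.group(0)
        if restart == "none":
            return None
        pos = m.start() + 1 if restart == "next_char" else m.end()


# The DAG search range from the example input (between BEGIN and END).
dag_range = "10 12 13\nFOO 10 11\nFOOBAR\n"
print(find_pair(dag_range, "none"))            # None
print(find_pair(dag_range, "next_char"))       # 12 13
print(find_pair(dag_range, "skip_candidate"))  # 10 11
```

Under these assumptions the three policies give three different answers for
the same input, which is why the choice matters.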
In any case (if the first DAG ultimately matches something), the third DAG line
would match the first previously unmatched text in the DAG search range, which
would be either “10 ” or “10 12 13” from the first line, depending on the answer
to the previous paragraph.
--paulr
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James
Henderson via llvm-dev
Sent: Monday, June 15, 2020 4:08 AM
To: Thomas Preud'homme <thomasp at graphcore.ai>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] FileCheck: using numeric variable defined on same line
with caveats
I think I already gave my opinion on one of the previous patches, regarding
CHECK-NOT, which approximately came to the same conclusion as what you've
got here, so +1 from me. I also think the CHECK-DAG example is not one to care
about. It seems to me that there's no guarantee what CHECK-DAG:
[[LINE_AFTER_FOO:.*]] would match, as, if I followed it correctly, CHECK-DAGs
don't have any guarantee of order within a group, so it could match either
the next line after BEGIN, the line after [[#VAR1:]] [[#VAR1+1]] or indeed any
line before END.
James
On Thu, 11 Jun 2020 at 12:29, Thomas Preud'homme via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
Hi,
TL;DR: Is it OK to allow a numeric variable to be used on the same line where
it is defined, except in CHECK-NOT, and accepting some false negatives?
FileCheck does not currently allow a numeric variable to be used on the same
line where it is defined. I have a tentative patch to add that support, but it
comes with caveats, so before going through review I'd like to get consensus
on whether those caveats are acceptable.
== The problem ==
The problem with matching variables defined on the same line is that the
matching is done separately from checking the numeric relation, because a
numeric relation cannot be expressed in a regex. That is, when matching
[[#VAR:]] [[#VAR+1]], FileCheck first matches the input against
([0-9]+) ([0-9]+) and then checks the values of the two captured integers.
This can lead to outcomes that are at times confusing or downright wrong.
Consider the following input with the CHECK pattern mentioned above:
10 12 13
The regex would match the numbers 10 and 12 and fail the CHECK directive, even
though 12 and 13 satisfy the +1 relation. This could happen as a result of a
change in the input after a new commit has landed. In the case of a CHECK
directive, it would make the test regress and a developer would need to tighten
the pattern somehow, for instance by changing it to [[#VAR:]] [[#VAR+1]]{{$}}.
In the context of a CHECK-NOT, however, the input could change from 10 12 14 to
10 12 13 and the pattern would still fail to match, so the test would still
pass despite the compiler having regressed.
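The two-stage behaviour described above can be reproduced outside FileCheck.
The following is a minimal sketch in Python; check_pair is a hypothetical
helper, the regexes are the expansions given in this message, and Python's re
module is used in place of FileCheck's regex engine.

```python
import re


def check_pair(line, regex):
    """Stage 1: regex match. Stage 2: numeric check on the captured values.

    Returns (var, use, relation_holds), or None if the regex itself fails.
    """
    m = re.search(regex, line)
    if m is None:
        return None
    var, use = int(m.group(1)), int(m.group(2))
    return (var, use, use == var + 1)


# Loose pattern: the regex settles on 10 and 12, so the +1 check fails.
print(check_pair("10 12 13", r"([0-9]+) ([0-9]+)"))   # (10, 12, False)

# Tightened pattern with an end-of-line anchor, as with {{$}}: 12 and 13.
print(check_pair("10 12 13", r"([0-9]+) ([0-9]+)$"))  # (12, 13, True)
```

The anchor works here only because it rules out the "10 12" candidate at the
regex stage, before any arithmetic is attempted.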
== Proposed "solution" ==
Given the above, the risks of supporting numeric expressions that use a
variable defined on the same line can be summarized as:
* test regression on positive matching directives (CHECK, CHECK-NEXT, ...)
* silent compiler regression on negative matching directives (CHECK-NOT)
I am therefore proposing to forbid using a numeric variable on the line that
defines it in negative matching directives, but to allow it in positive
matching directives, with a note in the documentation advising authors to make
the pattern as tight as possible.
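The silent-regression risk for negative directives can be sketched as follows.
This is an illustration in Python under the two-stage model described in this
message, not FileCheck code; directive_matches and check_not_passes are
hypothetical names.

```python
import re

# Regex that "[[#VAR:]] [[#VAR+1]]" is described as expanding to in this thread.
PAIR = re.compile(r"([0-9]+) ([0-9]+)")


def directive_matches(line):
    """Regex first, then the +1 check on the first regex candidate only."""
    m = PAIR.search(line)
    return m is not None and int(m.group(2)) == int(m.group(1)) + 1


def check_not_passes(line):
    """A negative directive passes when the pattern fails to match."""
    return not directive_matches(line)


for line in ("10 12 14", "10 12 13"):
    verdict = "passes" if check_not_passes(line) else "fails"
    print(f"{line!r}: CHECK-NOT {verdict}")
```

Both inputs make the negative check pass, even though the second one contains
the forbidden "12 13" pair; a positive directive would have failed loudly on
the same input instead.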
== CHECK-DAG case ==
CHECK-DAG is interesting because, despite being a positive matching directive,
it carries a risk if a test relies on the way CHECK-DAG is implemented.
Consider the following directives, which rely on each directive being matched
in order:
CHECK: BEGIN
CHECK-DAG: [[#VAR1:]] [[#VAR1+1]]
CHECK-DAG: FOO
CHECK-DAG: [[LINE_AFTER_FOO:.*]]
CHECK: END
CHECK-NOT: [[LINE_AFTER_FOO]] BAZ
This could be written if the line checked by the first CHECK-DAG is guaranteed
to always be either before FOO or after the line after FOO. Now consider the
following input that verifies this invariant:
BEGIN
10 12 13
FOO 10 11
FOOBAR
END
10 12 13 FOOBAR BAZ
The expectation from the test author relying on the CHECK-DAG behavior would be
for LINE_AFTER_FOO to have the value FOOBAR once the CHECK-DAG block has
matched. However, due to the caveats mentioned above, it would end up being set
to "10 12 13", and thus the CHECK-NOT would pass because "10 12 13" is not
followed by "BAZ". That's far-fetched though; I'm not convinced we should worry
about this beyond documenting CHECK-DAG as being able to match in any order.
Thoughts?
Best regards,
Thomas
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Thomas Preud'homme via llvm-dev
2020-Jun-15 14:59 UTC
[llvm-dev] FileCheck: using numeric variable defined on same line with caveats
Hi Paul,
Thanks for your question. For some reason I was thinking of CHECK-DAG matching
as trying line by line instead of looking for the first match from the start of
the block. To answer the first question: the first CHECK-DAG would fail to match
altogether, since, as you pointed out, the regex would match 10 12, which does
not satisfy the +1 relation. I don't think we should skip and try matching
again, as that is difficult in the general case (think about CHECK-DAG:
[[#NUMVAR:]]{{.*}}[[#NUMVAR+1]] and how to deal with the same input 10 12 13).
So my point is completely moot: for a valid input, either a DAG match is found
and it's a legitimate match, or no match is found and the failure is reported
on the line that uses a variable defined on the same line, which would not be
too surprising. My apologies for the confusion.
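To see why retrying is awkward for the wildcard pattern mentioned above, here
is a small Python sketch. It is illustrative only: first_regex_captures and
satisfying_pairs are hypothetical helpers, Python's backtracking re module is
used in place of FileCheck's regex engine, and the brute-force search only
considers whole numbers.

```python
import re

# Assumed regex expansion of [[#NUMVAR:]]{{.*}}[[#NUMVAR+1]].
WILDCARD = re.compile(r"([0-9]+).*([0-9]+)")


def first_regex_captures(line):
    """What the regex stage alone captures, before any numeric check."""
    m = WILDCARD.search(line)
    return None if m is None else (int(m.group(1)), int(m.group(2)))


def satisfying_pairs(line):
    """Brute force: every (a, b) where b appears after a and b == a + 1."""
    numbers = [(m.start(), m.end(), int(m.group()))
               for m in re.finditer(r"[0-9]+", line)]
    return [(a, b)
            for (_, a_end, a) in numbers
            for (b_start, _, b) in numbers
            if b_start >= a_end and b == a + 1]


print(first_regex_captures("10 12 13"))  # (10, 3)
print(satisfying_pairs("10 12 13"))      # [(12, 13)]
```

The greedy wildcard leaves only the final digit for the second capture, so the
regex stage proposes 10 and 3. Reaching the valid pair 12 and 13 would require
exploring other ways of splitting the line, not just restarting the search at a
later offset.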
So my questions should thus be:
* are we fine with false negatives (failing on valid input because the regex
engine does not understand numeric values)?
* can you think of any situation that would lead to a false positive
(a directive matching on invalid input) besides CHECK-NOT?
Best regards,
Thomas
Robinson, Paul via llvm-dev
2020-Jun-15 15:52 UTC
[llvm-dev] FileCheck: using numeric variable defined on same line with caveats
Any kind of variable definition on a CHECK-NOT line would seem to be asking
for trouble. Do we allow text variable definitions on a NOT?
False fails are better than false matches. Given that it will fail on a line
where you'd expect a match (or possibly expect the line to be skipped), it's
a matter of refining the match expression, which is something that you have to
do sometimes anyway. The two-level matching process (regex first, evaluation
later) might be surprising to people, and I'd hope the diagnostic would give
a hint in that direction.
--paulr