Message ID | CAJk11WCr6iv_jF-KAhZOhqUVbapmJPJcoRNzkeGS40AFmOORMQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
Hi Pekka, On Sat, Oct 29, 2016 at 02:58:29PM +0300, Pekka Jääskeläinen wrote: > Hi Martin, > > Thanks for the comments and suggestions. Replies inline: > > On Thu, Oct 20, 2016 at 6:10 PM, Martin Jambor <mjambor@suse.cz> wrote: > > - Still quite few things need to be documented better, e.g.: > > + brig_to_generic::get_mangled_name_tmpl and to a lesser extent > > brig_to_generic::get_mangled_name. It should be clear what is the > > intended difference in usage of the two (specially since the > > former is a template and so the parameter does not give that much > > of a hint to the reader) > > Added more comments. thanks. > > > + the visitor classes need some description so that the first time > > readers see them, they understand what they are for and what they > > visit (i.e. what "visiting" even means). > > This is an adaptation of the classic gang of four Visitor design pattern. > I added a reference to it in a comment. Thanks. As you can see, I am not a C++ person and do not know names of common patterns. For me, "Gang of Four" is a reference to history of China and not something one would want to emulate. > > > - I know it was me who told you to use gcc_assert and gcc_unreachable > > instead of internal_error. The thing is, often the error is clearly > > not an internal error but an error in input. I think that we should > > plan to handle these cases differently and report the issues better > > to the user, give a meaningful error message together width the > > section and the offset there when it was encountered. I am not > > asking you to audit all asserts now and convert those in this > > category but it would be nice to have a mechanism to easily do so > > (and convert a few obvious places), so that we can convert these as > > we bump into them. > > I'm not sure about this. BRIG FE is a rather special case as we > assume HSAILasm has been used to parse and error check the original > HSAIL text to the binary BRIG format which it consumes. > > Of course HSAILasm can have bugs, but how much we should produce human > readable error messages to help debugging HSAILasm is another thing. > In case the BRIG FE > fails to consume the input, it means either the BRIG is corrupted for > a reason or another, > but typically is not a human error (those should be caught be HSAILasm). I personally primarily want this for debugging purposes, and we should try to eventually report errors in a more comprehensible way than HSAILasm. But more generally, and more importantly, if the input, whether human readable or not, is incorrect, the compiler should not produce an "internal error." If that happens, users will file bugs against gcc when in fact it is the generating tool that is broken. One you maintain the FE, you personally will not want this :-) > > "File format not recognized" error is one that might be useful though. > I added a check for the BRIG magic number and the supported version (1.0). > > Perhaps we should add error printouts later on case by case basis when we > see which error cases can be useful and worth reporting in a human readable > graceful manner? It can be as easy as converting the internal_error > to fatal_error or similar in that case. Yeah, that is what I meant, together with printing some information about offsets in different sections where the error is. > > > - A very minor suggestion: In GCC it is customary to write TODO as one > > word. We generally do not use "TO OPTIMIZE", that is just a TODO > > (as opposed to a FIXME, which hints that something is at least a bit > > wrong here). I think you can keep your way if you want but for > > example I do have emacs highlighting set up for the traditional > > formats. > > Converted all to TODO. > > > - You do not seem to handle BRIG_OPCODE_ICALL, or have I missed it? > > That is fine, I don't think anybody else does that now anyway, I > > just got curious reading the ipa analysis. > > Right. > > > - brig-lang.c: > > + x_flag_whole_program = 0; - talk about this with Honza > > I guess this is note to yourself? As we agreed I didn't try whole program > optimizations yet. I might later when optimizing for a target. It will > be really useful, I agree. Now that it used the proper builtin way, it > might work more easily. Right, it was a note for myself. > > > + brig_langhook_type_for_size: has several issues. Either always > > return a type like go or return error_mark_node instead of NULL. > > Also, do not count on long being 64-bit. I would just copy what > > go does. Or lto. > > Copied the go's version, it should work with BRIG too. Great. > > > + convert: did you avoid using convert_to_vector deliberately? The > > size check seems genuinely useful. > > BRIG/HSAIL is a bit special case due to its "untyped" variables (registers), > I use bit level casts in a lot of places to avoid accidental type conversions > or sign extensions. I was wondering why in case of VECTOR_TYPE, instead of return build1 (VIEW_CONVERT_EXPR, type, expr); you do not do return fold (convert_to_vector (type, expr)); it seemed to me natural given how you handle the other cases and that the function internally also builds a V_C_E, after two checks. So I thought the checks (matching overall size and only converting from an integer or another vector-type) were the problem and was wondering which one. Especially the size one is something I believe you should never violate too. > > > - brigfrontend/brig-to-generic.cc: > > + brig_to_generic::parse: You seem to be handling LDA instruction > > (or more generally, BRIG_KIND_INST_ADDR instructions, but there is > > just the one) with copy_move_inst_handler, which seems just wrong. > > What do you mean by 'wrong'? copy_move_inst_handler operator() starts by type-casting whatever it gets to BrigInstSourceType* and immediately dereferencing it, loading field sourceType, even though, in case of an LDA, what it is actually looking at is not a BrigInstSourceType but a BrigInstAddr, which is an entirely different structure that does not have that field. So you read the 8-bit segment and the first reserved field and happily pass that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on random data. That is wrong. Later on in the operator() you actually figure out that you are looking at BrigInstSourceType but that is too late. > The handlers do not map to instruction types > directly, but often more towards the specification chapters. > It comes straight from: > http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BRIG_syntax_copy_move.htm Ah, so that chapter is why you decided to handle type conversions and LDA together, it made little sense to me. Still, I do not think it is a very good idea implementation-wise, but if the operator() is written correctly, it certainly can be done. > > > Its operator() blindly casts the instruction to > > BrigInstSourceType, interpreting the segment as if it was a source > > type... am I missing something? > > I believe you are confused by the BRIG struct name: BrigInstSourceType > is a struct > for "instructions that have different types for their destination and > source operands" > http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/18_BRIG/BrigInstSourceType.htm > > That is, it's not a "source type" object, but an instruction type object. Ehm, no, I think I understand that, the invalid typecast/dereference is what I am pointing at. > > > + build_reinterpret_cast: gcc_unreachable followed by return > > NULL_TREE does not make sense, please convert the whole thing to > > an assert. More importantly, I think that you really need to > > Converted it to an assert. > > > solve the case of mismatching sizes differently, and not create a > > (single) V_C_E for it. For register types, you can V_C_E them to > > an unsigned int of the same size, then do a widening NOP_EXPR to > > unsigned type with the same size as the destination and then do a > > V_C_E to whatever you need. But perhaps this can be solved better > > at the caller side somewhere? > > I added the extra conversion step through unsigned ints. > I hadn't noticed that V_C_E is not guaranteed to work for this case. Well, Ada also creates mismatched V_C_Es, so we do not verify the sizes match and it mostly works. However, after hunting down expansion bugs caused by a combination of that and SRA, I really really do not want to avoid adding more such situations, it still seems quite fragile to me. So thanks for doing this. > > + brig_code_entry_handler::build_address_operand: I must say I > > really dislike how the end of the function is structured, it is > > terrible difficult to read given that it is not doing anything that > > complicated and I think it does not handle correctly an (LDA of a) > > NULL address, which unfortunately I believe is valid for private > > and group segment addresses. Dealing with the most complex case > > by converting symbol to size_type looks exactly backwards, > > especially given that you have converted the base of the > > POINTER_PLUS_EXPR only few lines before. I think the code would > > be a lot nicer and easier to comprehend if you clearly > > distinguished the various cases (symbol_base != NULL (and sub > > cases when ptr_base is or is not NULL), ptrbase != NULL and simple > > constant, even NULL constant, which you do not handle but fail an > > assert, I think) and handled them separately, including all type > > conversions. ptr_base is an unfortunate name, IMHO, in many cases > > it has the role of a variable offset rather than a base. > > Similarly, ptr_ofset is really a constant_offset. > > Renamed ptr_base to var_offset and ptr_offset to const_offset which > indeed are more descriptive. > > I also cleaned the if..else mess at the end of the function and made the > different cases very explicit, instead of the spaghetti mess previous > version. It looks much better now to me at least. Indeed it does, thanks. However, I think that at the point when you do: if (offs > 0) const_offset = build_int_cst (size_type_node, offs); const_offset can already be non-NULL you may lose whatever value it had before. You might want to keep the constant offset as integer until the very moment when you may want to use it. (Also, single statements should not have braces around them ;-) > > - brig-basic-inst-handler.cc: > > + scalarized_sat_arithmetics::scalarized_sat_arithmetics: Do not > > undef macros preemptively. Instead, undef them right after using > > them, after the include of a .def file. > > This seems to be an idiom with the builtin import mechanism elsewhere > also. builtins.def defines a default macro which one must undef if not > wanting to do anything with that builtin type in that particular import > location. I see, OK. > > > + brig_basic_inst_handler::must_be_scalarized: I am intrigued by the > > (elements < 16) part of the condition. This function would also > > benefit from a comment. > > This function black listed the known cases of MULHI with vectors > that have been tested to break with AMD64/x86-64. It seems the > support for vector MULT_HIGHPART_EXPR is flaky and undertested. > > The robust thing to do here is to force scalarization always with MULHI for now, > until these issues are debugged further. I added an exception for 2x64b > MULT_HIGHPART_EXPR to avoid the need for 128b scalar arithmetics, > and as it seems to work for the CPUs I've tested. > > The decision should not IMO belong to the frontend, but there'd be > better a step where the vector operations are optionally scalarized if the > target prefers scalarized operations which should be caught by this > step also. Something to fix during optimization work. I see. If you think that MULT_HIGHPART_EXPR is broken, it may be worth filing a bug report (probably during next stage1). > > + brig_basic_inst_handler::operator (): It seems that the opcode > > local variable is only used to identify the return brig > > instructions which seems wasteful. Generally, it would be nice to > > It's also used to catch MULHI which is generated from multiple brig > opcodes. > > > clean this function up a little by moving assignments to some of > > the very many local variables down, as close to their first use as > > reasonable. Surprisingly often, you'd remove the need to compute > > them in many cases at all, e.g. look at element_count and > > element_size_bits. > > Moved element_count and is_fp16_operation. element_size_bits is now > used for catching mulhis for 64b elements. > > > Extra points for a function comment explaining how work is divided > > in between operator() itself and its main helpers such as > > build_instr_expr. > > I added a method comment, but the truth is that the division of work > is a bit artificial, mostly the build_inst_expr() call is there to split > a complex if..else structure to two functions to improve readability. Yeah, I thought so. No worries. > > - brig-cmp-inst-handler.cc: > > + brig_cmp_inst_handler::operator (): the neg_expr seems to be > > something left from earlier times? Use BITS_PER_UNIT instead of > > Correct. Removed. > > > 8, having both result_width and element_width seems unnecessary > > Removed element_width and moved result_width definition > closer to its use. > > > (and speaking of elements, is that actually even a vector case?), > > and should be initialized only in the case when it is used. In > > case of vector results, please build either all_ones or all_zeros, > > it is wasteful to allocate both. > > They both are used to produce the HSA required all_ones/all_zeros > output. We already discussed this on IRC, this was my oversight. > > > - brig-mem-inst-handler.cc: I believe that using the alignment > > modifier is something that we should try to get done as soon as > > possible. > > I agree. Probably one of the first things that will pop up during optimizing > the performance. > > > - brig-inst-mod-handler.cc: This seems like something that we should > > at least warn about (in case when effectively an unsupported > > operation is requested). > > If there will be an upgrade for the frontend to support the 'full' profile (it's > only supporting 'base' now) with all the rounding modifiers, a better way might > be found than injecting fesetround() calls around all float > expressions. Probably > in that case all float ops must be converted to builtin calls that > ensure the wanted > rounding (ungh!). Yay. OK. > > > - brig-seg-inst-handler.cc: At this point I'm trying to read quickly > > but it seems to me you do not support conversion between flat and > > global segment... how come? > > Global address is already a flat address. Check the description tab in > http://www.hsafoundation.com/html/HSA_Library.htm#PRM/Topics/05_Arithmetic/segment_conversion.htm?Highlight=global Oh, I did not realize that HSAIL does not allow stof_global and ftos_global variants. Interesting. OK then. > > > - brig-copy-move-inst-handler.cc: > > + brig_copy_move_inst_handler::operator (): The function definitely > > should not cast LDA instructions to BrigInstSourceType*! > > Yes it should. Like I explained above, the struct name is misleading, > but it's actually an instruction calls not a source type. but surely LDA is encoded as struct BrigInstAddr, not BrigInstSourceType ??? > > > - brig-branch-inst-handler.cc: I believe that as long as the builtins > > representing barriers are not pure, they will not be hoisted out of > > a loop. Nonduplication might indeed be a problem, although short of > > whole function cloning, I could not think of a transformation gcc > > performs that might pause a problem. Nevertheless, we probably > > should introduce an attribute for it and look for it in > > gimple_can_duplicate_bb_p (and in cfg_layout_can_duplicate_bb_p?). > > An important issue, but hopefully for later. > > Agreed. > > > > - brig-function-handler.cc: > > + brig_directive_function_handler::operator: Please use gcc_assert > > instead of assert. (Well, in this case it is clearly input error > > which, eventually, we will want to give nice errors about. But at > > least do not use assert.) > > Converted to gcc_assert() for now. > > > + brig_function::build_launcher_and_metadata: The ASM directive is > > really an ugly hack. It is isolated so I am not that much > > concerned, but building a structure and filling it with data (like > > we do for example in hsa_output_libgomp_mapping) seems cleaner. > > Hmm. I don't generate any metadata structs to data section, but a separate > custom ELF section per kernel. I agree the ASM directive is not ideal > in general, > but I don't think there's a generic way to add custom ELF sections. > > I'm not sure how much building the structure field by field would be a better > approach in comparison to a raw dump as the point is to just transfer the > metadata to the HSA runtime with the finalized binary. The runtime should use > exactly the same struct layout, otherwise it won't work anyways. If I > do a raw dump, > at least I ensure that if the struct is updated I won't forget to update the > serialization code. Well, hopefully nobody will object too strongly against this. As I wrote before, it is isolated enough for me. > > - gcc/builtins.def: In the added DEF_HSAIL_*_BUILTIN macros, please > > arrange it so that they only pass true in the last argument of > > DEF_BUILTIN when gcc is configured with BRIG FE. Builtins are not > > free and should not be added needlessly. > > It doesn't even include the hsail-builtins.def in case BRIG FE not > enabled now. I suppose that's even better than executing those macros > for nothing. Oh, I have missed that. Sure. > > If I understood you correctly, both you and your sponsor have already > > signed the Copyright assignment, right? If that is so, I'll ask the > > steering committee to approve the intention and then ask a global > > reviewer to also peek at it. > > Correct, and I see you already did. Thanks! > > > Thanks for your patience, > > Thank a lot for the comments. I know how much patience it requires to wade > through a big bunch of someone else's boring code. > > New BRIG FE patch set attached. This week I have checked out the updated tree from github and looked at only a few changed functions there. Hopefully the steering committee will decide soon and we'll get attention of a global reviewer during the next few weeks too. Thanks for addressing so many of my comments, Martin
Hi Martin, On Fri, Nov 4, 2016 at 4:58 PM, Martin Jambor <mjambor@suse.cz> wrote: > I personally primarily want this for debugging purposes, and we should > try to eventually report errors in a more comprehensible way than > HSAILasm. But more generally, and more importantly, if the input, > whether human readable or not, is incorrect, the compiler should not > produce an "internal error." If that happens, users will file bugs > against gcc when in fact it is the generating tool that is broken. > One you maintain the FE, you personally will not want this :-) Yes, I will add better error messages as soon asissues are reported. > I was wondering why in case of VECTOR_TYPE, instead of > return build1 (VIEW_CONVERT_EXPR, type, expr); > you do not do > return fold (convert_to_vector (type, expr)); > > it seemed to me natural given how you handle the other cases and that > the function internally also builds a V_C_E, after two checks. So I > thought the checks (matching overall size and only converting from an > integer or another vector-type) were the problem and was wondering > which one. Especially the size one is something I believe you should > never violate too. I see. I cannot recall the origins of this, but my guess is that the code is copied from an older Go frontend version. I tested with your suggestion and it seems to work fine. > copy_move_inst_handler operator() starts by type-casting whatever it > gets to BrigInstSourceType* and immediately dereferencing it, loading > field sourceType, even though, in case of an LDA, what it is actually > looking at is not a BrigInstSourceType but a BrigInstAddr, which is an > entirely different structure that does not have that field. So you > read the 8-bit segment and the first reserved field and happily pass > that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on > random data. That is wrong. > > Later on in the operator() you actually figure out that you are > looking at BrigInstSourceType but that is too late. Ah, I see the issue now. It worked as with LDA the input type was forced and handled explicitly. I now made the code more explicit regarding this. > Indeed it does, thanks. However, I think that at the point when you > do: > > if (offs > 0) > const_offset = build_int_cst (size_type_node, offs); > > const_offset can already be non-NULL you may lose whatever value it > had before. You might want to keep the constant offset as integer > until the very moment when you may want to use it. Great catch! There indeed was a major memory corruption with group and private variable which was not caught by the PRM conformance suite I use for testing. Should be now fixed. > (Also, single statements should not have braces around them ;-) ...also fixed these, once again ;) > I see. If you think that MULT_HIGHPART_EXPR is broken, it may be > worth filing a bug report (probably during next stage1). It can be considered broken or not just fully implemented for all targets and various vector types (especially those not directly supported by the target). Yes, I rather look closer and report this after the patch has been merged to be able to provide BRIG test cases that work with upstream code base. > This week I have checked out the updated tree from github and looked > at only a few changed functions there. Hopefully the steering > committee will decide soon and we'll get attention of a global > reviewer during the next few weeks too. > > Thanks for addressing so many of my comments, Thanks again, an updated FE patch attached. BR, Pekka
This patch set adds a BRIG (HSAIL) frontend. It can be used as a core for an HSAIL finalizer implementation for processors with gcc backends. It is a bit unusual frontend as the consumed format is a binary representation. The textual HSAIL can be compiled to it with a separate assembler tool. The frontend has been mostly tested with the HSA PRM conformance suite which it now passes. The accompanied GENERIC-scanning test suite is supposed to be only a smoke test. libhsail-rt implements HSAIL specific builtins and includes a simple runtime that implements SPMD execution via either Pth-based fibers or loops to execute multiple work-item work groups without SPMD/SIMD-default hardware. I've split it to 4 patches: 001 - the configuration file changes and misc. 002 - the frontend itself 003 - libhsail-rt 004 - the smoke test suite The diffstat is as follows: .gitignore | 2 +- Makefile.def | 3 + Makefile.in | 489 + configure | 1 + configure.ac | 1 + gcc/brig/Make-lang.in | 247 + gcc/brig/brig-builtins.h | 99 + gcc/brig/brig-c.h | 66 + gcc/brig/brig-lang.c | 770 + gcc/brig/brigfrontend/brig-arg-block-handler.cc | 66 + gcc/brig/brigfrontend/brig-atomic-inst-handler.cc | 265 + gcc/brig/brigfrontend/brig-basic-inst-handler.cc | 865 + gcc/brig/brigfrontend/brig-branch-inst-handler.cc | 221 + gcc/brig/brigfrontend/brig-cmp-inst-handler.cc | 198 + gcc/brig/brigfrontend/brig-code-entry-handler.cc | 1712 ++ gcc/brig/brigfrontend/brig-code-entry-handler.h | 422 + gcc/brig/brigfrontend/brig-comment-handler.cc | 39 + gcc/brig/brigfrontend/brig-control-handler.cc | 108 + .../brigfrontend/brig-copy-move-inst-handler.cc | 58 + gcc/brig/brigfrontend/brig-cvt-inst-handler.cc | 260 + gcc/brig/brigfrontend/brig-fbarrier-handler.cc | 44 + gcc/brig/brigfrontend/brig-function-handler.cc | 373 + gcc/brig/brigfrontend/brig-function.cc | 719 + gcc/brig/brigfrontend/brig-function.h | 213 + gcc/brig/brigfrontend/brig-inst-mod-handler.cc | 58 + gcc/brig/brigfrontend/brig-label-handler.cc | 37 + gcc/brig/brigfrontend/brig-lane-inst-handler.cc | 84 + gcc/brig/brigfrontend/brig-machine.c | 44 + gcc/brig/brigfrontend/brig-machine.h | 33 + gcc/brig/brigfrontend/brig-mem-inst-handler.cc | 180 + gcc/brig/brigfrontend/brig-module-handler.cc | 41 + gcc/brig/brigfrontend/brig-queue-inst-handler.cc | 93 + gcc/brig/brigfrontend/brig-seg-inst-handler.cc | 146 + gcc/brig/brigfrontend/brig-signal-inst-handler.cc | 42 + gcc/brig/brigfrontend/brig-to-generic.cc | 812 + gcc/brig/brigfrontend/brig-to-generic.h | 226 + gcc/brig/brigfrontend/brig-util.cc | 446 + gcc/brig/brigfrontend/brig-util.h | 53 + gcc/brig/brigfrontend/brig-variable-handler.cc | 263 + gcc/brig/brigfrontend/phsa.h | 69 + gcc/brig/brigspec.c | 135 + gcc/brig/config-lang.in | 41 + gcc/brig/lang-specs.h | 28 + gcc/brig/lang.opt | 41 + gcc/builtin-types.def | 80 +- gcc/builtins.def | 41 + gcc/config.in | 6 + gcc/configure | 10 +- gcc/configure.ac | 5 + gcc/doc/frontends.texi | 2 +- gcc/doc/invoke.texi | 4 + gcc/doc/standards.texi | 8 + gcc/hsail-builtins.def | 659 + gcc/testsuite/brig.dg/README | 10 + gcc/testsuite/brig.dg/dg.exp | 27 + gcc/testsuite/brig.dg/test/gimple/alloca.hsail | 37 + gcc/testsuite/brig.dg/test/gimple/atomics.hsail | 33 + gcc/testsuite/brig.dg/test/gimple/branches.hsail | 58 + gcc/testsuite/brig.dg/test/gimple/fbarrier.hsail | 74 + .../brig.dg/test/gimple/function_calls.hsail | 59 + gcc/testsuite/brig.dg/test/gimple/kernarg.hsail | 25 + gcc/testsuite/brig.dg/test/gimple/mem.hsail | 39 + gcc/testsuite/brig.dg/test/gimple/mulhi.hsail | 33 + gcc/testsuite/brig.dg/test/gimple/packed.hsail | 78 + .../brig.dg/test/gimple/smoke_test.hsail | 91 + gcc/testsuite/brig.dg/test/gimple/variables.hsail | 124 + gcc/testsuite/brig.dg/test/gimple/vector.hsail | 57 + gcc/testsuite/lib/brig-dg.exp | 29 + gcc/testsuite/lib/brig.exp | 40 + include/hsa-interface.h | 630 + libhsail-rt/Makefile.am | 124 + libhsail-rt/Makefile.in | 740 + libhsail-rt/README | 4 + libhsail-rt/aclocal.m4 | 978 + libhsail-rt/config.h.in | 217 + libhsail-rt/configure | 17016 ++++++++++++++++++ libhsail-rt/configure.ac | 151 + libhsail-rt/include/internal/fibers.h | 95 + .../include/internal/phsa-queue-interface.h | 60 + libhsail-rt/include/internal/phsa-rt.h | 94 + libhsail-rt/include/internal/workitems.h | 107 + libhsail-rt/m4/libtool.m4 | 7997 ++++++++ libhsail-rt/m4/ltoptions.m4 | 384 + libhsail-rt/m4/ltsugar.m4 | 123 + libhsail-rt/m4/ltversion.m4 | 23 + libhsail-rt/m4/lt~obsolete.m4 | 98 + libhsail-rt/rt/arithmetic.c | 475 + libhsail-rt/rt/atomics.c | 115 + libhsail-rt/rt/bitstring.c | 190 + libhsail-rt/rt/fbarrier.c | 87 + libhsail-rt/rt/fibers.c | 212 + libhsail-rt/rt/fp16.c | 135 + libhsail-rt/rt/misc.c | 89 + libhsail-rt/rt/multimedia.c | 135 + libhsail-rt/rt/queue.c | 71 + libhsail-rt/rt/sat_arithmetic.c | 299 + libhsail-rt/rt/segment.c | 57 + libhsail-rt/rt/workitems.c | 952 + libhsail-rt/target-config.h.in | 68 + 99 files changed, 43463 insertions(+), 5 deletions(-)