diff mbox

[gomp4] Support multi-dimensional pointer based arrays in OpenACC data clauses

Message ID 9bd92682-c1d3-5530-4f76-fdc68318d8e9@mentor.com
State New
Headers show

Commit Message

Chung-Lin Tang Jan. 10, 2017, 8:26 a.m. UTC
This patch implements support for dynamically allocated multi-dimensional arrays
in OpenACC data clauses. To illustrate, these kinds of arrays now work:

int **a;
float *f[100];
double ***d;

#pragma acc parallel copy (a[0:100][x:y], f[10:20][0:30]) copyout(d[x:y][x:y][x:y])
{
 ...
}

The pointer-to-array-rows kind of case is supposedly also supported in the OpenACC
spec (e.g. int (*x)[50]), though support for that is currently still TBD. I've
rejected those cases in omp-low.

Instead of using multiple continuous map entries like pset/pointer maps, I've
opted to use a different style. The compiler creates a descriptor on stack, and
passes the pointer into libgomp. libgomp will then process and exchange it
for the actual target dynamic array pointer before kernel launch.

Tested and committed to gomp-4_0-branch. This will probably also be sent some
time during the next stage-1 for trunk.

Chung-Lin

2017-01-10  Chung-Lin Tang  <cltang@codesourcery.com>

        gcc/c/
        * c-typeck.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
        parameter, adjust recursive call site, add cases for allowing
        pointer based multi-dimensional arrays for OpenACC.
        (handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
        handle non-contiguous case to create dynamic array map.

        gcc/cp/
        * semantics.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
        parameter, adjust recursive call site, add cases for allowing
        pointer based multi-dimensional arrays for OpenACC.
        (handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
        handle non-contiguous case to create dynamic array map.

        gcc/
        * gimplify.c (gimplify_scan_omp_clauses): For dynamic array map kinds,
        make sure bias in each dimension are put into firstprivate variables.
        * tree-pretty-print.c (dump_omp_clauses): Add cases for printing
        GOMP_MAP_DYNAMIC_ARRAY map kinds.
        * omp-low.c (struct omp_context):
        Add 'hash_map<tree_operand_hash, tree> *dynamic_arrays' field, also
        added include of "tree-hash-traits.h".
        (append_field_to_record_type): New function.
        (create_dynamic_array_descr_type): Likewise.
        (create_dynamic_array_descr_init_code): Likewise.
        (new_omp_context): Add initialize of dynamic_arrays field.
        (delete_omp_context): Add delete of dynamic_arrays field.
        (scan_sharing_clauses): For dynamic array map kinds, check for
        supported dimension structure, and install dynamic array variable into
        current omp_context.
        (lower_omp_target): Add handling for dynamic array map kinds.
        (dynamic_array_lookup): New function.
        (dynamic_array_reference_start): Likewise.
        (scan_for_op): Likewise.
        (scan_for_reference): Likewise.
        (da_create_bias): Likewise.
        (da_dimension_peel): Likewise.
        (lower_omp_1): Add case to look for start of dynamic array reference,
        and handle bias adjustments for the code sequence.

        include/
        * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define.
        (enum gomp_map_kind): Add GOMP_MAP_DYNAMIC_ARRAY,
        GOMP_MAP_DYNAMIC_ARRAY_TO, GOMP_MAP_DYNAMIC_ARRAY_FROM,
        GOMP_MAP_DYNAMIC_ARRAY_TOFROM, GOMP_MAP_DYNAMIC_ARRAY_FORCE_TO,
        GOMP_MAP_DYNAMIC_ARRAY_FORCE_FROM, GOMP_MAP_DYNAMIC_ARRAY_FORCE_TOFROM,
        GOMP_MAP_DYNAMIC_ARRAY_ALLOC, GOMP_MAP_DYNAMIC_ARRAY_FORCE_ALLOC,
        GOMP_MAP_DYNAMIC_ARRAY_FORCE_PRESENT.
        (GOMP_MAP_DYNAMIC_ARRAY_P): Define.

        libgomp/
        * target.c (struct da_dim): New struct declaration.
        (struct da_descr_type): Likewise.
        (struct da_info): Likewise.
        (gomp_dynamic_array_count_rows): New function.
        (gomp_dynamic_array_compute_info): Likewise.
        (gomp_dynamic_array_fill_rows_1): Likewise.
        (gomp_dynamic_array_fill_rows): Likewise.
        (gomp_dynamic_array_create_ptrblock): Likewise.
        (gomp_map_vars): Add code to handle dynamic array map kinds.
        * testsuite/libgomp.oacc-c-c++-common/da-1.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/da-2.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/da-3.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/da-4.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/da-utils.h: New test.

Comments

Thomas Schwinge Nov. 7, 2019, 12:48 a.m. UTC | #1
Hi Chung-Lin!

On 2019-11-05T22:35:43+0800, Chung-Lin Tang <chunglin_tang@mentor.com> wrote:
> Hi Thomas,

> after your last round of review, I realized that the bulk of the compiler omp-low work was

> simply a case of dumb over-engineering in the wrong direction :P

> (although it did painstakingly function correctly)


Hehe -- that happens.  ;-)

> However, the issue of ACC_DEVICE_TYPE=host not working (and hence "!openacc_host_selected"

> in the testcases)


Actually not just for that, but also generally for any shared-memory
models that may come into existance at some point, such as CUDA Unified
Memory, for example?

> actually is a bit more sophisticated than I thought:

>

> The reason it doesn't work for the host device, is because we use the map pointer (i.e.

> a hostaddrs[] entry when passed into libgomp) to point to an array descriptor to pass

> the whole array information, and rely on code inside gomp_map_vars_* to setup things,

> and place the final on-device address of the non-contig. array into devaddrs[], therefore

> only using a single map entry (something I thought was quite clever)

>

> However, this broke down on the host and host-fallback devices, simply because, there

> we do NOT do any gomp_map_vars processing; our current code in GOACC_parallel_keyed

> simply skips it and passes the offload function the original hostaddrs[] contents.

> Lacking the processing to transform the descriptor pointer into a proper array ref,

> things of course segfault.

>

> So I think we have three options for this (which may have some interactions with say,

> the "proper" host-side parallelization we eventually need to implement for OpenACC 2.7)

>

> (1) The simplest solution: implement a processing which searches and reverts such

> non-contiguous array map entries in GOACC_parallel_keyed.

> (note: I have implemented this in the current attached "v2" patch)

>

> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;

> i.e. still do the proper gomp_map_vars processing for all cases.

>

> (3) Modify the non-contiguous array map conventions: a possible solution is to use

> two maps placed together: one for the array pointer, another for the array descriptor (as

> opposed to the current style of using only one map) This needs more further elaborate

> compiler/runtime work.

>

> The first two options will pessimize host-mode performance somewhat. The third I have

> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.


I'll have to think about it some more, but variant (1) doesn't seem so
bad actually, for a first take.  While it's not nice to pessimize in
particular directives with 'if (false)' clauses, at least it does work,
the run-time overhead should not be too bad (also compared to variant
(2), I suppose), and variant (3) can still be implemented later.


A few comments/questions:

Please reference PR76739 in your submission/ChangeLog updates.

> --- gcc/c/c-typeck.c	(revision 277827)

> +++ gcc/c/c-typeck.c	(working copy)

> @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t

>  static tree

>  handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,

>  			     bool &maybe_zero_len, unsigned int &first_non_one,

> -			     enum c_omp_region_type ort)

> +			     bool &non_contiguous, enum c_omp_region_type ort)

>  {

>    tree ret, low_bound, length, type;

>    if (TREE_CODE (t) != TREE_LIST)


> @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec<t

>  	  return error_mark_node;

>  	}

>        /* If there is a pointer type anywhere but in the very first

> -	 array-section-subscript, the array section can't be contiguous.  */

> +	 array-section-subscript, the array section can't be contiguous.

> +	 Note that OpenACC does accept these kinds of non-contiguous pointer

> +	 based arrays.  */


That comment update should instead be moved to the function comment
before the 'handle_omp_array_sections_1' function definition, and should
then also explain the new 'non_contiguous' out variable.  The latter
needs to be done anyway, and the former (no comment here) is easy enough
to tell from the code:

>        if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND

>  	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)

>  	{

> -	  error_at (OMP_CLAUSE_LOCATION (c),

> -		    "array section is not contiguous in %qs clause",

> -		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);

> -	  return error_mark_node;

> +	  if (ort == C_ORT_ACC)

> +	    non_contiguous = true;

> +	  else

> +	    {

> +	      error_at (OMP_CLAUSE_LOCATION (c),

> +			"array section is not contiguous in %qs clause",

> +			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);

> +	      return error_mark_node;

> +	    }

>  	}


> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi

>        unsigned int num = types.length (), i;

>        tree t, side_effects = NULL_TREE, size = NULL_TREE;

>        tree condition = NULL_TREE;

> +      tree ncarray_dims = NULL_TREE;

>  

>        if (int_size_in_bytes (TREE_TYPE (first)) <= 0)

>  	maybe_zero_len = true;

> @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi

>  	    length = fold_convert (sizetype, length);

>  	  if (low_bound == NULL_TREE)

>  	    low_bound = integer_zero_node;

> +

> +	  if (non_contiguous)

> +	    {

> +	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);

> +	      continue;

> +	    }

> +

>  	  if (!maybe_zero_len && i > first_non_one)

>  	    {

>  	      if (integer_nonzerop (low_bound))


I'm not at all familiar with this array sections code, will trust your
understanding that we don't need any of the processing that you're
skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length
expressions, and other things.

> @@ -13357,6 +13374,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi

>  		size = size_binop (MULT_EXPR, size, l);

>  	    }

>  	}

> +      if (non_contiguous)

> +	{

> +	  int kind = OMP_CLAUSE_MAP_KIND (c);

> +	  OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);

> +	  OMP_CLAUSE_DECL (c) = t;

> +	  OMP_CLAUSE_SIZE (c) = ncarray_dims;

> +	  return false;

> +	}

>        if (side_effects)

>  	size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);

>        if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION


Likewise for all the code being skipped here ('return false').

> --- gcc/cp/semantics.c	(revision 277827)

> +++ gcc/cp/semantics.c	(working copy)


Analoguous to the C front end.

> --- gcc/gimplify.c	(revision 277827)

> +++ gcc/gimplify.c	(working copy)

> @@ -8622,9 +8622,17 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_se

>  	  if (OMP_CLAUSE_SIZE (c) == NULL_TREE)

>  	    OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)

>  				  : TYPE_SIZE_UNIT (TREE_TYPE (decl));

> +	  if (OMP_CLAUSE_SIZE (c)

> +	      && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST

> +	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))


Per the code above, 'OMP_CLAUSE_SIZE (c)' will always be set to
something, so no point in checking that here?

Isn't the 'GOMP_MAP_NONCONTIG_ARRAY_P' check alone sufficient already?
And then maybe 'assert (TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST' in
here:

>  	    {

> +	      /* For non-contiguous array maps, OMP_CLAUSE_SIZE is a TREE_LIST

> +		 of the individual array dimensions, which gimplify_expr doesn't

> +		 handle, so skip the call to gimplify_expr here.  */

> +	    }


> -	  if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,

> -			     NULL, is_gimple_val, fb_rvalue) == GS_ERROR)

> +	  else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,

> +				  NULL, is_gimple_val, fb_rvalue) == GS_ERROR)

> +	    {

>  	      remove = true;

>  	      break;

>  	    }


Again, that means we're skipping other code here; don't understand yet.

Your ChangeLog update says:

> 	* gimplify.c (gimplify_scan_omp_clauses): For non-contiguous array map kinds,

> 	make sure bias in each dimension are put into firstprivate variables.


I'm not yet seeing how that's happening.

Ah, I see that ChangeLog comment is probably just a remnant from the
previous version.

> --- gcc/omp-low.c	(revision 277827)

> +++ gcc/omp-low.c	(working copy)


Have not yet reviewed in detail.

> @@ -1367,6 +1498,38 @@ scan_sharing_clauses (tree clauses, omp_context *c

>  	      install_var_local (decl, ctx);

>  	      break;

>  	    }

> +

> +	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP

> +	      && GOMP_MAP_NONCONTIG_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))

> +	    {

> +	      tree array_decl = OMP_CLAUSE_DECL (c);

> +	      tree array_type = TREE_TYPE (array_decl);

> +	      bool by_ref = (TREE_CODE (array_type) == ARRAY_TYPE

> +			     ? true : false);

> +

> +	      /* Checking code to ensure we only have arrays at top dimension.

> +		 This limitation might be lifted in the future.  */


Please reference PR76739 here, and in PR76739 also add a comment about
this limitation.  (As well as any other limitations, of course.)

> +	      if (TREE_CODE (array_type) == REFERENCE_TYPE)

> +		array_type = TREE_TYPE (array_type);

> +	      tree t = array_type, prev_t = NULL_TREE;

> +	      while (t)

> +		{

> +		  if (TREE_CODE (t) == ARRAY_TYPE && prev_t)

> +		    {

> +		      error_at (gimple_location (ctx->stmt), "array types are"

> +				" only allowed at outermost dimension of"

> +				" non-contiguous array");

> +		      break;

> +		    }

> +		  prev_t = t;

> +		  t = TREE_TYPE (t);

> +		}

> +

> +	      install_var_field (array_decl, by_ref, 3, ctx);

> +	      install_var_local (array_decl, ctx);

> +	      break;

> +	    }

> +


Assuming this intentionally means to skip ('break' just above) the
following 'if (DECL_P (decl))' and its 'else' branch, then maybe remove
the 'break' just above, and instead do 'else if (DECL_P (decl))'?

>  	  if (DECL_P (decl))

>  	    {

>  	      if (DECL_SIZE (decl)


> @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o

>        gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);

>      }

> 

> +  /* If is OpenACC construct, put non-contiguous array clauses (if any)

> +     in front of clause chain. The runtime can then test the first to see

> +     if the additional map processing for them is required.  */

> +  if (is_gimple_omp_oacc (stmt))

> +    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));


Should that be deemed unsuitable for any reason, then add a new
'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays.

> --- include/gomp-constants.h	(revision 277827)

> +++ include/gomp-constants.h	(working copy)

> @@ -40,6 +40,7 @@

>  #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)

>  #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)

>  #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)

> +#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)

>  #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \

>  					 | GOMP_MAP_FLAG_SPECIAL_0)

>  /* Flag to force a specific behavior (or else, trigger a run-time error).  */

> @@ -127,6 +128,26 @@ enum gomp_map_kind

>      /* Decrement usage count and deallocate if zero.  */

>      GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2

>  					 | GOMP_MAP_DELETE),

> +    /* Mapping kinds for non-contiguous arrays.  */

> +    GOMP_MAP_NONCONTIG_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),

> +    GOMP_MAP_NONCONTIG_ARRAY_TO =	(GOMP_MAP_NONCONTIG_ARRAY

> +					 | GOMP_MAP_TO),

> +    GOMP_MAP_NONCONTIG_ARRAY_FROM =	(GOMP_MAP_NONCONTIG_ARRAY

> +					 | GOMP_MAP_FROM),

> +    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY

> +					 | GOMP_MAP_TOFROM),

> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =	(GOMP_MAP_NONCONTIG_ARRAY_TO

> +					 | GOMP_MAP_FLAG_FORCE),

> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =	(GOMP_MAP_NONCONTIG_ARRAY_FROM

> +						 | GOMP_MAP_FLAG_FORCE),

> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY_TOFROM

> +						 | GOMP_MAP_FLAG_FORCE),

> +    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =		(GOMP_MAP_NONCONTIG_ARRAY

> +						 | GOMP_MAP_ALLOC),

> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =	(GOMP_MAP_NONCONTIG_ARRAY

> +						 | GOMP_MAP_FORCE_ALLOC),

> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT =	(GOMP_MAP_NONCONTIG_ARRAY

> +						 | GOMP_MAP_FORCE_PRESENT),


Just an idea: instead of this long list, would it maybe be better (if
feasible at all?) to have a single "lead-in" mapping
'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the
following (normal) mappings belong to that "non-contiguous array mode".
(Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any
'GOMP_MAP_POINTER's following it.)  Might that make some things simpler,
or even more complicated (more internal state to keep)?

> --- libgomp/oacc-parallel.c	(revision 277827)

> +++ libgomp/oacc-parallel.c	(working copy)


> +static inline void

> +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,

> +				     unsigned short *kinds)

> +{

> +  for (int i = 0; i < mapnum; i++)

> +    {

> +      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))

> +	hostaddrs[i] = *((void **)hostaddrs[i]);


Can we be (or, do we make) sure that 'hostaddrs' will never be in
read-only memory?

And, it's permissible to alter 'hostaddrs'?

Ah, other code (including 'libgomp/target.c') is doing such things, too,
so it must be fine.

> +      else

> +	/* We assume all non-contiguous array map entries are placed at the

> +	   start; first other map kind means we can exit.  */

> +	break;

> +    }

> +}


> --- libgomp/target.c	(revision 277827)

> +++ libgomp/target.c	(working copy)


Have not yet reviewed in detail.

> @@ -533,9 +679,37 @@ gomp_map_vars_internal (struct gomp_device_descr *

>    const int typemask = short_mapkind ? 0xff : 0x7;

>    struct splay_tree_s *mem_map = &devicep->mem_map;

>    struct splay_tree_key_s cur_node;

> -  struct target_mem_desc *tgt

> -    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);

> -  tgt->list_count = mapnum;

> +  struct target_mem_desc *tgt;

> +

> +  bool process_noncontig_arrays = false;

> +  size_t nca_data_row_num = 0, row_start = 0;

> +  size_t nca_info_num = 0, nca_index;

> +  struct ncarray_info *nca_info = NULL;

> +  struct target_var_desc *row_desc;

> +  uintptr_t target_row_addr;

> +  void **host_data_rows = NULL, **target_data_rows = NULL;

> +  void *row;

> +

> +  if (mapnum > 0)

> +    {


Also add such a comment here: "We assume all non-contiguous array map
entries are placed at the start".

> +      int kind = get_kind (short_mapkind, kinds, 0);

> +      process_noncontig_arrays = GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask);

> +    }

> +

> +  if (process_noncontig_arrays)

> +    for (i = 0; i < mapnum; i++)

> +      {

> +	int kind = get_kind (short_mapkind, kinds, i);

> +	if (GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))

> +	  {

> +	    nca_data_row_num += gomp_noncontig_array_count_rows (hostaddrs[i]);

> +	    nca_info_num += 1;

> +	  }

> +      }


Or, actually, can the 'if (mapnum > 0)' above and the 'for' loop here
again be simplified to just one loop with 'break', like you've done in
'libgomp/oacc-parallel.c:revert_noncontig_array_map_pointers'?

> +

> +  tgt = gomp_malloc (sizeof (*tgt)

> +		     + sizeof (tgt->list[0]) * (mapnum + nca_data_row_num));

> +  tgt->list_count = mapnum + nca_data_row_num;

>    tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;

>    tgt->device_descr = devicep;

>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;


> @@ -735,6 +931,56 @@ gomp_map_vars_internal (struct gomp_device_descr *

>  	}

>      }

>  

> +  /* For non-contiguous arrays. Each data row is one target item, separated

> +     from the normal map clause items, hence we order them after mapnum.  */

> +  if (process_noncontig_arrays)

> +    for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)

> +      {

> +	int kind = get_kind (short_mapkind, kinds, i);

> +	if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))

> +	  continue;


Can instead 'break' again?

> @@ -1044,8 +1299,112 @@ gomp_map_vars_internal (struct gomp_device_descr *

>  		array++;

>  	      }

>  	  }

> +

> +      /* Processing of non-contiguous array rows.  */

> +      if (process_noncontig_arrays)

> +	{

> +	  for (i = 0, nca_index = 0, row_start = 0; i < mapnum; i++)

> +	    {

> +	      int kind = get_kind (short_mapkind, kinds, i);

> +	      if (!GOMP_MAP_NONCONTIG_ARRAY_P (kind & typemask))

> +		continue;


Likewise?


It's now gotten too late; more review to follow later.


Grüße
 Thomas
Chung-Lin Tang Nov. 12, 2019, 12:35 p.m. UTC | #2
Hi Thomas,
thanks for the first review. I'm still working on another revision,
but wanted to respond to some of the issues you raised first:

On 2019/11/7 8:48 AM, Thomas Schwinge wrote:
>> (1) The simplest solution: implement a processing which searches and reverts such

>> non-contiguous array map entries in GOACC_parallel_keyed.

>> (note: I have implemented this in the current attached "v2" patch)

>>

>> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;

>> i.e. still do the proper gomp_map_vars processing for all cases.

>>

>> (3) Modify the non-contiguous array map conventions: a possible solution is to use

>> two maps placed together: one for the array pointer, another for the array descriptor (as

>> opposed to the current style of using only one map) This needs more further elaborate

>> compiler/runtime work.

>>

>> The first two options will pessimize host-mode performance somewhat. The third I have

>> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we should do.

> I'll have to think about it some more, but variant (1) doesn't seem so

> bad actually, for a first take.  While it's not nice to pessimize in

> particular directives with 'if (false)' clauses, at least it does work,

> the run-time overhead should not be too bad (also compared to variant

> (2), I suppose), and variant (3) can still be implemented later.


The issue is that (1),(2) vs (3) have different binary interfaces, so a decision has to be
made first, lest we again have compatibility issues later.

Also, (1) vs (2) also may be somewhat different do to the memory copying effects of
gomp_map_vars()  (possible semantic difference versus the usual shared memory expectations?)

I'm currently working on another way of implementing something similar to (3),
but using the variadic arguments of GOACC_parallel_keyed instead of maps, WDYT?

>> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi

>>         unsigned int num = types.length (), i;

>>         tree t, side_effects = NULL_TREE, size = NULL_TREE;

>>         tree condition = NULL_TREE;

>> +      tree ncarray_dims = NULL_TREE;

>>   

>>         if (int_size_in_bytes (TREE_TYPE (first)) <= 0)

>>   	maybe_zero_len = true;

>> @@ -13261,6 +13271,13 @@ handle_omp_array_sections (tree c, enum c_omp_regi

>>   	    length = fold_convert (sizetype, length);

>>   	  if (low_bound == NULL_TREE)

>>   	    low_bound = integer_zero_node;

>> +

>> +	  if (non_contiguous)

>> +	    {

>> +	      ncarray_dims = tree_cons (low_bound, length, ncarray_dims);

>> +	      continue;

>> +	    }

>> +

>>   	  if (!maybe_zero_len && i > first_non_one)

>>   	    {

>>   	      if (integer_nonzerop (low_bound))

> I'm not at all familiar with this array sections code, will trust your

> understanding that we don't need any of the processing that you're

> skipping here ('continue'): 'TREE_SIDE_EFFECTS' handling for the length

> expressions, and other things.


I will re-check on this.

Ditto for the other minor issues you raised.

>>   	  if (DECL_P (decl))

>>   	    {

>>   	      if (DECL_SIZE (decl)

>> @@ -2624,6 +2830,14 @@ scan_omp_target (gomp_target *stmt, omp_context *o

>>         gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);

>>       }

>>

>> +  /* If is OpenACC construct, put non-contiguous array clauses (if any)

>> +     in front of clause chain. The runtime can then test the first to see

>> +     if the additional map processing for them is required.  */

>> +  if (is_gimple_omp_oacc (stmt))

>> +    reorder_noncontig_array_clauses (gimple_omp_target_clauses_ptr (stmt));

> Should that be deemed unsuitable for any reason, then add a new

> 'GOACC_FLAG_*' flag to indicate existance of non-contiguous arrays.


I'm considering using that convention unconditionally, not sure if it's faster
though, since that means we can't do the 'early breaking' you mentioned when
scanning through maps looking for GOMP_MAP_NONCONTIG_ARRAY_P.

>> --- include/gomp-constants.h	(revision 277827)

>> +++ include/gomp-constants.h	(working copy)

>> @@ -40,6 +40,7 @@

>>   #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)

>>   #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)

>>   #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)

>> +#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)

>>   #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \

>>   					 | GOMP_MAP_FLAG_SPECIAL_0)

>>   /* Flag to force a specific behavior (or else, trigger a run-time error).  */

>> @@ -127,6 +128,26 @@ enum gomp_map_kind

>>       /* Decrement usage count and deallocate if zero.  */

>>       GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2

>>   					 | GOMP_MAP_DELETE),

>> +    /* Mapping kinds for non-contiguous arrays.  */

>> +    GOMP_MAP_NONCONTIG_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),

>> +    GOMP_MAP_NONCONTIG_ARRAY_TO =	(GOMP_MAP_NONCONTIG_ARRAY

>> +					 | GOMP_MAP_TO),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FROM =	(GOMP_MAP_NONCONTIG_ARRAY

>> +					 | GOMP_MAP_FROM),

>> +    GOMP_MAP_NONCONTIG_ARRAY_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY

>> +					 | GOMP_MAP_TOFROM),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO =	(GOMP_MAP_NONCONTIG_ARRAY_TO

>> +					 | GOMP_MAP_FLAG_FORCE),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM =	(GOMP_MAP_NONCONTIG_ARRAY_FROM

>> +						 | GOMP_MAP_FLAG_FORCE),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM =	(GOMP_MAP_NONCONTIG_ARRAY_TOFROM

>> +						 | GOMP_MAP_FLAG_FORCE),

>> +    GOMP_MAP_NONCONTIG_ARRAY_ALLOC =		(GOMP_MAP_NONCONTIG_ARRAY

>> +						 | GOMP_MAP_ALLOC),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC =	(GOMP_MAP_NONCONTIG_ARRAY

>> +						 | GOMP_MAP_FORCE_ALLOC),

>> +    GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT =	(GOMP_MAP_NONCONTIG_ARRAY

>> +						 | GOMP_MAP_FORCE_PRESENT),

> Just an idea: instead of this long list, would it maybe be better (if

> feasible at all?) to have a single "lead-in" mapping

> 'GOMP_MAP_NONCONTIG_ARRAY_MODE', which specifies how many of the

> following (normal) mappings belong to that "non-contiguous array mode".

> (Roughly similar to what 'GOMP_MAP_TO_PSET' is doing with any

> 'GOMP_MAP_POINTER's following it.)  Might that make some things simpler,

> or even more complicated (more internal state to keep)?


I prefer not, wrangling with multiple-map sequences in the complex gomp_map_vars code
is proving to be a tedious task; my now given-up version of method (3) above tried using
two map kinds (an 'array' and an 'array descriptor'). Haven't yet got it to work properly.

Also, a non-contiguous array is just a data clause specification feature, and should support
all modes (copy/in/out,present,alloc,etc.) Using a whole GOMP_MAP_FLAG_SPECIAL_3 bit in
combination with other flags independently should be warranted.


>> --- libgomp/oacc-parallel.c	(revision 277827)

>> +++ libgomp/oacc-parallel.c	(working copy)

>> +static inline void

>> +revert_noncontig_array_map_pointers (size_t mapnum, void **hostaddrs,

>> +				     unsigned short *kinds)

>> +{

>> +  for (int i = 0; i < mapnum; i++)

>> +    {

>> +      if (GOMP_MAP_NONCONTIG_ARRAY_P (kinds[i] & 0xff))

>> +	hostaddrs[i] = *((void **)hostaddrs[i]);

> Can we be (or, do we make) sure that 'hostaddrs' will never be in

> read-only memory?

> 

> And, it's permissible to alter 'hostaddrs'?

> 

> Ah, other code (including 'libgomp/target.c') is doing such things, too,

> so it must be fine.


The hostaddrs[] array is the 'receiver' record built on stack by omp-low,
so it should always be safe to modify, I think.

Thanks again for the review!
Chung-Lin
diff mbox

Patch

Index: gcc/c/c-typeck.c
===================================================================
--- gcc/c/c-typeck.c	(revision 244258)
+++ gcc/c/c-typeck.c	(revision 244259)
@@ -11926,7 +11926,7 @@ 
 static tree
 handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 			     bool &maybe_zero_len, unsigned int &first_non_one,
-			     enum c_omp_region_type ort)
+			     bool &non_contiguous, enum c_omp_region_type ort)
 {
   tree ret, low_bound, length, type;
   if (TREE_CODE (t) != TREE_LIST)
@@ -11982,7 +11982,8 @@ 
     }
 
   ret = handle_omp_array_sections_1 (c, TREE_CHAIN (t), types,
-				     maybe_zero_len, first_non_one, ort);
+				     maybe_zero_len, first_non_one,
+				     non_contiguous, ort);
   if (ret == error_mark_node || ret == NULL_TREE)
     return ret;
 
@@ -12142,6 +12143,21 @@ 
 		    }
 		}
 	    }
+
+	  /* For OpenACC, if the low_bound/length suggest this is a subarray,
+	     and is referenced through by a pointer, then mark this as
+	     non-contiguous.  */
+	  if (ort == C_ORT_ACC
+	      && types.length () > 0
+	      && (TREE_CODE (low_bound) != INTEGER_CST
+		  || integer_nonzerop (low_bound)
+		  || (length && (TREE_CODE (length) != INTEGER_CST
+				 || !tree_int_cst_equal (size, length)))))
+	    {
+	      tree x = types.last ();
+	      if (TREE_CODE (x) == POINTER_TYPE)
+		non_contiguous = true;
+	    }
 	}
       else if (length == NULL_TREE)
 	{
@@ -12183,13 +12199,16 @@ 
       /* If there is a pointer type anywhere but in the very first
 	 array-section-subscript, the array section can't be contiguous.  */
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
-	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
+	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST
+	  && ort != C_ORT_ACC)
 	{
 	  error_at (OMP_CLAUSE_LOCATION (c),
 		    "array section is not contiguous in %qs clause",
 		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  return error_mark_node;
 	}
+      else if (TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
+	non_contiguous = true;
     }
   else
     {
@@ -12217,10 +12236,11 @@ 
 {
   bool maybe_zero_len = false;
   unsigned int first_non_one = 0;
+  bool non_contiguous = false;
   auto_vec<tree, 10> types;
   tree first = handle_omp_array_sections_1 (c, OMP_CLAUSE_DECL (c), types,
 					    maybe_zero_len, first_non_one,
-					    ort);
+					    non_contiguous, ort);
   if (first == error_mark_node)
     return true;
   if (first == NULL_TREE)
@@ -12253,6 +12273,7 @@ 
       unsigned int num = types.length (), i;
       tree t, side_effects = NULL_TREE, size = NULL_TREE;
       tree condition = NULL_TREE;
+      tree da_dims = NULL_TREE;
 
       if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
 	maybe_zero_len = true;
@@ -12276,6 +12297,13 @@ 
 	    length = fold_convert (sizetype, length);
 	  if (low_bound == NULL_TREE)
 	    low_bound = integer_zero_node;
+
+	  if (non_contiguous)
+	    {
+	      da_dims = tree_cons (low_bound, length, da_dims);
+	      continue;
+	    }
+
 	  if (!maybe_zero_len && i > first_non_one)
 	    {
 	      if (integer_nonzerop (low_bound))
@@ -12368,6 +12396,14 @@ 
 		size = size_binop (MULT_EXPR, size, l);
 	    }
 	}
+      if (non_contiguous)
+	{
+	  int kind = OMP_CLAUSE_MAP_KIND (c);
+	  OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_DYNAMIC_ARRAY);
+	  OMP_CLAUSE_DECL (c) = t;
+	  OMP_CLAUSE_SIZE (c) = da_dims;
+	  return false;
+	}
       if (side_effects)
 	size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
       if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 244258)
+++ gcc/cp/semantics.c	(revision 244259)
@@ -4482,7 +4482,7 @@ 
 static tree
 handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
 			     bool &maybe_zero_len, unsigned int &first_non_one,
-			     enum c_omp_region_type ort)
+			     bool &non_contiguous, enum c_omp_region_type ort)
 {
   tree ret, low_bound, length, type;
   if (TREE_CODE (t) != TREE_LIST)
@@ -4565,7 +4565,8 @@ 
       && TREE_CODE (TREE_CHAIN (t)) == FIELD_DECL)
     TREE_CHAIN (t) = omp_privatize_field (TREE_CHAIN (t), false);
   ret = handle_omp_array_sections_1 (c, TREE_CHAIN (t), types,
-				     maybe_zero_len, first_non_one, ort);
+				     maybe_zero_len, first_non_one,
+				     non_contiguous, ort);
   if (ret == error_mark_node || ret == NULL_TREE)
     return ret;
 
@@ -4737,6 +4738,21 @@ 
 		    }
 		}
 	    }
+
+	  /* For OpenACC, if the low_bound/length suggest this is a subarray,
+	     and is referenced through by a pointer, then mark this as
+	     non-contiguous.  */
+	  if (ort == C_ORT_ACC
+	      && types.length () > 0
+	      && (TREE_CODE (low_bound) != INTEGER_CST
+		  || integer_nonzerop (low_bound)
+		  || (length && (TREE_CODE (length) != INTEGER_CST
+				 || !tree_int_cst_equal (size, length)))))
+	    {
+	      tree x = types.last ();
+	      if (TREE_CODE (x) == POINTER_TYPE)
+		non_contiguous = true;
+	    }
 	}
       else if (length == NULL_TREE)
 	{
@@ -4778,13 +4794,16 @@ 
       /* If there is a pointer type anywhere but in the very first
 	 array-section-subscript, the array section can't be contiguous.  */
       if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
-	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
+	  && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST
+	  && ort != C_ORT_ACC)
 	{
 	  error_at (OMP_CLAUSE_LOCATION (c),
 		    "array section is not contiguous in %qs clause",
 		    omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  return error_mark_node;
 	}
+      else if (TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
+	non_contiguous = true;
     }
   else
     {
@@ -4812,10 +4831,11 @@ 
 {
   bool maybe_zero_len = false;
   unsigned int first_non_one = 0;
+  bool non_contiguous = false;
   auto_vec<tree, 10> types;
   tree first = handle_omp_array_sections_1 (c, OMP_CLAUSE_DECL (c), types,
 					    maybe_zero_len, first_non_one,
-					    ort);
+					    non_contiguous, ort);
   if (first == error_mark_node)
     return true;
   if (first == NULL_TREE)
@@ -4849,6 +4869,7 @@ 
       unsigned int num = types.length (), i;
       tree t, side_effects = NULL_TREE, size = NULL_TREE;
       tree condition = NULL_TREE;
+      tree da_dims = NULL_TREE;
 
       if (int_size_in_bytes (TREE_TYPE (first)) <= 0)
 	maybe_zero_len = true;
@@ -4874,6 +4895,13 @@ 
 	    length = fold_convert (sizetype, length);
 	  if (low_bound == NULL_TREE)
 	    low_bound = integer_zero_node;
+
+	  if (non_contiguous)
+	    {
+	      da_dims = tree_cons (low_bound, length, da_dims);
+	      continue;
+	    }
+
 	  if (!maybe_zero_len && i > first_non_one)
 	    {
 	      if (integer_nonzerop (low_bound))
@@ -4961,6 +4989,14 @@ 
 	}
       if (!processing_template_decl)
 	{
+	  if (non_contiguous)
+	    {
+	      int kind = OMP_CLAUSE_MAP_KIND (c);
+	      OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_DYNAMIC_ARRAY);
+	      OMP_CLAUSE_DECL (c) = t;
+	      OMP_CLAUSE_SIZE (c) = da_dims;
+	      return false;
+	    }
 	  if (side_effects)
 	    size = build2 (COMPOUND_EXPR, sizetype, side_effects, size);
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 244258)
+++ gcc/gimplify.c	(revision 244259)
@@ -6928,9 +6928,29 @@ 
 	  if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
 	    OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
 				  : TYPE_SIZE_UNIT (TREE_TYPE (decl));
-	  if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
-			     NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
+	  if (OMP_CLAUSE_SIZE (c)
+	      && TREE_CODE (OMP_CLAUSE_SIZE (c)) == TREE_LIST
+	      && GOMP_MAP_DYNAMIC_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
 	    {
+	      tree dims = OMP_CLAUSE_SIZE (c);
+	      for (tree t = dims; t; t = TREE_CHAIN (t))
+		{
+		  /* If a dimension bias isn't a constant, we have to ensure
+		     that the value gets transferred to the offload target.  */
+		  tree low_bound = TREE_PURPOSE (t);
+		  if (TREE_CODE (low_bound) != INTEGER_CST)
+		    {
+		      low_bound = get_initialized_tmp_var (low_bound, pre_p,
+							   NULL);
+		      omp_add_variable (ctx, low_bound, 
+					GOVD_FIRSTPRIVATE | GOVD_SEEN);
+		      TREE_PURPOSE (t) = low_bound;
+		    }
+		}
+	    }
+	  else if (gimplify_expr (&OMP_CLAUSE_SIZE (c), pre_p,
+				  NULL, is_gimple_val, fb_rvalue) == GS_ERROR)
+	    {
 	      remove = true;
 	      break;
 	    }
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	(revision 244258)
+++ gcc/tree-pretty-print.c	(revision 244259)
@@ -737,6 +737,33 @@ 
 	case GOMP_MAP_LINK:
 	  pp_string (pp, "link");
 	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_TO:
+	  pp_string (pp, "to,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FROM:
+	  pp_string (pp, "from,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_TOFROM:
+	  pp_string (pp, "tofrom,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FORCE_TO:
+	  pp_string (pp, "force_to,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FORCE_FROM:
+	  pp_string (pp, "force_from,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FORCE_TOFROM:
+	  pp_string (pp, "force_tofrom,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_ALLOC:
+	  pp_string (pp, "alloc,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FORCE_ALLOC:
+	  pp_string (pp, "force_alloc,dynamic_array");
+	  break;
+	case GOMP_MAP_DYNAMIC_ARRAY_FORCE_PRESENT:
+	  pp_string (pp, "force_present,dynamic_array");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
@@ -758,6 +785,10 @@ 
 	    case GOMP_MAP_TO_PSET:
 	      pp_string (pp, " [pointer set, len: ");
 	      break;
+	    case GOMP_MAP_DYNAMIC_ARRAY:
+	      gcc_assert (TREE_CODE (OMP_CLAUSE_SIZE (clause)) == TREE_LIST);
+	      pp_string (pp, " [dimensions: ");
+	      break;
 	    default:
 	      pp_string (pp, " [len: ");
 	      break;
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 244258)
+++ gcc/omp-low.c	(revision 244259)
@@ -84,6 +84,7 @@ 
 #include "hsa.h"
 #include "params.h"
 #include "tree-ssa-propagate.h"
+#include "tree-hash-traits.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
    phases.  The first phase scans the function looking for OMP statements
@@ -203,6 +204,9 @@ 
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* Hash map of dynamic arrays in this context.  */
+  hash_map<tree_operand_hash, tree> *dynamic_arrays;
 };
 
 /* A structure holding the elements of:
@@ -1619,7 +1623,136 @@ 
   return error_mark_node;
 }
 
+/* Helper function for create_dynamic_array_descr_type(), to append a new field
+   to a record type.  */
 
+static void
+append_field_to_record_type (tree record_type, tree fld_ident, tree fld_type)
+{
+  tree *p, fld = build_decl (UNKNOWN_LOCATION, FIELD_DECL, fld_ident, fld_type);
+  DECL_CONTEXT (fld) = record_type;
+
+  for (p = &TYPE_FIELDS (record_type); *p; p = &DECL_CHAIN (*p))
+    ;
+  *p = fld;
+}
+
+/* Create type for dynamic array descriptor. Returns created type, and
+   returns the number of dimensions in *DIM_NUM.  */
+
+static tree
+create_dynamic_array_descr_type (tree decl, tree dims, int *dim_num)
+{
+  int n = 0;
+  tree da_descr_type, name, x;
+  gcc_assert (TREE_CODE (dims) == TREE_LIST);
+
+  da_descr_type = lang_hooks.types.make_type (RECORD_TYPE);
+  name = create_tmp_var_name (".omp_dynamic_array_descr_type");
+  name = build_decl (UNKNOWN_LOCATION, TYPE_DECL, name, da_descr_type);
+  DECL_ARTIFICIAL (name) = 1;
+  DECL_NAMELESS (name) = 1;
+  TYPE_NAME (da_descr_type) = name;
+  TYPE_ARTIFICIAL (da_descr_type) = 1;
+
+  /* Main starting pointer/array.  */
+  tree main_var_type = TREE_TYPE (decl);
+  if (TREE_CODE (main_var_type) == REFERENCE_TYPE)
+    main_var_type = TREE_TYPE (main_var_type);
+  append_field_to_record_type (da_descr_type, DECL_NAME (decl),
+			       (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+				? main_var_type
+				: build_pointer_type (main_var_type)));
+  /* Number of dimensions.  */
+  append_field_to_record_type (da_descr_type, get_identifier ("$dim_num"),
+			       sizetype);
+
+  for (x = dims; x; x = TREE_CHAIN (x), n++)
+    {
+      char *fldname;
+      /* One for the start index.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_base", n);
+      append_field_to_record_type (da_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for the length.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_length", n);
+      append_field_to_record_type (da_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for the element size.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_elem_size", n);
+      append_field_to_record_type (da_descr_type, get_identifier (fldname),
+				   sizetype);
+      /* One for is_array flag.  */
+      ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_is_array", n);
+      append_field_to_record_type (da_descr_type, get_identifier (fldname),
+				   sizetype);
+    }
+
+  layout_type (da_descr_type);
+  *dim_num = n;
+  return da_descr_type;
+}
+
+/* Generate code sequence for initializing dynamic array descriptor.  */
+
+static void
+create_dynamic_array_descr_init_code (tree da_descr, tree da_var,
+				      tree dimensions, int da_dim_num,
+				      gimple_seq *ilist)
+{
+  tree fld, fldref;
+  tree da_descr_type = TREE_TYPE (da_descr);
+  tree dim_type = TREE_TYPE (da_var);
+
+  fld = TYPE_FIELDS (da_descr_type);
+  fldref = omp_build_component_ref (da_descr, fld);
+  gimplify_assign (fldref, (TREE_CODE (dim_type) == ARRAY_TYPE
+			    ? build_fold_addr_expr (da_var) : da_var), ilist);
+
+  if (TREE_CODE (dim_type) == REFERENCE_TYPE)
+    dim_type = TREE_TYPE (dim_type);
+
+  fld = TREE_CHAIN (fld);
+  fldref = omp_build_component_ref (da_descr, fld);
+  gimplify_assign (fldref, build_int_cst (sizetype, da_dim_num), ilist);
+
+  while (dimensions)
+    {
+      tree dim_base = fold_convert (sizetype, TREE_PURPOSE (dimensions));
+      tree dim_length = fold_convert (sizetype, TREE_VALUE (dimensions));
+      tree dim_elem_size = TYPE_SIZE_UNIT (TREE_TYPE (dim_type));
+      tree dim_is_array = (TREE_CODE (dim_type) == ARRAY_TYPE
+			   ? integer_one_node : integer_zero_node);
+      /* Set base.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (da_descr, fld);
+      dim_base = fold_build2 (MULT_EXPR, sizetype, dim_base, dim_elem_size);
+      gimplify_assign (fldref, dim_base, ilist);
+
+      /* Set length.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (da_descr, fld);
+      dim_length = fold_build2 (MULT_EXPR, sizetype, dim_length, dim_elem_size);
+      gimplify_assign (fldref, dim_length, ilist);
+
+      /* Set elem_size.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (da_descr, fld);
+      dim_elem_size = fold_convert (sizetype, dim_elem_size);
+      gimplify_assign (fldref, dim_elem_size, ilist);
+
+      /* Set is_array flag.  */
+      fld = TREE_CHAIN (fld);
+      fldref = omp_build_component_ref (da_descr, fld);
+      dim_is_array = fold_convert (sizetype, dim_is_array);
+      gimplify_assign (fldref, dim_is_array, ilist);
+
+      dimensions = TREE_CHAIN (dimensions);
+      dim_type = TREE_TYPE (dim_type);
+    }
+  gcc_assert (TREE_CHAIN (fld) == NULL_TREE);
+}
+
 /* Debugging dumps for parallel regions.  */
 void dump_omp_region (FILE *, struct omp_region *, int);
 void debug_omp_region (struct omp_region *);
@@ -1760,6 +1893,8 @@ 
 
   ctx->cb.decl_map = new hash_map<tree, tree>;
 
+  ctx->dynamic_arrays = new hash_map<tree_operand_hash, tree>;
+
   return ctx;
 }
 
@@ -1834,6 +1969,8 @@ 
   if (is_task_ctx (ctx))
     finalize_task_copyfn (as_a <gomp_task *> (ctx->stmt));
 
+  delete ctx->dynamic_arrays;
+
   XDELETE (ctx);
 }
 
@@ -2144,6 +2281,42 @@ 
 	      install_var_local (decl, ctx);
 	      break;
 	    }
+
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	      && GOMP_MAP_DYNAMIC_ARRAY_P (OMP_CLAUSE_MAP_KIND (c)))
+	    {
+	      tree da_decl = OMP_CLAUSE_DECL (c);
+	      tree da_dimensions = OMP_CLAUSE_SIZE (c);
+	      tree da_type = TREE_TYPE (da_decl);
+	      bool by_ref = (TREE_CODE (da_type) == ARRAY_TYPE
+			     ? true : false);
+
+	      /* Checking code to ensure we only have arrays at top dimension.
+		 This limitation might be lifted in the future.  */
+	      if (TREE_CODE (da_type) == REFERENCE_TYPE)
+		da_type = TREE_TYPE (da_type);
+	      tree t = da_type, prev_t = NULL_TREE;
+	      while (t)
+		{
+		  if (TREE_CODE (t) == ARRAY_TYPE && prev_t)
+		    {
+		      error_at (gimple_location (ctx->stmt), "array types are"
+				" only allowed at outermost dimension of"
+				" dynamic array");
+		      break;
+		    }
+		  prev_t = t;
+		  t = TREE_TYPE (t);
+		}
+
+	      install_var_field (da_decl, by_ref, 3, ctx);
+	      tree new_var = install_var_local (da_decl, ctx);
+
+	      bool existed = ctx->dynamic_arrays->put (new_var, da_dimensions);
+	      gcc_assert (!existed);
+	      break;
+	    }
+
 	  if (DECL_P (decl))
 	    {
 	      if (DECL_SIZE (decl)
@@ -16359,6 +16532,15 @@ 
 	  case GOMP_MAP_FORCE_PRESENT:
 	  case GOMP_MAP_FORCE_DEVICEPTR:
 	  case GOMP_MAP_DEVICE_RESIDENT:
+	  case GOMP_MAP_DYNAMIC_ARRAY_TO:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FROM:
+	  case GOMP_MAP_DYNAMIC_ARRAY_TOFROM:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FORCE_TO:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FORCE_FROM:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FORCE_TOFROM:
+	  case GOMP_MAP_DYNAMIC_ARRAY_ALLOC:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FORCE_ALLOC:
+	  case GOMP_MAP_DYNAMIC_ARRAY_FORCE_PRESENT:
 	  case GOMP_MAP_LINK:
 	    gcc_assert (is_gimple_omp_oacc (stmt));
 	    break;
@@ -16421,7 +16603,14 @@ 
 	if (offloaded && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 			   && OMP_CLAUSE_MAP_IN_REDUCTION (c)))
 	  {
-	    x = build_receiver_ref (var, true, ctx);
+	    tree var_type = TREE_TYPE (var);
+	    bool rcv_by_ref =
+	      (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	       && GOMP_MAP_DYNAMIC_ARRAY_P (OMP_CLAUSE_MAP_KIND (c))
+	       && TREE_CODE (var_type) != ARRAY_TYPE
+	       ? false : true);
+
+	    x = build_receiver_ref (var, rcv_by_ref, ctx);
 	    tree new_var = lookup_decl (var, ctx);
 
 	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
@@ -16665,6 +16854,25 @@ 
 		    avar = build_fold_addr_expr (avar);
 		    gimplify_assign (x, avar, &ilist);
 		  }
+		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+			 && (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_DYNAMIC_ARRAY))
+		  {
+		    int da_dim_num;
+		    tree dimensions = OMP_CLAUSE_SIZE (c);
+
+		    tree da_descr_type =
+		      create_dynamic_array_descr_type (OMP_CLAUSE_DECL (c),
+						       dimensions, &da_dim_num);
+		    tree da_descr =
+		      create_tmp_var_raw (da_descr_type, ".$omp_da_descr");
+		    gimple_add_tmp_var (da_descr);
+
+		    create_dynamic_array_descr_init_code
+		      (da_descr, ovar, dimensions, da_dim_num, &ilist);
+
+		    gimplify_assign (x, build_fold_addr_expr (da_descr),
+				     &ilist);
+		  }
 		else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
 		  {
 		    gcc_checking_assert (is_gimple_omp_oacc (ctx->stmt));
@@ -16725,6 +16933,9 @@ 
 		  s = TREE_TYPE (s);
 		s = TYPE_SIZE_UNIT (s);
 	      }
+	    else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		     && (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_DYNAMIC_ARRAY))
+	      s = NULL_TREE;
 	    else
 	      s = OMP_CLAUSE_SIZE (c);
 	    if (s == NULL_TREE)
@@ -17406,7 +17617,202 @@ 
 		       gimple_build_omp_return (false));
 }
 
+/* Helper to lookup dynamic array through nested omp contexts. Returns
+   TREE_LIST of dimensions, and the CTX where it was found in *CTX_P.  */
 
+static tree
+dynamic_array_lookup (tree t, omp_context **ctx_p)
+{
+  omp_context *c = *ctx_p;
+  while (c)
+    {
+      tree *dims = c->dynamic_arrays->get (t);
+      if (dims)
+	{
+	  *ctx_p = c;
+	  return *dims;
+	}
+      c = c->outer;
+    }
+  return NULL_TREE;
+}
+
+/* Tests if this gimple STMT is the start of a dynamic array access sequence.
+   Returns true if found, and also returns the gimple operand ptr and
+   dimensions tree list through *OUT_REF and *OUT_DIMS respectively.  */
+
+static bool
+dynamic_array_reference_start (gimple *stmt, omp_context **ctx_p,
+			       tree **out_ref, tree *out_dims)
+{
+  if (gimple_code (stmt) == GIMPLE_ASSIGN)
+    for (unsigned i = 1; i < gimple_num_ops (stmt); i++)
+      {
+	tree *op = gimple_op_ptr (stmt, i), dims;
+	if (TREE_CODE (*op) == ARRAY_REF)
+	  op = &TREE_OPERAND (*op, 0);
+	if (TREE_CODE (*op) == MEM_REF)
+	  op = &TREE_OPERAND (*op, 0);
+	if ((dims = dynamic_array_lookup (*op, ctx_p)) != NULL_TREE)
+	  {
+	    *out_ref = op;
+	    *out_dims = dims;
+	    return true;
+	  }
+      }
+  return false;
+}
+
+static tree
+scan_for_op (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  tree t = *tp;
+  tree op = (tree) wi->info;
+  *walk_subtrees = 1;
+  if (operand_equal_p (t, op, 0))
+    {
+      wi->info = tp;
+      return t;
+    }
+  return NULL_TREE;
+}
+
+static tree *
+scan_for_reference (gimple *stmt, tree op)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = op;
+  if (walk_gimple_op (stmt, scan_for_op, &wi))
+    return (tree *) wi.info;
+  return NULL;
+}
+
+static tree
+da_create_bias (tree orig_bias, tree unit_type)
+{
+  return build2 (MULT_EXPR, sizetype, fold_convert (sizetype, orig_bias),
+		 TYPE_SIZE_UNIT (unit_type));
+}
+
+/* Main worker for adjusting dynamic array accesses, handles the adjustment
+   of many cases of statement forms, and called multiple times to 'peel' away
+   each dimension.  */
+
+static gimple_stmt_iterator
+da_dimension_peel (omp_context *da_ctx,
+		   gimple_stmt_iterator da_gsi, tree orig_da,
+		   tree *da_op_p, tree *da_type_p, tree *da_dims_p)
+{
+  gimple *stmt = gsi_stmt (da_gsi);
+  tree lhs = gimple_assign_lhs (stmt);
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (gimple_num_ops (stmt) == 2
+      && TREE_CODE (rhs) == MEM_REF
+      && operand_equal_p (*da_op_p, TREE_OPERAND (rhs, 0), 0)
+      && !operand_equal_p (orig_da, TREE_OPERAND (rhs, 0), 0)
+      && (TREE_OPERAND (rhs, 1) == NULL_TREE
+	  || integer_zerop (TREE_OPERAND (rhs, 1))))
+    {
+      gcc_assert (TREE_CODE (TREE_TYPE (*da_type_p)) == POINTER_TYPE);
+      *da_type_p = TREE_TYPE (*da_type_p);
+    }
+  else 
+    {
+      gimple *g;
+      gimple_seq ilist = NULL;
+      tree bias, t;
+      tree op = *da_op_p;
+      tree orig_type = *da_type_p;
+      tree orig_bias = TREE_PURPOSE (*da_dims_p);
+      bool by_ref = false;
+
+      if (TREE_CODE (orig_bias) != INTEGER_CST)
+	orig_bias = lookup_decl (orig_bias, da_ctx);
+
+      if (gimple_num_ops (stmt) == 2)
+	{
+	  if (TREE_CODE (rhs) == ADDR_EXPR)
+	    {
+	      rhs = TREE_OPERAND (rhs, 0);
+	      *da_dims_p = NULL_TREE;
+	    }
+
+	  if (TREE_CODE (rhs) == ARRAY_REF
+	      && TREE_CODE (TREE_OPERAND (rhs, 0)) == MEM_REF
+	      && operand_equal_p (TREE_OPERAND (TREE_OPERAND (rhs, 0), 0),
+				  *da_op_p, 0))
+	    {
+	      bias = da_create_bias (orig_bias,
+				     TREE_TYPE (TREE_TYPE (orig_type)));
+	      *da_type_p = TREE_TYPE (TREE_TYPE (orig_type));
+	    }
+	  else if (TREE_CODE (rhs) == ARRAY_REF
+		   && TREE_CODE (TREE_OPERAND (rhs, 0)) == VAR_DECL
+		   && operand_equal_p (TREE_OPERAND (rhs, 0), *da_op_p, 0))
+	    {
+	      tree ptr_type = build_pointer_type (orig_type);
+	      op = create_tmp_var (ptr_type);
+	      gimplify_assign (op, build_fold_addr_expr (TREE_OPERAND (rhs, 0)),
+			       &ilist);
+	      bias = da_create_bias (orig_bias, TREE_TYPE (orig_type));
+	      *da_type_p = TREE_TYPE (orig_type);
+	      orig_type = ptr_type;
+	      by_ref = true;
+	    }
+	  else if (TREE_CODE (rhs) == MEM_REF
+		   && operand_equal_p (*da_op_p, TREE_OPERAND (rhs, 0), 0)
+		   && TREE_OPERAND (rhs, 1) != NULL_TREE)
+	    {
+	      bias = da_create_bias (orig_bias, TREE_TYPE (orig_type));
+	      *da_type_p = TREE_TYPE (orig_type);
+	    }
+	  else if (TREE_CODE (lhs) == MEM_REF
+		   && operand_equal_p (*da_op_p, TREE_OPERAND (lhs, 0), 0))
+	    {
+	      if (*da_dims_p != NULL_TREE)
+		{
+		  gcc_assert (TREE_CHAIN (*da_dims_p) == NULL_TREE);
+		  bias = da_create_bias (orig_bias, TREE_TYPE (orig_type));
+		  *da_type_p = TREE_TYPE (orig_type);
+		}
+	      else
+		/* This should be the end of the dynamic array access
+		   sequence.  */
+		return da_gsi;
+	    }
+	  else
+	    gcc_unreachable ();
+	}
+      else if (gimple_num_ops (stmt) == 3
+	       && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
+	       && operand_equal_p (*da_op_p, rhs, 0))
+	{
+	  bias = da_create_bias (orig_bias, TREE_TYPE (orig_type));
+	}
+      else
+	gcc_unreachable ();
+
+      bias = fold_build1 (NEGATE_EXPR, sizetype, bias);
+      bias = fold_build2 (POINTER_PLUS_EXPR, orig_type, op, bias);
+
+      t = create_tmp_var (by_ref ? build_pointer_type (orig_type) : orig_type);
+
+      g = gimplify_assign (t, bias, &ilist);
+      gsi_insert_seq_before (&da_gsi, ilist, GSI_NEW_STMT);
+      *da_op_p = gimple_assign_lhs (g);
+
+      if (by_ref)
+	*da_op_p = build2 (MEM_REF, TREE_TYPE (orig_type), *da_op_p,
+			   build_int_cst (orig_type, 0));
+      *da_dims_p = TREE_CHAIN (*da_dims_p);
+    }
+
+  return da_gsi;
+}
+
 /* Callback for lower_omp_1.  Return non-NULL if *tp needs to be
    regimplified.  If DATA is non-NULL, lower_omp_1 is outside
    of OMP context, but with task_shared_vars set.  */
@@ -17681,6 +18087,51 @@ 
 	  }
       /* FALLTHRU */
     default:
+
+      /* If we detect the start of a dynamic array reference sequence, scan
+	 and do the needed adjustments.  */
+      tree da_dims, *da_op_p;
+      omp_context *da_ctx = ctx;
+      if (da_ctx && dynamic_array_reference_start (stmt, &da_ctx,
+						   &da_op_p, &da_dims))
+	{
+	  bool started = false;
+	  tree orig_da = *da_op_p;
+	  tree da_type = TREE_TYPE (orig_da);
+	  tree next_da_op;
+
+	  gimple_stmt_iterator da_gsi = *gsi_p, new_gsi;
+	  while (da_op_p)
+	    {
+	      if (!is_gimple_assign (gsi_stmt (da_gsi))
+		  || ((gimple_assign_single_p (gsi_stmt (da_gsi))
+		       || gimple_assign_cast_p (gsi_stmt (da_gsi)))
+		      && *da_op_p == gimple_assign_rhs1 (gsi_stmt (da_gsi))))
+		break;
+
+	      new_gsi = da_dimension_peel (da_ctx, da_gsi, orig_da,
+					   da_op_p, &da_type, &da_dims);
+	      if (!started)
+		{
+		  /* Point 'stmt' to the start of the newly added
+		     sequence.  */
+		  started = true;
+		  *gsi_p = new_gsi;
+		  stmt = gsi_stmt (*gsi_p);
+		}
+	      if (!da_dims)
+		break;
+
+	      next_da_op = gimple_assign_lhs (gsi_stmt (da_gsi));
+
+	      do {
+		gsi_next (&da_gsi);
+		da_op_p = scan_for_reference (gsi_stmt (da_gsi), next_da_op);
+	      }
+	      while (!da_op_p);
+	    }
+	}
+
       if ((ctx || task_shared_vars)
 	  && walk_gimple_op (stmt, lower_omp_regimplify_p,
 			     ctx ? NULL : &wi))
Index: include/gomp-constants.h
===================================================================
--- include/gomp-constants.h	(revision 244258)
+++ include/gomp-constants.h	(revision 244259)
@@ -40,6 +40,7 @@ 
 #define GOMP_MAP_FLAG_SPECIAL_0		(1 << 2)
 #define GOMP_MAP_FLAG_SPECIAL_1		(1 << 3)
 #define GOMP_MAP_FLAG_SPECIAL_2		(1 << 4)
+#define GOMP_MAP_FLAG_SPECIAL_3		(1 << 5)
 #define GOMP_MAP_FLAG_SPECIAL		(GOMP_MAP_FLAG_SPECIAL_1 \
 					 | GOMP_MAP_FLAG_SPECIAL_0)
 /* Flag to force a specific behavior (or else, trigger a run-time error).  */
@@ -128,7 +129,26 @@ 
     /* Decrement usage count and deallocate if zero.  */
     GOMP_MAP_RELEASE =			(GOMP_MAP_FLAG_SPECIAL_2
 					 | GOMP_MAP_DELETE),
-
+    /* Mapping kinds for dynamic arrays.  */
+    GOMP_MAP_DYNAMIC_ARRAY =		(GOMP_MAP_FLAG_SPECIAL_3),
+    GOMP_MAP_DYNAMIC_ARRAY_TO =		(GOMP_MAP_DYNAMIC_ARRAY
+					 | GOMP_MAP_TO),
+    GOMP_MAP_DYNAMIC_ARRAY_FROM =	(GOMP_MAP_DYNAMIC_ARRAY
+					 | GOMP_MAP_FROM),
+    GOMP_MAP_DYNAMIC_ARRAY_TOFROM =	(GOMP_MAP_DYNAMIC_ARRAY
+					 | GOMP_MAP_TOFROM),
+    GOMP_MAP_DYNAMIC_ARRAY_FORCE_TO =	(GOMP_MAP_DYNAMIC_ARRAY_TO
+					 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_DYNAMIC_ARRAY_FORCE_FROM =		(GOMP_MAP_DYNAMIC_ARRAY_FROM
+						 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_DYNAMIC_ARRAY_FORCE_TOFROM =	(GOMP_MAP_DYNAMIC_ARRAY_TOFROM
+						 | GOMP_MAP_FLAG_FORCE),
+    GOMP_MAP_DYNAMIC_ARRAY_ALLOC =		(GOMP_MAP_DYNAMIC_ARRAY
+						 | GOMP_MAP_ALLOC),
+    GOMP_MAP_DYNAMIC_ARRAY_FORCE_ALLOC =	(GOMP_MAP_DYNAMIC_ARRAY
+						 | GOMP_MAP_FORCE_ALLOC),
+    GOMP_MAP_DYNAMIC_ARRAY_FORCE_PRESENT =	(GOMP_MAP_DYNAMIC_ARRAY
+						 | GOMP_MAP_FORCE_PRESENT),
     /* Internal to GCC, not used in libgomp.  */
     /* Do not map, but pointer assign a pointer instead.  */
     GOMP_MAP_FIRSTPRIVATE_POINTER =	(GOMP_MAP_LAST | 1),
@@ -156,6 +176,8 @@ 
 #define GOMP_MAP_ALWAYS_P(X) \
   (GOMP_MAP_ALWAYS_TO_P (X) || ((X) == GOMP_MAP_ALWAYS_FROM))
 
+#define GOMP_MAP_DYNAMIC_ARRAY_P(X) \
+  ((X) & GOMP_MAP_DYNAMIC_ARRAY)
 
 /* Asynchronous behavior.  Keep in sync with
    libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_async_t.  */
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 244258)
+++ libgomp/target.c	(revision 244259)
@@ -375,6 +375,140 @@ 
   return tgt->tgt_start + tgt->list[i].offset;
 }
 
+/* Dynamic array related data structures, interfaces with the compiler.  */
+
+struct da_dim {
+  size_t base;
+  size_t length;
+  size_t elem_size;
+  size_t is_array;
+};
+
+struct da_descr_type {
+  void *ptr;
+  size_t ndims;
+  struct da_dim dims[];
+};
+
+/* Internal dynamic array info struct, used only here inside the runtime. */
+
+struct da_info
+{
+  struct da_descr_type *descr;
+  size_t map_index;
+  size_t ptrblock_size;
+  size_t data_row_num;
+  size_t data_row_size;
+};
+
+static size_t
+gomp_dynamic_array_count_rows (struct da_descr_type *descr)
+{
+  size_t nrows = 1;
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    nrows *= descr->dims[d].length / sizeof (void *);
+  return nrows;
+}
+
+static void
+gomp_dynamic_array_compute_info (struct da_info *da)
+{
+  size_t d, n = 1;
+  struct da_descr_type *descr = da->descr;
+
+  da->ptrblock_size = 0;
+  for (d = 0; d < descr->ndims - 1; d++)
+    {
+      size_t dim_count = descr->dims[d].length / descr->dims[d].elem_size;
+      size_t dim_ptrblock_size = (descr->dims[d + 1].is_array
+				  ? 0 : descr->dims[d].length * n);
+      da->ptrblock_size += dim_ptrblock_size;
+      n *= dim_count;
+    }
+  da->data_row_num = n;
+  da->data_row_size = descr->dims[d].length;
+}
+
+static void
+gomp_dynamic_array_fill_rows_1 (struct da_descr_type *descr, void *da,
+				size_t d, void ***row_ptr, size_t *count)
+{
+  if (d < descr->ndims - 1)
+    {
+      size_t elsize = descr->dims[d].elem_size;
+      size_t n = descr->dims[d].length / elsize;
+      void *p = da + descr->dims[d].base;
+      for (size_t i = 0; i < n; i++)
+	{
+	  void *ptr = p + i * elsize;
+	  /* Deref if next dimension is not array.  */
+	  if (!descr->dims[d + 1].is_array)
+	    ptr = *((void **) ptr);
+	  gomp_dynamic_array_fill_rows_1 (descr, ptr, d + 1, row_ptr, count);
+	}
+    }
+  else
+    {
+      **row_ptr = da + descr->dims[d].base;
+      *row_ptr += 1;
+      *count += 1;
+    }
+}
+
+static size_t
+gomp_dynamic_array_fill_rows (struct da_descr_type *descr, void *rows[])
+{
+  size_t count = 0;
+  void **p = rows;
+  gomp_dynamic_array_fill_rows_1 (descr, descr->ptr, 0, &p, &count);
+  return count;
+}
+
+static void *
+gomp_dynamic_array_create_ptrblock (struct da_info *da,
+				    void *tgt_addr, void *tgt_data_rows[])
+{
+  struct da_descr_type *descr = da->descr;
+  void *ptrblock = gomp_malloc (da->ptrblock_size);
+  void **curr_dim_ptrblock = (void **) ptrblock;
+  size_t n = 1;
+
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+    {
+      int curr_dim_len = descr->dims[d].length;
+      int next_dim_len = descr->dims[d + 1].length;
+      int curr_dim_num = curr_dim_len / sizeof (void *);
+
+      void *next_dim_ptrblock
+	= (void *)(curr_dim_ptrblock + n * curr_dim_num);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < curr_dim_num; i++)
+	  {
+	    if (d < descr->ndims - 2)
+	      {
+		void *ptr = (next_dim_ptrblock
+			     + b * curr_dim_num * next_dim_len
+			     + i * next_dim_len);
+		void *tgt_ptr = tgt_addr + (ptr - ptrblock);
+		curr_dim_ptrblock[b * curr_dim_num + i] = tgt_ptr;
+	      }
+	    else
+	      {
+		curr_dim_ptrblock[b * curr_dim_num + i]
+		  = tgt_data_rows[b * curr_dim_num + i];
+	      }
+	    void *addr = &curr_dim_ptrblock[b * curr_dim_num + i];
+	    assert (ptrblock <= addr && addr < ptrblock + da->ptrblock_size);
+	  }
+
+      n *= curr_dim_num;
+      curr_dim_ptrblock = next_dim_ptrblock;
+    }
+  assert (n == da->data_row_num);
+  return ptrblock;
+}
+
 attribute_hidden struct target_mem_desc *
 gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	       void **hostaddrs, void **devaddrs, size_t *sizes, void *kinds,
@@ -386,9 +520,29 @@ 
   const int typemask = short_mapkind ? 0xff : 0x7;
   struct splay_tree_s *mem_map = &devicep->mem_map;
   struct splay_tree_key_s cur_node;
-  struct target_mem_desc *tgt
-    = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
-  tgt->list_count = mapnum;
+  struct target_mem_desc *tgt;
+
+  size_t da_data_row_num = 0, row_start = 0;
+  size_t da_info_num = 0, da_index;
+  struct da_info *da_info = NULL;
+  struct target_var_desc *row_desc;
+  uintptr_t target_row_addr;
+  void **host_data_rows = NULL, **target_data_rows = NULL;
+  void *row;
+
+  for (i = 0; i < mapnum; i++)
+    {
+      int kind = get_kind (short_mapkind, kinds, i);
+      if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	{
+	  da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
+	  da_info_num += 1;
+	}
+    }
+
+  tgt = gomp_malloc (sizeof (*tgt)
+		     + sizeof (tgt->list[0]) * (mapnum + da_data_row_num));
+  tgt->list_count = mapnum + da_data_row_num;
   tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
   tgt->device_descr = devicep;
 
@@ -399,6 +553,14 @@ 
       return tgt;
     }
 
+  if (da_info_num)
+    da_info = gomp_alloca (sizeof (struct da_info) * da_info_num);
+  if (da_data_row_num)
+    {
+      host_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+      target_data_rows = gomp_malloc (sizeof (void *) * da_data_row_num);
+    }
+
   tgt_align = sizeof (void *);
   tgt_size = 0;
   if (pragma_kind == GOMP_MAP_VARS_TARGET)
@@ -416,7 +578,7 @@ 
       return NULL;
     }
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0, da_index = 0; i < mapnum; i++)
     {
       int kind = get_kind (short_mapkind, kinds, i);
       if (hostaddrs[i] == NULL
@@ -482,6 +644,20 @@ 
 	  has_firstprivate = true;
 	  continue;
 	}
+      else if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	{
+	  /* Ignore dynamic arrays for now, we process them together
+	     later.  */
+	  tgt->list[i].key = NULL;
+	  tgt->list[i].offset = 0;
+	  not_found_cnt++;
+
+	  struct da_info *da = &da_info[da_index++];
+	  da->descr = (struct da_descr_type *) hostaddrs[i];
+	  da->map_index = i;
+	  continue;
+	}
+
       cur_node.host_start = (uintptr_t) hostaddrs[i];
       if (!GOMP_MAP_POINTER_P (kind & typemask))
 	cur_node.host_end = cur_node.host_start + sizes[i];
@@ -545,6 +721,55 @@ 
 	}
     }
 
+  /* For dynamic arrays. Each data row is one target item, separated from
+     the normal map clause items, hence we order them after mapnum.  */
+  for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+    {
+      int kind = get_kind (short_mapkind, kinds, i);
+      if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	continue;
+
+      struct da_info *da = &da_info[da_index++];
+      struct da_descr_type *descr = da->descr;
+      size_t nr;
+
+      gomp_dynamic_array_compute_info (da);
+
+      /* We have allocated space in host/target_data_rows to place all the
+	 row data block pointers, now we can start filling them in.  */
+      nr = gomp_dynamic_array_fill_rows (descr, &host_data_rows[row_start]);
+      assert (nr == da->data_row_num);
+
+      size_t align = (size_t) 1 << (kind >> rshift);
+      if (tgt_align < align)
+	tgt_align = align;
+      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+      tgt_size += da->ptrblock_size;
+
+      for (size_t j = 0; j < da->data_row_num; j++)
+	{
+	  row = host_data_rows[row_start + j];
+	  row_desc = &tgt->list[mapnum + row_start + j];
+
+	  cur_node.host_start = (uintptr_t) row;
+	  cur_node.host_end = cur_node.host_start + da->data_row_size;
+	  splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	  if (n)
+	    {
+	      assert (n->refcount != REFCOUNT_LINK);
+	      gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+				      kind & typemask);	      
+	    }
+	  else
+	    {
+	      tgt_size = (tgt_size + align - 1) & ~(align - 1);
+	      tgt_size += da->data_row_size;
+	      not_found_cnt++;
+	    }
+	}
+      row_start += da->data_row_num;
+    }
+
   if (devaddrs)
     {
       if (mapnum != 1)
@@ -675,6 +900,15 @@ 
 	      default:
 		break;
 	      }
+
+	    if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	      {
+		tgt->list[i].key = &array->key;
+		tgt->list[i].key->tgt = tgt;
+		array++;
+		continue;
+	      }
+
 	    splay_tree_key k = &array->key;
 	    k->host_start = (uintptr_t) hostaddrs[i];
 	    if (!GOMP_MAP_POINTER_P (kind & typemask))
@@ -825,8 +1059,110 @@ 
 		array++;
 	      }
 	  }
+
+      /* Processing of dynamic array rows.  */
+      for (i = 0, da_index = 0, row_start = 0; i < mapnum; i++)
+	{
+	  int kind = get_kind (short_mapkind, kinds, i);
+	  if (!GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+	    continue;
+
+	  struct da_info *da = &da_info[da_index++];
+	  assert (da->descr == hostaddrs[i]);
+
+	  /* The map for the dynamic array itself is never copied from during
+	     unmapping, its the data rows that count. Set copy from flags are
+	     set to false here.  */
+	  tgt->list[i].copy_from = false;
+	  tgt->list[i].always_copy_from = false;
+
+	  size_t align = (size_t) 1 << (kind >> rshift);
+	  tgt_size = (tgt_size + align - 1) & ~(align - 1);
+
+	  /* For the map of the dynamic array itself, adjust so that the passed
+	     device address points to the beginning of the ptrblock.  */
+	  tgt->list[i].key->tgt_offset = tgt_size;
+
+	  void *target_ptrblock = (void*) tgt->tgt_start + tgt_size;
+	  tgt_size += da->ptrblock_size;
+
+	  /* Add splay key for each data row in current DA.  */
+	  for (size_t j = 0; j < da->data_row_num; j++)
+	    {
+	      row = host_data_rows[row_start + j];
+	      row_desc = &tgt->list[mapnum + row_start + j];
+
+	      cur_node.host_start = (uintptr_t) row;
+	      cur_node.host_end = cur_node.host_start + da->data_row_size;
+	      splay_tree_key n = splay_tree_lookup (mem_map, &cur_node);
+	      if (n)
+		{
+		  assert (n->refcount != REFCOUNT_LINK);
+		  gomp_map_vars_existing (devicep, n, &cur_node, row_desc,
+					  kind & typemask);
+		  target_row_addr = n->tgt->tgt_start + n->tgt_offset;
+		}
+	      else
+		{
+		  tgt->refcount++;
+
+		  splay_tree_key k = &array->key;
+		  k->host_start = (uintptr_t) row;
+		  k->host_end = k->host_start + da->data_row_size;
+
+		  k->tgt = tgt;
+		  k->refcount = 1;
+		  k->link_key = NULL;
+		  tgt_size = (tgt_size + align - 1) & ~(align - 1);
+		  target_row_addr = tgt->tgt_start + tgt_size;
+		  k->tgt_offset = tgt_size;
+		  tgt_size += da->data_row_size;
+
+		  row_desc->key = k;
+		  row_desc->copy_from
+		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		  row_desc->always_copy_from
+		    = GOMP_MAP_COPY_FROM_P (kind & typemask);
+		  row_desc->offset = 0;
+		  row_desc->length = da->data_row_size;
+
+		  array->left = NULL;
+		  array->right = NULL;
+		  splay_tree_insert (mem_map, array);
+
+		  if (GOMP_MAP_COPY_TO_P (kind & typemask))
+		    gomp_copy_host2dev (devicep,
+					(void *) tgt->tgt_start + k->tgt_offset,
+					(void *) k->host_start,
+					da->data_row_size);
+		  array++;
+		}
+	      target_data_rows[row_start + j] = (void *) target_row_addr;
+	    }
+
+	  /* Now we have the target memory allocated, and target offsets of all
+	     row blocks assigned and calculated, we can construct the
+	     accelerator side ptrblock and copy it in.  */
+	  if (da->ptrblock_size)
+	    {
+	      void *ptrblock = gomp_dynamic_array_create_ptrblock
+		(da, target_ptrblock, target_data_rows + row_start);
+	      gomp_copy_host2dev (devicep, target_ptrblock, ptrblock,
+				  da->ptrblock_size);
+	      free (ptrblock);
+	    }
+
+	  row_start += da->data_row_num;
+	}
+      assert (row_start == da_data_row_num && da_index == da_info_num);
     }
 
+  if (da_data_row_num)
+    {
+      free (host_data_rows);
+      free (target_data_rows);
+    }
+
   if (pragma_kind == GOMP_MAP_VARS_TARGET)
     {
       for (i = 0; i < mapnum; i++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/da-3.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/da-3.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/da-3.c	(revision 244259)
@@ -0,0 +1,45 @@ 
+/* { dg-do run { target { ! openacc_host_selected } } } */
+
+#include <assert.h>
+#include "da-utils.h"
+
+int main (void)
+{
+  int n = 20, x = 5, y = 12;
+  int *****a = (int *****) create_da (sizeof (int), n, 5);
+
+  int sum1 = 0, sum2 = 0, sum3 = 0;
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	for (int l = 0; l < n; l++)
+	  for (int m = 0; m < n; m++)
+	    {
+	      a[i][j][k][l][m] = 1;
+	      sum1++;
+	    }
+
+  #pragma acc parallel copy (a[x:y][x:y][x:y][x:y][x:y]) copy(sum2)
+  {
+    for (int i = x; i < x + y; i++)
+      for (int j = x; j < x + y; j++)
+	for (int k = x; k < x + y; k++)
+	  for (int l = x; l < x + y; l++)
+	    for (int m = x; m < x + y; m++)
+	      {
+		a[i][j][k][l][m] = 0;
+		sum2++;
+	      }
+  }
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	for (int l = 0; l < n; l++)
+	  for (int m = 0; m < n; m++)
+	    sum3 += a[i][j][k][l][m];
+
+  assert (sum1 == sum2 + sum3);
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/da-4.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/da-4.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/da-4.c	(revision 244259)
@@ -0,0 +1,36 @@ 
+/* { dg-do run { target { ! openacc_host_selected } } } */
+
+#include <assert.h>
+#include "da-utils.h"
+
+int main (void)
+{
+  int n = 128;
+  double ***a = (double ***) create_da (sizeof (double), n, 3);
+  double ***b = (double ***) create_da (sizeof (double), n, 3);
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	a[i][j][k] = i + j + k + i * j * k;
+
+  /* This test exercises async copyout of dynamic array rows.  */
+  #pragma acc parallel copyin(a[0:n][0:n][0:n]) copyout(b[0:n][0:n][0:n]) async(5)
+  {
+    #pragma acc loop gang
+    for (int i = 0; i < n; i++)
+      #pragma acc loop vector
+      for (int j = 0; j < n; j++)
+	for (int k = 0; k < n; k++)
+	  b[i][j][k] = a[i][j][k] * 2.0;
+  }
+
+  #pragma acc wait (5)
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	assert (b[i][j][k] == a[i][j][k] * 2.0);
+
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/da-utils.h
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/da-utils.h	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/da-utils.h	(revision 244259)
@@ -0,0 +1,44 @@ 
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+
+/* Allocate and create a pointer based NDIMS-dimensional array,
+   each dimension DIMLEN long, with ELSIZE sized data elements.  */
+void *
+create_da (size_t elsize, int dimlen, int ndims)
+{
+  size_t blk_size = 0;
+  size_t n = 1;
+
+  for (int i = 0; i < ndims - 1; i++)
+    {
+      n *= dimlen;
+      blk_size += sizeof (void *) * n;
+    }
+  size_t data_rows_num = n;
+  size_t data_rows_offset = blk_size;
+  blk_size += elsize * n * dimlen;
+
+  void *blk = (void *) malloc (blk_size);
+  memset (blk, 0, blk_size);
+  void **curr_dim = (void **) blk;
+  n = 1;
+
+  for (int d = 0; d < ndims - 1; d++)
+    {
+      uintptr_t next_dim = (uintptr_t) (curr_dim + n * dimlen);
+      size_t next_dimlen = dimlen * (d < ndims - 2 ? sizeof (void *) : elsize);
+
+      for (int b = 0; b < n; b++)
+        for (int i = 0; i < dimlen; i++)
+	  if (d < ndims - 1)
+	    curr_dim[b * dimlen + i]
+	      = (void*) (next_dim + b * dimlen * next_dimlen + i * next_dimlen);
+
+      n *= dimlen;
+      curr_dim = (void**) next_dim;
+    }
+  assert (n == data_rows_num);
+  return blk;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/da-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/da-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/da-1.c	(revision 244259)
@@ -0,0 +1,103 @@ 
+/* { dg-do run { target { ! openacc_host_selected } } } */
+
+#include <stdlib.h>
+#include <assert.h>
+
+#define n 100
+#define m 100
+
+int b[n][m];
+
+void
+test1 (void)
+{
+  int i, j, *a[100];
+
+  /* Array of pointers form test.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = (int *)malloc (sizeof (int) * m);
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    {
+      for (j = 0; j < m; j++)
+	assert (a[i][j] == b[i][j]);
+      /* Clean up.  */
+      free (a[i]);
+    }
+}
+
+void
+test2 (void)
+{
+  int i, j, **a = (int **) malloc (sizeof (int *) * n);
+
+  /* Separately allocated blocks.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = (int *)malloc (sizeof (int) * m);
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    {
+      for (j = 0; j < m; j++)
+	assert (a[i][j] == b[i][j]);
+      /* Clean up.  */
+      free (a[i]);
+    }
+  free (a);
+}
+
+void
+test3 (void)
+{
+  int i, j, **a = (int **) malloc (sizeof (int *) * n);
+  a[0] = (int *) malloc (sizeof (int) * n * m);
+
+  /* Rows allocated in one contiguous block.  */
+  for (i = 0; i < n; i++)
+    {
+      a[i] = *a + i * m;
+      for (j = 0; j < m; j++)
+	b[i][j] = j - i;
+    }
+
+  #pragma acc parallel loop copyout(a[0:n][0:m]) copyin(b)
+  for (i = 0; i < n; i++)
+    #pragma acc loop
+    for (j = 0; j < m; j++)
+      a[i][j] = b[i][j];
+
+  for (i = 0; i < n; i++)
+    for (j = 0; j < m; j++)
+      assert (a[i][j] == b[i][j]);
+
+  free (a[0]);
+  free (a);
+}
+
+int
+main (void)
+{
+  test1 ();
+  test2 ();
+  test3 ();
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/da-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/da-2.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/da-2.c	(revision 244259)
@@ -0,0 +1,37 @@ 
+/* { dg-do run { target { ! openacc_host_selected } } } */
+
+#include <assert.h>
+#include "da-utils.h"
+
+int
+main (void)
+{
+  int n = 10;
+  int ***a = (int ***) create_da (sizeof (int), n, 3);
+  int ***b = (int ***) create_da (sizeof (int), n, 3);
+  int ***c = (int ***) create_da (sizeof (int), n, 3);
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	{
+	  a[i][j][k] = i + j * k + k;
+	  b[i][j][k] = j + k * i + i * j;
+	  c[i][j][k] = a[i][j][k];
+	}
+
+  #pragma acc parallel copy (a[0:n][0:n][0:n]) copyin (b[0:n][0:n][0:n])
+  {
+    for (int i = 0; i < n; i++)
+      for (int j = 0; j < n; j++)
+	for (int k = 0; k < n; k++)
+	  a[i][j][k] += b[k][j][i] + i + j + k;
+  }
+
+  for (int i = 0; i < n; i++)
+    for (int j = 0; j < n; j++)
+      for (int k = 0; k < n; k++)
+	assert (a[i][j][k] == c[i][j][k] + b[k][j][i] + i + j + k);
+
+  return 0;
+}