svn commit: r236962 - in head/contrib/gcc: . config/i386 doc
Pedro F. Giffuni
pfg at FreeBSD.org
Tue Jun 12 15:04:19 UTC 2012
Author: pfg
Date: Tue Jun 12 15:04:18 2012
New Revision: 236962
URL: http://svn.freebsd.org/changeset/base/236962
Log:
Add experimental support for amdfam10/barcelona from the GCC 4.3 branch.
Initial support for the AMD Barcelona processors has been available in the
gcc43 branch under GPLv2 but was not included when the Core 2 support
was brought to the system gcc.
AMD and some Linux distributions (openSUSE) backported the amdfam10
support and made the patches available. Unfortunately this is still
experimental: while it can improve performance, enabling the CPUTYPE may
break some C++ ports (like clang).
Special care was taken to make sure that the patches predate the GPLv3
switch upstream.
Tested by: Vladimir Kushnir
Reviewed by: mm
Approved by: jhb (mentor)
MFC after: 2 weeks
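
As a rough illustration of what enabling the new CPU type means for source code, the sketch below reports which target the compiler was asked to generate code for. The macro names `__amdfam10__` and `__SSE4A__` are assumptions taken from the TARGET_CPU_CPP_BUILTINS and SSE4A entries in the ChangeLog in this commit; the helper name `target_description` is hypothetical. Checking the real macro set with `cc -march=amdfam10 -dM -E - < /dev/null` is advisable before relying on them.

```c
/* Hypothetical helper: describes the code generation target selected at
   compile time.  The macro names are assumed from the ChangeLog entries in
   this commit and should be verified against the compiler in use. */
static const char *
target_description (void)
{
#if defined(__amdfam10__)
  /* Expected when building with -march=amdfam10 (or barcelona). */
  return "amdfam10";
#elif defined(__SSE4A__)
  /* Expected when only -msse4a is given. */
  return "sse4a";
#else
  /* Any other target, including the default. */
  return "generic";
#endif
}
```

Because the choice is made by the preprocessor, a binary built this way still has to run on hardware that supports the selected instruction set.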
Added:
head/contrib/gcc/config/i386/ammintrin.h (contents, props changed)
Modified:
head/contrib/gcc/ChangeLog.gcc43
head/contrib/gcc/builtins.c
head/contrib/gcc/config.gcc
head/contrib/gcc/config/i386/athlon.md
head/contrib/gcc/config/i386/driver-i386.c
head/contrib/gcc/config/i386/i386.c
head/contrib/gcc/config/i386/i386.h
head/contrib/gcc/config/i386/i386.md
head/contrib/gcc/config/i386/i386.opt
head/contrib/gcc/config/i386/mmx.md
head/contrib/gcc/config/i386/pmmintrin.h
head/contrib/gcc/config/i386/sse.md
head/contrib/gcc/config/i386/tmmintrin.h
head/contrib/gcc/doc/extend.texi
head/contrib/gcc/doc/invoke.texi
head/contrib/gcc/fold-const.c
head/contrib/gcc/gimplify.c
head/contrib/gcc/tree-ssa-ccp.c
head/contrib/gcc/tree-ssa-pre.c
Modified: head/contrib/gcc/ChangeLog.gcc43
==============================================================================
--- head/contrib/gcc/ChangeLog.gcc43 Tue Jun 12 14:56:08 2012 (r236961)
+++ head/contrib/gcc/ChangeLog.gcc43 Tue Jun 12 15:04:18 2012 (r236962)
@@ -1,3 +1,8 @@
+2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal at amd.com>
+
+ * doc/invoke.texi: Fix typo, 'AMD Family 10h core' instead of
+ 'AMD Family 10 core'.
+
2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal at amd.com> (r124339)
* config/i386/i386.c (override_options): Accept k8-sse3, opteron-sse3
@@ -5,10 +10,39 @@
with SSE3 instruction set support.
* doc/invoke.texi: Likewise.
+2007-05-01 Dwarakanath Rajagopal <dwarak.rajagopal at amd.com>
+
+ * config/i386/i386.c (override_options): Tuning 32-byte loop
+ alignment for amdfam10 architecture. Increasing the max loop
+ alignment to 24 bytes.
+
+2007-04-12 Richard Guenther <rguenther at suse.de>
+
+ PR tree-optimization/24689
+ PR tree-optimization/31307
+ * fold-const.c (operand_equal_p): Compare INTEGER_CST array
+ indices by value.
+ * gimplify.c (canonicalize_addr_expr): To be consistent with
+ gimplify_compound_lval only set operands two and three of
+ ARRAY_REFs if they are not gimple_min_invariant. This makes
+ it never at this place.
+ * tree-ssa-ccp.c (maybe_fold_offset_to_array_ref): Likewise.
+
2007-04-07 H.J. Lu <hongjiu.lu at intel.com> (r123639)
* config/i386/i386.c (ix86_handle_option): Handle SSSE3.
+2007-03-28 Dwarakanath Rajagopal <dwarak.rajagopal at amd.com>
+
+ * config.gcc: Accept barcelona as a variant of amdfam10.
+ * config/i386/i386.c (override_options): Likewise.
+ * doc/invoke.texi: Likewise.
+
+2007-02-09 Dwarakanath Rajagopal <dwarak.rajagopal at amd.com>
+
+ * config/i386/driver-i386.c: Turn on -mtune=native for AMDFAM10.
+ (bit_SSE4a): New.
+
2007-02-08 Harsha Jagasia <harsha.jagasia at amd.com> (r121726)
* config/i386/xmmintrin.h: Make inclusion of emmintrin.h
@@ -26,6 +60,173 @@
* config/i386/i386.c (override_options): Set PTA_SSSE3 for core2.
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (athlon_fldxf_k8, athlon_fld_k8,
+ athlon_fstxf_k8, athlon_fst_k8, athlon_fist, athlon_fmov,
+ athlon_fadd_load, athlon_fadd_load_k8, athlon_fadd, athlon_fmul,
+ athlon_fmul_load, athlon_fmul_load_k8, athlon_fsgn,
+ athlon_fdiv_load, athlon_fdiv_load_k8, athlon_fdiv_k8,
+ athlon_fpspc_load, athlon_fpspc, athlon_fcmov_load,
+ athlon_fcmov_load_k8, athlon_fcmov_k8, athlon_fcomi_load_k8,
+ athlon_fcomi, athlon_fcom_load_k8, athlon_fcom): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/i386.md (x86_sahf_1, cmpfp_i_mixed, cmpfp_i_sse,
+ cmpfp_i_i387, cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387,
+ swapsi, swaphi_1, swapqi_1, swapdi_rex64, fix_truncsfdi_sse,
+ fix_truncdfdi_sse, fix_truncsfsi_sse, fix_truncdfsi_sse,
+ x86_fldcw_1, floatsisf2_mixed, floatsisf2_sse, floatdisf2_mixed,
+ floatdisf2_sse, floatsidf2_mixed, floatsidf2_sse,
+ floatdidf2_mixed, floatdidf2_sse, muldi3_1_rex64, mulsi3_1,
+ mulsi3_1_zext, mulhi3_1, mulqi3_1, umulqihi3_1, mulqihi3_insn,
+ umulditi3_insn, umulsidi3_insn, mulditi3_insn, mulsidi3_insn,
+ umuldi3_highpart_rex64, umulsi3_highpart_insn,
+ umulsi3_highpart_zext, smuldi3_highpart_rex64,
+ smulsi3_highpart_insn, smulsi3_highpart_zext, x86_64_shld,
+ x86_shld_1, x86_64_shrd, sqrtsf2_mixed, sqrtsf2_sse,
+ sqrtsf2_i387, sqrtdf2_mixed, sqrtdf2_sse, sqrtdf2_i387,
+ sqrtextendsfdf2_i387, sqrtxf2, sqrtextendsfxf2_i387,
+ sqrtextenddfxf2_i387): Added amdfam10_decode.
+
+ * config/i386/athlon.md (athlon_idirect_amdfam10,
+ athlon_ivector_amdfam10, athlon_idirect_load_amdfam10,
+ athlon_ivector_load_amdfam10, athlon_idirect_both_amdfam10,
+ athlon_ivector_both_amdfam10, athlon_idirect_store_amdfam10,
+ athlon_ivector_store_amdfam10): New define_insn_reservation.
+ (athlon_idirect_loadmov, athlon_idirect_movstore): Added
+ amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (athlon_call_amdfam10,
+ athlon_pop_amdfam10, athlon_lea_amdfam10): New
+ define_insn_reservation.
+ (athlon_branch, athlon_push, athlon_leave_k8, athlon_imul_k8,
+ athlon_imul_k8_DI, athlon_imul_mem_k8, athlon_imul_mem_k8_DI,
+ athlon_idiv, athlon_idiv_mem, athlon_str): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (athlon_sseld_amdfam10,
+ athlon_mmxld_amdfam10, athlon_ssest_amdfam10,
+ athlon_mmxssest_short_amdfam10): New define_insn_reservation.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (athlon_sseins_amdfam10): New
+ define_insn_reservation.
+ * config/i386/i386.md (sseins): Added sseins to define_attr type
+ and define_attr unit.
+ * config/i386/sse.md: Set type attribute to sseins for insertq
+ and insertqi.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (sselog_load_amdfam10, sselog_amdfam10,
+ ssecmpvector_load_amdfam10, ssecmpvector_amdfam10,
+ ssecomi_load_amdfam10, ssecomi_amdfam10,
+ sseaddvector_load_amdfam10, sseaddvector_amdfam10): New
+ define_insn_reservation.
+ (ssecmp_load_k8, ssecmp, sseadd_load_k8, sseadd): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (cvtss2sd_load_amdfam10,
+ cvtss2sd_amdfam10, cvtps2pd_load_amdfam10, cvtps2pd_amdfam10,
+ cvtsi2sd_load_amdfam10, cvtsi2ss_load_amdfam10,
+ cvtsi2sd_amdfam10, cvtsi2ss_amdfam10, cvtsd2ss_load_amdfam10,
+ cvtsd2ss_amdfam10, cvtpd2ps_load_amdfam10, cvtpd2ps_amdfam10,
+ cvtsX2si_load_amdfam10, cvtsX2si_amdfam10): New
+ define_insn_reservation.
+
+ * config/i386/sse.md (cvtsi2ss, cvtsi2ssq, cvtss2si,
+ cvtss2siq, cvttss2si, cvttss2siq, cvtsi2sd, cvtsi2sdq,
+ cvtsd2si, cvtsd2siq, cvttsd2si, cvttsd2siq,
+ cvtpd2dq, cvttpd2dq, cvtsd2ss, cvtss2sd,
+ cvtpd2ps, cvtps2pd): Added amdfam10_decode attribute.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/athlon.md (athlon_ssedivvector_amdfam10,
+ athlon_ssedivvector_load_amdfam10, athlon_ssemulvector_amdfam10,
+ athlon_ssemulvector_load_amdfam10): New define_insn_reservation.
+ (athlon_ssediv, athlon_ssediv_load_k8, athlon_ssemul,
+ athlon_ssemul_load_k8): Added amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/i386.h (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL): New macro.
+ (x86_sse_unaligned_move_optimal): New variable.
+
+ * config/i386/i386.c (x86_sse_unaligned_move_optimal): Enable for
+ m_AMDFAM10.
+ (ix86_expand_vector_move_misalign): Add code to generate movupd/movups
+ for unaligned vector SSE double/single precision loads for AMDFAM10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/i386.h (TARGET_AMDFAM10): New macro.
+ (TARGET_CPU_CPP_BUILTINS): Add code for amdfam10.
+ Define TARGET_CPU_DEFAULT_amdfam10.
+ (TARGET_CPU_DEFAULT_NAMES): Add amdfam10.
+ (processor_type): Add PROCESSOR_AMDFAM10.
+
+ * config/i386/i386.md: Add amdfam10 as a new cpu attribute to match
+ processor_type in config/i386/i386.h.
+ Enable imul peepholes for TARGET_AMDFAM10.
+
+ * config.gcc: Add support for --with-cpu option for amdfam10.
+
+ * config/i386/i386.c (amdfam10_cost): New variable.
+ (m_AMDFAM10): New macro.
+ (m_ATHLON_K8_AMDFAM10): New macro.
+ (x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen,
+ x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop,
+ x86_promote_QImode, x86_integer_DFmode_moves,
+ x86_partial_reg_dependency, x86_memory_mismatch_stall,
+ x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387,
+ x86_sse_partial_reg_dependency, x86_sse_typeless_stores,
+ x86_use_ffreep, x86_use_incdec, x86_four_jump_limit,
+ x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns):
+ Enable/disable for amdfam10.
+ (override_options): Add amdfam10_cost to processor_target_table.
+ Set up PROCESSOR_AMDFAM10 for amdfam10 entry in
+ processor_alias_table.
+ (ix86_issue_rate): Add PROCESSOR_AMDFAM10.
+ (ix86_adjust_cost): Add code for amdfam10.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia at amd.com>
+
+ * config/i386/i386.opt: Add new Advanced Bit Manipulation (-mabm)
+ instruction set feature flag. Add new (-mpopcnt) flag for popcnt
+ instruction. Add new SSE4A (-msse4a) instruction set feature flag.
+ * config/i386/i386.h: Add builtin definition for SSE4A.
+ * config/i386/i386.md: Add support for ABM instructions
+ (popcnt and lzcnt).
+ * config/i386/sse.md: Add support for SSE4A instructions
+ (movntss, movntsd, extrq, insertq).
+ * config/i386/i386.c: Add support for ABM and SSE4A builtins.
+ Add -march=amdfam10 flag.
+ * config/i386/ammintrin.h: Add support for SSE4A intrinsics.
+ * doc/invoke.texi: Add documentation on flags for sse4a, abm, popcnt
+ and amdfam10.
+ * doc/extend.texi: Add documentation for SSE4A builtins.
+
+2007-01-24 Jakub Jelinek <jakub at redhat.com>
+
+ * config/i386/i386.h (x86_cmpxchg16b): Remove const.
+ (TARGET_CMPXCHG16B): Define to x86_cmpxchg16b.
+ * config/i386/i386.c (x86_cmpxchg16b): Remove const.
+ (override_options): Add PTA_CX16 flag. Set x86_cmpxchg16b
+ for CPUs that have PTA_CX16 set.
+
+2007-01-18 Michael Meissner <michael.meissner at amd.com>
+
+ * config/i386/i386.c (ix86_compute_frame_layout): Make fprintf's
+ in #if 0 code type correct.
+
2007-01-17 Eric Christopher <echristo at apple.com> (r120846)
* config.gcc: Support core2 processor.
@@ -62,7 +263,30 @@
x86_pad_returns): Add m_CORE2.
(override_options): Add entries for Core2.
(ix86_issue_rate): Add case for Core2.
+
+2006-10-28 Uros Bizjak <uros at kss-loka.si>
+
+ * config/i386/i386.h (GENERAL_REGNO_P): Use STACK_POINTER_REGNUM.
+ (NON_QI_REG_P): Use IN_RANGE.
+ (REX_INT_REGNO_P): Use IN_RANGE.
+ (FP_REGNO_P): Use IN_RANGE.
+ (SSE_REGNO_P): Use IN_RANGE.
+ (REX_SSE_REGNO_P): Use IN_RANGE.
+ (MMX_REGNO_P): Use IN_RANGE.
+ (STACK_REGNO_P): New macro.
+ (STACK_REG_P): Use STACK_REGNO_P.
+ (NON_STACK_REG_P): Use STACK_REGNO_P.
+ (REGNO_OK_FOR_INDEX_P): Use REX_INT_REGNO_P.
+ (REGNO_OK_FOR_BASE_P): Use GENERAL_REGNO_P.
+ (REG_OK_FOR_INDEX_NONSTRICT_P): Use REX_INT_REGNO_P.
+ (REG_OK_FOR_BASE_NONSTRICT_P): Use GENERAL_REGNO_P.
+ (HARD_REGNO_RENAME_OK): Use !IN_RANGE.
+2006-10-28 Uros Bizjak <uros at kss-loka.si>
+
+ * config/i386/i386.c (output_387_ffreep): Create output from a
+ template string for !HAVE_AS_IX86_FFREEP.
+
2006-10-27 Vladimir Makarov <vmakarov at redhat.com> (r118090)
* config/i386/i386.h (TARGET_GEODE):
@@ -95,7 +319,31 @@
* config/i386/geode.md: New file.
* doc/invoke.texi: Add entry about geode processor.
-
+
+2006-10-24 Uros Bizjak <uros at kss-loka.si>
+
+ * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Define to 54.
+ (FIXED_REGISTERS, CALL_USED_REGISTERS): Add fpcr register.
+ (REG_ALLOC_ORDER): Add one element to allocate fpcr register.
+ (FRAME_POINTER_REGNUM): Update register number to 21.
+ (REG_CLASS_CONTENTS): Update contents for added fpcr register.
+ (HI_REGISTER_NAMES): Add "fpcr" for fpcr register.
+
+ * config/i386/i386.c (regclass_map): Add fpcr entry.
+ (dbx_register_map, dbx64_register_map, svr4_dbx_register_map):
+ Add fpcr entry.
+ (print_reg): Assert REGNO (x) != FPCR_REG.
+
+ * config/i386/i386.md (FPCR_REG, R11_REG): New constants.
+ (DIRFLAG_REG): Renumber.
+ (x86_fnstcw_1, x86_fldcw_1): Use FPCR_REG instead of FPSR_REG.
+ (*sibcall_1_rex64_v, *sibcall_value_1_rex64_v): Use R11_REG.
+ (sse_prologue_save, *sse_prologue_save_insn): Renumber
+ hardcoded SSE register numbers.
+
+ * config/i386/mmx.md (mmx_emms, mmx_femms): Renumber
+ hardcoded MMX register numbers.
+
2006-10-24 Richard Guenther <rguenther at suse.de>
PR middle-end/28796
@@ -104,6 +352,17 @@
for deciding optimizations in consistency with fold-const.c
(fold_builtin_unordered_cmp): Likewise.
+2006-10-24 Richard Guenther <rguenther at suse.de>
+
+ * builtins.c (fold_builtin_floor): Fold floor (x) where
+ x is nonnegative to trunc (x).
+ (fold_builtin_int_roundingfn): Fold lfloor (x) where x is
+ nonnegative to FIX_TRUNC_EXPR.
+
+2006-10-22 H.J. Lu <hongjiu.lu at intel.com>
+
+ * config/i386/tmmintrin.h: Remove the duplicated content.
+
2006-10-22 H.J. Lu <hongjiu.lu at intel.com> (r117958)
* config.gcc (i[34567]86-*-*): Add tmmintrin.h to extra_headers.
@@ -170,6 +429,18 @@
* doc/invoke.texi: Document -mssse3/-mno-ssse3 switches.
+2006-10-21 H.J. Lu <hongjiu.lu at intel.com>
+
+ * config/i386/i386.md (UNSPEC_LDQQU): Renamed to ...
+ (UNSPEC_LDDQU): This.
+ * config/i386/sse.md (sse3_lddqu): Updated.
+
+2006-10-21 Richard Guenther <rguenther at suse.de>
+
+ PR tree-optimization/3511
+ * tree-ssa-pre.c (phi_translate): Fold CALL_EXPRs that
+ got new invariant arguments during PHI translation.
+
2006-10-21 Richard Guenther <rguenther at suse.de>
* builtins.c (fold_builtin_classify): Fix typo.
Modified: head/contrib/gcc/builtins.c
==============================================================================
--- head/contrib/gcc/builtins.c Tue Jun 12 14:56:08 2012 (r236961)
+++ head/contrib/gcc/builtins.c Tue Jun 12 15:04:18 2012 (r236962)
@@ -7355,6 +7355,12 @@ fold_builtin_ceil (tree fndecl, tree arg
}
}
+ /* Fold floor (x) where x is nonnegative to trunc (x). */
+ if (tree_expr_nonnegative_p (arg))
+ return build_function_call_expr (mathfn_built_in (TREE_TYPE (arg),
+ BUILT_IN_TRUNC),
+ arglist);
+
return fold_trunc_transparent_mathfn (fndecl, arglist);
}
@@ -7442,6 +7448,18 @@ fold_builtin_int_roundingfn (tree fndecl
}
}
+ switch (DECL_FUNCTION_CODE (fndecl))
+ {
+ CASE_FLT_FN (BUILT_IN_LFLOOR):
+ CASE_FLT_FN (BUILT_IN_LLFLOOR):
+ /* Fold lfloor (x) where x is nonnegative to FIX_TRUNC (x). */
+ if (tree_expr_nonnegative_p (arg))
+ return fold_build1 (FIX_TRUNC_EXPR, TREE_TYPE (TREE_TYPE (fndecl)),
+ arg);
+ break;
+ default:;
+ }
+
return fold_fixed_mathfn (fndecl, arglist);
}
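
The lfloor fold in the builtins.c hunk above relies on a simple identity: for a nonnegative argument, rounding toward negative infinity and rounding toward zero give the same integer. The following is a minimal sketch of that identity in plain C, not the compiler's implementation; the function names `lfloor_reference` and `lfloor_folded` are hypothetical, and inputs are assumed to fit in a `long`.

```c
/* Reference floor-to-long: truncate, then step down by one if truncation
   rounded upward (which only happens for negative non-integers). */
static long
lfloor_reference (double x)
{
  long t = (long) x;            /* conversion truncates toward zero */
  return (x < (double) t) ? t - 1 : t;
}

/* Folded form: a bare truncating conversion, analogous to FIX_TRUNC_EXPR.
   It matches lfloor_reference only when x is nonnegative. */
static long
lfloor_folded (double x)
{
  return (long) x;
}
```

For x = 2.75 both functions return 2. For x = -0.5 the reference returns -1 while the folded form returns 0, which is why the patch guards the fold with tree_expr_nonnegative_p.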
Modified: head/contrib/gcc/config.gcc
==============================================================================
--- head/contrib/gcc/config.gcc Tue Jun 12 14:56:08 2012 (r236961)
+++ head/contrib/gcc/config.gcc Tue Jun 12 15:04:18 2012 (r236962)
@@ -269,12 +269,12 @@ xscale-*-*)
i[34567]86-*-*)
cpu_type=i386
extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
- pmmintrin.h tmmintrin.h"
+ pmmintrin.h tmmintrin.h ammintrin.h"
;;
x86_64-*-*)
cpu_type=i386
extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
- pmmintrin.h tmmintrin.h"
+ pmmintrin.h tmmintrin.h ammintrin.h"
need_64bit_hwint=yes
;;
ia64-*-*)
@@ -1209,14 +1209,14 @@ i[34567]86-*-solaris2*)
# FIXME: -m64 for i[34567]86-*-* should be allowed just
# like -m32 for x86_64-*-*.
case X"${with_cpu}" in
- Xgeneric|Xcore2|Xnocona|Xx86-64|Xk8|Xopteron|Xathlon64|Xathlon-fx)
+ Xgeneric|Xcore2|Xnocona|Xx86-64|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx)
;;
X)
with_cpu=generic
;;
*)
echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
- echo "generic core2 nocona x86-64 k8 opteron athlon64 athlon-fx" 1>&2
+ echo "generic core2 nocona x86-64 amdfam10 barcelona k8 opteron athlon64 athlon-fx" 1>&2
exit 1
;;
esac
@@ -2515,6 +2515,9 @@ if test x$with_cpu = x ; then
;;
i686-*-* | i786-*-*)
case ${target_noncanonical} in
+ amdfam10-*|barcelona-*)
+ with_cpu=amdfam10
+ ;;
k8-*|opteron-*|athlon_64-*)
with_cpu=k8
;;
@@ -2555,6 +2558,9 @@ if test x$with_cpu = x ; then
;;
x86_64-*-*)
case ${target_noncanonical} in
+ amdfam10-*|barcelona-*)
+ with_cpu=amdfam10
+ ;;
k8-*|opteron-*|athlon_64-*)
with_cpu=k8
;;
@@ -2795,7 +2801,7 @@ case "${target}" in
esac
# OK
;;
- "" | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic)
+ "" | amdfam10 | barcelona | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic)
# OK
;;
*)
Added: head/contrib/gcc/config/i386/ammintrin.h
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/contrib/gcc/config/i386/ammintrin.h Tue Jun 12 15:04:18 2012 (r236962)
@@ -0,0 +1,73 @@
+/* Copyright (C) 2007 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING. If not, write to
+ the Free Software Foundation, 51 Franklin Street, Fifth Floor,
+ Boston, MA 02110-1301, USA. */
+
+/* As a special exception, if you include this header file into source
+ files compiled by GCC, this header file does not by itself cause
+ the resulting executable to be covered by the GNU General Public
+ License. This exception does not however invalidate any other
+ reasons why the executable file might be covered by the GNU General
+ Public License. */
+
+/* Implemented from the specification included in the AMD Programmers
+ Manual Update, version 2.x */
+
+#ifndef _AMMINTRIN_H_INCLUDED
+#define _AMMINTRIN_H_INCLUDED
+
+#ifndef __SSE4A__
+# error "SSE4A instruction set not enabled"
+#else
+
+/* We need definitions from the SSE3, SSE2 and SSE header files*/
+#include <pmmintrin.h>
+
+static __inline void __attribute__((__always_inline__))
+_mm_stream_sd (double * __P, __m128d __Y)
+{
+ __builtin_ia32_movntsd (__P, (__v2df) __Y);
+}
+
+static __inline void __attribute__((__always_inline__))
+_mm_stream_ss (float * __P, __m128 __Y)
+{
+ __builtin_ia32_movntss (__P, (__v4sf) __Y);
+}
+
+static __inline __m128i __attribute__((__always_inline__))
+_mm_extract_si64 (__m128i __X, __m128i __Y)
+{
+ return (__m128i) __builtin_ia32_extrq ((__v2di) __X, (__v16qi) __Y);
+}
+
+#define _mm_extracti_si64(X, I, L) \
+((__m128i) __builtin_ia32_extrqi ((__v2di)(X), I, L))
+
+static __inline __m128i __attribute__((__always_inline__))
+_mm_insert_si64 (__m128i __X,__m128i __Y)
+{
+ return (__m128i) __builtin_ia32_insertq ((__v2di)__X, (__v2di)__Y);
+}
+
+#define _mm_inserti_si64(X, Y, I, L) \
+((__m128i) __builtin_ia32_insertqi ((__v2di)(X), (__v2di)(Y), I, L))
+
+
+#endif /* __SSE4A__ */
+
+#endif /* _AMMINTRIN_H_INCLUDED */
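
Since ammintrin.h raises an error when SSE4A is not enabled, callers will probably want to guard its use. The sketch below shows one way to do that, assuming the `__SSE4A__` macro tested by the header above and the `_mm_stream_sd` intrinsic it defines; the wrapper name `store_double` is hypothetical.

```c
#ifdef __SSE4A__
#include <ammintrin.h>          /* only usable when SSE4A is enabled */
#endif

/* Hypothetical wrapper: stores VALUE to *DST.  With SSE4A enabled it uses
   the non-temporal scalar store (movntsd) from ammintrin.h; otherwise it
   falls back to an ordinary store. */
static void
store_double (double *dst, double value)
{
#ifdef __SSE4A__
  _mm_stream_sd (dst, _mm_set_sd (value));
#else
  *dst = value;
#endif
}
```

Built without -msse4a (or -march=amdfam10) this compiles to a plain store, so the same source stays portable to CPUs without SSE4A.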
Modified: head/contrib/gcc/config/i386/athlon.md
==============================================================================
--- head/contrib/gcc/config/i386/athlon.md Tue Jun 12 14:56:08 2012 (r236961)
+++ head/contrib/gcc/config/i386/athlon.md Tue Jun 12 15:04:18 2012 (r236962)
@@ -29,6 +29,8 @@
(const_string "vector")]
(const_string "direct")))
+(define_attr "amdfam10_decode" "direct,vector,double"
+ (const_string "direct"))
;;
;; decode0 decode1 decode2
;; \ | /
@@ -131,18 +133,22 @@
;; Jump instructions are executed in the branch unit completely transparent to us
(define_insn_reservation "athlon_branch" 0
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "ibr"))
"athlon-direct,athlon-ieu")
(define_insn_reservation "athlon_call" 0
(and (eq_attr "cpu" "athlon,k8,generic64")
(eq_attr "type" "call,callv"))
"athlon-vector,athlon-ieu")
+(define_insn_reservation "athlon_call_amdfam10" 0
+ (and (eq_attr "cpu" "amdfam10")
+ (eq_attr "type" "call,callv"))
+ "athlon-double,athlon-ieu")
;; Latency of push operation is 3 cycles, but ESP value is available
;; earlier
(define_insn_reservation "athlon_push" 2
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "push"))
"athlon-direct,athlon-agu,athlon-store")
(define_insn_reservation "athlon_pop" 4
@@ -153,12 +159,16 @@
(and (eq_attr "cpu" "k8,generic64")
(eq_attr "type" "pop"))
"athlon-double,(athlon-ieu+athlon-load)")
+(define_insn_reservation "athlon_pop_amdfam10" 3
+ (and (eq_attr "cpu" "amdfam10")
+ (eq_attr "type" "pop"))
+ "athlon-direct,(athlon-ieu+athlon-load)")
(define_insn_reservation "athlon_leave" 3
(and (eq_attr "cpu" "athlon")
(eq_attr "type" "leave"))
"athlon-vector,(athlon-ieu+athlon-load)")
(define_insn_reservation "athlon_leave_k8" 3
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(eq_attr "type" "leave"))
"athlon-double,(athlon-ieu+athlon-load)")
@@ -167,6 +177,11 @@
(and (eq_attr "cpu" "athlon,k8,generic64")
(eq_attr "type" "lea"))
"athlon-direct,athlon-agu,nothing")
+;; Lea executes in AGU unit with 1 cycle latency on AMDFAM10
+(define_insn_reservation "athlon_lea_amdfam10" 1
+ (and (eq_attr "cpu" "amdfam10")
+ (eq_attr "type" "lea"))
+ "athlon-direct,athlon-agu,nothing")
;; Mul executes in special multiplier unit attached to IEU0
(define_insn_reservation "athlon_imul" 5
@@ -176,29 +191,35 @@
"athlon-vector,athlon-ieu0,athlon-mult,nothing,nothing,athlon-ieu0")
;; ??? Widening multiply is vector or double.
(define_insn_reservation "athlon_imul_k8_DI" 4
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "imul")
(and (eq_attr "mode" "DI")
(eq_attr "memory" "none,unknown"))))
"athlon-direct0,athlon-ieu0,athlon-mult,nothing,athlon-ieu0")
(define_insn_reservation "athlon_imul_k8" 3
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "imul")
(eq_attr "memory" "none,unknown")))
"athlon-direct0,athlon-ieu0,athlon-mult,athlon-ieu0")
+(define_insn_reservation "athlon_imul_amdfam10_HI" 4
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "imul")
+ (and (eq_attr "mode" "HI")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-vector,athlon-ieu0,athlon-mult,nothing,athlon-ieu0")
(define_insn_reservation "athlon_imul_mem" 8
(and (eq_attr "cpu" "athlon")
(and (eq_attr "type" "imul")
(eq_attr "memory" "load,both")))
"athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,nothing,athlon-ieu")
(define_insn_reservation "athlon_imul_mem_k8_DI" 7
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "imul")
(and (eq_attr "mode" "DI")
(eq_attr "memory" "load,both"))))
"athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,athlon-ieu")
(define_insn_reservation "athlon_imul_mem_k8" 6
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "imul")
(eq_attr "memory" "load,both")))
"athlon-vector,athlon-load,athlon-ieu,athlon-mult,athlon-ieu")
@@ -209,21 +230,23 @@
;; other instructions.
;; ??? Experiments show that the idiv can overlap with roughly 6 cycles
;; of the other code
+;; Using the same heuristics for amdfam10 as K8 with idiv
(define_insn_reservation "athlon_idiv" 6
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "idiv")
(eq_attr "memory" "none,unknown")))
"athlon-vector,(athlon-ieu0*6+(athlon-fpsched,athlon-fvector))")
(define_insn_reservation "athlon_idiv_mem" 9
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "idiv")
(eq_attr "memory" "load,both")))
"athlon-vector,((athlon-load,athlon-ieu0*6)+(athlon-fpsched,athlon-fvector))")
;; The parallelism of string instructions is not documented. Model it same way
;; as idiv to create smaller automata. This probably does not matter much.
+;; Using the same heuristics for amdfam10 as K8 with idiv
(define_insn_reservation "athlon_str" 6
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "str")
(eq_attr "memory" "load,both,store")))
"athlon-vector,athlon-load,athlon-ieu0*6")
@@ -234,34 +257,62 @@
(and (eq_attr "unit" "integer,unknown")
(eq_attr "memory" "none,unknown"))))
"athlon-direct,athlon-ieu")
+(define_insn_reservation "athlon_idirect_amdfam10" 1
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "direct")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-direct,athlon-ieu")
(define_insn_reservation "athlon_ivector" 2
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "vector")
(and (eq_attr "unit" "integer,unknown")
(eq_attr "memory" "none,unknown"))))
"athlon-vector,athlon-ieu,athlon-ieu")
+(define_insn_reservation "athlon_ivector_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-vector,athlon-ieu,athlon-ieu")
+
(define_insn_reservation "athlon_idirect_loadmov" 3
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "imov")
(eq_attr "memory" "load")))
"athlon-direct,athlon-load")
+
(define_insn_reservation "athlon_idirect_load" 4
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "direct")
(and (eq_attr "unit" "integer,unknown")
(eq_attr "memory" "load"))))
"athlon-direct,athlon-load,athlon-ieu")
+(define_insn_reservation "athlon_idirect_load_amdfam10" 4
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "direct")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-load,athlon-ieu")
(define_insn_reservation "athlon_ivector_load" 6
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "vector")
(and (eq_attr "unit" "integer,unknown")
(eq_attr "memory" "load"))))
"athlon-vector,athlon-load,athlon-ieu,athlon-ieu")
+(define_insn_reservation "athlon_ivector_load_amdfam10" 6
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "load"))))
+ "athlon-vector,athlon-load,athlon-ieu,athlon-ieu")
+
(define_insn_reservation "athlon_idirect_movstore" 1
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "imov")
(eq_attr "memory" "store")))
"athlon-direct,athlon-agu,athlon-store")
+
(define_insn_reservation "athlon_idirect_both" 4
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "direct")
@@ -270,6 +321,15 @@
"athlon-direct,athlon-load,
athlon-ieu,athlon-store,
athlon-store")
+(define_insn_reservation "athlon_idirect_both_amdfam10" 4
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "direct")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "both"))))
+ "athlon-direct,athlon-load,
+ athlon-ieu,athlon-store,
+ athlon-store")
+
(define_insn_reservation "athlon_ivector_both" 6
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "vector")
@@ -279,6 +339,16 @@
athlon-ieu,
athlon-ieu,
athlon-store")
+(define_insn_reservation "athlon_ivector_both_amdfam10" 6
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "both"))))
+ "athlon-vector,athlon-load,
+ athlon-ieu,
+ athlon-ieu,
+ athlon-store")
+
(define_insn_reservation "athlon_idirect_store" 1
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "direct")
@@ -286,6 +356,14 @@
(eq_attr "memory" "store"))))
"athlon-direct,(athlon-ieu+athlon-agu),
athlon-store")
+(define_insn_reservation "athlon_idirect_store_amdfam10" 1
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "direct")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "store"))))
+ "athlon-direct,(athlon-ieu+athlon-agu),
+ athlon-store")
+
(define_insn_reservation "athlon_ivector_store" 2
(and (eq_attr "cpu" "athlon,k8,generic64")
(and (eq_attr "athlon_decode" "vector")
@@ -293,6 +371,13 @@
(eq_attr "memory" "store"))))
"athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu,
athlon-store")
+(define_insn_reservation "athlon_ivector_store_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "amdfam10_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "store"))))
+ "athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu,
+ athlon-store")
;; Athlon floatin point unit
(define_insn_reservation "athlon_fldxf" 12
@@ -302,7 +387,7 @@
(eq_attr "mode" "XF"))))
"athlon-vector,athlon-fpload2,athlon-fvector*9")
(define_insn_reservation "athlon_fldxf_k8" 13
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fmov")
(and (eq_attr "memory" "load")
(eq_attr "mode" "XF"))))
@@ -314,7 +399,7 @@
(eq_attr "memory" "load")))
"athlon-direct,athlon-fpload,athlon-fany")
(define_insn_reservation "athlon_fld_k8" 2
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fmov")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fstore")
@@ -326,7 +411,7 @@
(eq_attr "mode" "XF"))))
"athlon-vector,(athlon-fpsched+athlon-agu),(athlon-store2+(athlon-fvector*7))")
(define_insn_reservation "athlon_fstxf_k8" 8
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fmov")
(and (eq_attr "memory" "store,both")
(eq_attr "mode" "XF"))))
@@ -337,16 +422,16 @@
(eq_attr "memory" "store,both")))
"athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
(define_insn_reservation "athlon_fst_k8" 2
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fmov")
(eq_attr "memory" "store,both")))
"athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
(define_insn_reservation "athlon_fist" 4
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fistp,fisttp"))
"athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
(define_insn_reservation "athlon_fmov" 2
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fmov"))
"athlon-direct,athlon-fpsched,athlon-faddmul")
(define_insn_reservation "athlon_fadd_load" 4
@@ -355,12 +440,12 @@
(eq_attr "memory" "load")))
"athlon-direct,athlon-fpload,athlon-fadd")
(define_insn_reservation "athlon_fadd_load_k8" 6
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fop")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fadd")
(define_insn_reservation "athlon_fadd" 4
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fop"))
"athlon-direct,athlon-fpsched,athlon-fadd")
(define_insn_reservation "athlon_fmul_load" 4
@@ -369,16 +454,16 @@
(eq_attr "memory" "load")))
"athlon-direct,athlon-fpload,athlon-fmul")
(define_insn_reservation "athlon_fmul_load_k8" 6
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fmul")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fmul")
(define_insn_reservation "athlon_fmul" 4
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fmul"))
"athlon-direct,athlon-fpsched,athlon-fmul")
(define_insn_reservation "athlon_fsgn" 2
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fsgn"))
"athlon-direct,athlon-fpsched,athlon-fmul")
(define_insn_reservation "athlon_fdiv_load" 24
@@ -387,7 +472,7 @@
(eq_attr "memory" "load")))
"athlon-direct,athlon-fpload,athlon-fmul")
(define_insn_reservation "athlon_fdiv_load_k8" 13
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fdiv")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fmul")
@@ -396,16 +481,16 @@
(eq_attr "type" "fdiv"))
"athlon-direct,athlon-fpsched,athlon-fmul")
(define_insn_reservation "athlon_fdiv_k8" 11
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(eq_attr "type" "fdiv"))
"athlon-direct,athlon-fpsched,athlon-fmul")
(define_insn_reservation "athlon_fpspc_load" 103
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "type" "fpspc")
(eq_attr "memory" "load")))
"athlon-vector,athlon-fpload,athlon-fvector")
(define_insn_reservation "athlon_fpspc" 100
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fpspc"))
"athlon-vector,athlon-fpsched,athlon-fvector")
(define_insn_reservation "athlon_fcmov_load" 7
@@ -418,12 +503,12 @@
(eq_attr "type" "fcmov"))
"athlon-vector,athlon-fpsched,athlon-fvector")
(define_insn_reservation "athlon_fcmov_load_k8" 17
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fcmov")
(eq_attr "memory" "load")))
"athlon-vector,athlon-fploadk8,athlon-fvector")
(define_insn_reservation "athlon_fcmov_k8" 15
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(eq_attr "type" "fcmov"))
"athlon-vector,athlon-fpsched,athlon-fvector")
;; fcomi is vector decoded by uses only one pipe.
@@ -434,13 +519,13 @@
(eq_attr "memory" "load"))))
"athlon-vector,athlon-fpload,athlon-fadd")
(define_insn_reservation "athlon_fcomi_load_k8" 5
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fcmp")
(and (eq_attr "athlon_decode" "vector")
(eq_attr "memory" "load"))))
"athlon-vector,athlon-fploadk8,athlon-fadd")
(define_insn_reservation "athlon_fcomi" 3
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(and (eq_attr "athlon_decode" "vector")
(eq_attr "type" "fcmp")))
"athlon-vector,athlon-fpsched,athlon-fadd")
@@ -450,18 +535,18 @@
(eq_attr "memory" "load")))
"athlon-direct,athlon-fpload,athlon-fadd")
(define_insn_reservation "athlon_fcom_load_k8" 4
- (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "cpu" "k8,generic64,amdfam10")
(and (eq_attr "type" "fcmp")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fadd")
(define_insn_reservation "athlon_fcom" 2
- (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
(eq_attr "type" "fcmp"))
"athlon-direct,athlon-fpsched,athlon-fadd")
;; Never seen by the scheduler because we still don't do post reg-stack
;; scheduling.
;(define_insn_reservation "athlon_fxch" 2
-; (and (eq_attr "cpu" "athlon,k8,generic64")
+; (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
; (eq_attr "type" "fxch"))
; "athlon-direct,athlon-fpsched,athlon-fany")
@@ -516,6 +601,23 @@
(and (eq_attr "type" "mmxmov,ssemov")
(eq_attr "memory" "load")))
"athlon-direct,athlon-fploadk8,athlon-fstore")
+;; On AMDFAM10 all double, single and integer packed and scalar SSEx data
+;; loads generated are direct path, latency of 2 and do not use any FP
+;; executions units. No seperate entries for movlpx/movhpx loads, which
+;; are direct path, latency of 4 and use the FADD/FMUL FP execution units,
+;; as they will not be generated.
+(define_insn_reservation "athlon_sseld_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "ssemov")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8")
+;; On AMDFAM10 MMX data loads generated are direct path, latency of 4
+;; and can use any FP executions units
+(define_insn_reservation "athlon_mmxld_amdfam10" 4
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "mmxmov")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8, athlon-fany")
(define_insn_reservation "athlon_mmxssest" 3
(and (eq_attr "cpu" "k8,generic64")
(and (eq_attr "type" "mmxmov,ssemov")
@@ -533,6 +635,25 @@
(and (eq_attr "type" "mmxmov,ssemov")
(eq_attr "memory" "store,both")))
"athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
+;; On AMDFAM10 all double, single and integer packed SSEx data stores
+;; generated are all double path, latency of 2 and use the FSTORE FP
+;; execution unit. No entries seperate for movupx/movdqu, which are
+;; vector path, latency of 3 and use the FSTORE*2 FP execution unit,
+;; as they will not be generated.
+(define_insn_reservation "athlon_ssest_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "ssemov")
+ (and (eq_attr "mode" "V4SF,V2DF,TI")
+ (eq_attr "memory" "store,both"))))
+ "athlon-double,(athlon-fpsched+athlon-agu),((athlon-fstore+athlon-store)*2)")
+;; On AMDFAM10 all double, single and integer scalar SSEx and MMX
+;; data stores generated are all direct path, latency of 2 and use
+;; the FSTORE FP execution unit
+(define_insn_reservation "athlon_mmxssest_short_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "mmxmov,ssemov")
+ (eq_attr "memory" "store,both")))
+ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
(define_insn_reservation "athlon_movaps_k8" 2
(and (eq_attr "cpu" "k8,generic64")
(and (eq_attr "type" "ssemov")
@@ -578,6 +699,11 @@
(and (eq_attr "type" "sselog,sselog1")
(eq_attr "memory" "load")))
"athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
+(define_insn_reservation "athlon_sselog_load_amdfam10" 4
+ (and (eq_attr "cpu" "amdfam10")
+ (and (eq_attr "type" "sselog,sselog1")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)")
(define_insn_reservation "athlon_sselog" 3
(and (eq_attr "cpu" "athlon")
(eq_attr "type" "sselog,sselog1"))
@@ -586,6 +712,11 @@
(and (eq_attr "cpu" "k8,generic64")
(eq_attr "type" "sselog,sselog1"))
"athlon-double,athlon-fpsched,athlon-fmul")
+(define_insn_reservation "athlon_sselog_amdfam10" 2
+ (and (eq_attr "cpu" "amdfam10")
+ (eq_attr "type" "sselog,sselog1"))
+ "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
+
;; ??? pcmp executes in addmul, probably not worthwhile to bother about that.
(define_insn_reservation "athlon_ssecmp_load" 2
(and (eq_attr "cpu" "athlon")
@@ -594,13 +725,13 @@
(eq_attr "memory" "load"))))
"athlon-direct,athlon-fpload,athlon-fadd")
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***