Discussion:
[PATCH 0/3] MIPS executable stack protection
Leonid Yegoshin
2014-10-04 03:17:14 UTC
Permalink
The following series implements executable stack protection on MIPS.

It sets up a per-thread 'VDSO' page and appropriate TLB support.
The page is write-protected from user space and is maintained via its kernel VA.
MIPS FPU emulation is shifted to this new page, and the stack - like all data
pages in the default setup during ELF binary initialization - is freed to be
marked non-executable. The actual protection is controlled by GLIBC, which can
now make the stack non-executable as is done on other architectures, and I
learned today that the GLIBC team is ready for this.

Note: actual execute protection depends on HW capability, of course.

This series is required for MIPS32/64 R2 emulation on the MIPS R6 architecture.
Without it, 'ssh-keygen' crashes quite quickly when it attempts to execute an
instruction on the stack.
---

Leonid Yegoshin (3):
MIPS: mips_flush_cache_range is added
MIPS: Setup an instruction emulation in VDSO protected page instead of user stack
MIPS: set stack/data protection as non-executable


arch/mips/include/asm/cacheflush.h | 3 +
arch/mips/include/asm/mmu.h | 2 +
arch/mips/include/asm/page.h | 2 -
arch/mips/include/asm/processor.h | 2 -
arch/mips/include/asm/switch_to.h | 12 ++++
arch/mips/include/asm/thread_info.h | 3 +
arch/mips/include/asm/tlbmisc.h | 1
arch/mips/include/asm/vdso.h | 2 +
arch/mips/kernel/process.c | 7 ++
arch/mips/kernel/vdso.c | 41 ++++++++++++-
arch/mips/math-emu/dsemul.c | 114 ++++++++++++++++++++++++++---------
arch/mips/mm/c-octeon.c | 8 ++
arch/mips/mm/c-r3k.c | 8 ++
arch/mips/mm/c-r4k.c | 43 +++++++++++++
arch/mips/mm/c-tx39.c | 9 +++
arch/mips/mm/cache.c | 4 +
arch/mips/mm/fault.c | 5 ++
arch/mips/mm/tlb-r4k.c | 39 ++++++++++++
18 files changed, 273 insertions(+), 32 deletions(-)
--
Signature
Leonid Yegoshin
2014-10-04 03:17:22 UTC
Permalink
A new function, mips_flush_cache_range(), is added.
It flushes the D-cache via a kernel VA and the I-cache via a user VA,
which is significant on systems with cache aliasing.
It can be used to flush a short sequence of newly written code out to
user space and is especially useful in ptrace() and dsemul().
Today a full page is flushed by flush_cache_page() in ptrace().
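
For illustration, a caller that has just written a few instructions through a
kernel mapping of a user page might use the new hook roughly as below (sketch
only; vma, page, uaddr, kaddr, insns and len are assumed to come from the
surrounding code, as in a copy_to_user_page()-style path):

	/* write the new instructions through the kernel alias first */
	memcpy((void *)kaddr, insns, len);

	/*
	 * Then write back the D-cache via the kernel VA and invalidate
	 * the I-cache via the user VA, instead of flushing the whole
	 * page with flush_cache_page().
	 */
	mips_flush_data_cache_range(vma, uaddr, page, kaddr, len);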

Signed-off-by: Leonid Yegoshin <***@imgtec.com>
---
arch/mips/include/asm/cacheflush.h | 3 +++
arch/mips/mm/c-octeon.c | 8 +++++++
arch/mips/mm/c-r3k.c | 8 +++++++
arch/mips/mm/c-r4k.c | 43 ++++++++++++++++++++++++++++++++++++
arch/mips/mm/c-tx39.c | 9 ++++++++
arch/mips/mm/cache.c | 4 +++
6 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index e08381a..8305241 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -94,6 +94,9 @@ extern void (*flush_cache_sigtramp)(unsigned long addr);
extern void (*flush_icache_all)(void);
extern void (*local_flush_data_cache_page)(void * addr);
extern void (*flush_data_cache_page)(unsigned long addr);
+extern void (*mips_flush_data_cache_range)(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size);

/*
* This flag is used to indicate that the page pointed to by a pte
diff --git a/arch/mips/mm/c-octeon.c b/arch/mips/mm/c-octeon.c
index 05b1d7c..38349d1 100644
--- a/arch/mips/mm/c-octeon.c
+++ b/arch/mips/mm/c-octeon.c
@@ -178,6 +178,13 @@ static void octeon_flush_kernel_vmap_range(unsigned long vaddr, int size)
BUG();
}

+static void octeon_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ octeon_flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
/**
* Probe Octeon's caches
*
@@ -292,6 +299,7 @@ void octeon_cache_init(void)
flush_cache_sigtramp = octeon_flush_cache_sigtramp;
flush_icache_all = octeon_flush_icache_all;
flush_data_cache_page = octeon_flush_data_cache_page;
+ mips_flush_data_cache_range = octeon_flush_data_cache_range;
flush_icache_range = octeon_flush_icache_range;
local_flush_icache_range = local_octeon_flush_icache_range;

diff --git a/arch/mips/mm/c-r3k.c b/arch/mips/mm/c-r3k.c
index 135ec31..93b4810 100644
--- a/arch/mips/mm/c-r3k.c
+++ b/arch/mips/mm/c-r3k.c
@@ -273,6 +273,13 @@ static void r3k_flush_data_cache_page(unsigned long addr)
{
}

+static void r3k_mips_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ r3k_flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
static void r3k_flush_cache_sigtramp(unsigned long addr)
{
unsigned long flags;
@@ -322,6 +329,7 @@ void r3k_cache_init(void)
__flush_cache_all = r3k___flush_cache_all;
flush_cache_mm = r3k_flush_cache_mm;
flush_cache_range = r3k_flush_cache_range;
+ mips_flush_data_cache_range = r3k_mips_flush_data_cache_range;
flush_cache_page = r3k_flush_cache_page;
flush_icache_range = r3k_flush_icache_range;
local_flush_icache_range = r3k_flush_icache_range;
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index ad6ff7b..ee014e4 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -636,6 +636,48 @@ static void r4k_flush_data_cache_page(unsigned long addr)
r4k_on_each_cpu(local_r4k_flush_data_cache_page, (void *) addr);
}

+struct mips_flush_data_cache_range_args {
+ struct vm_area_struct *vma;
+ unsigned long vaddr;
+ unsigned long start;
+ unsigned long len;
+};
+
+static inline void local_r4k_mips_flush_data_cache_range(void *args)
+{
+ struct mips_flush_data_cache_range_args *f_args = args;
+ unsigned long vaddr = f_args->vaddr;
+ unsigned long start = f_args->start;
+ unsigned long len = f_args->len;
+ struct vm_area_struct * vma = f_args->vma;
+
+ blast_dcache_range(start, start + len);
+
+ if ((vma->vm_flags & VM_EXEC) && !cpu_has_ic_fills_f_dc) {
+ wmb();
+
+ /* vma is given for exec check only, mmap is current,
+ so - no non-current vma page flush, just user or kernel */
+ protected_blast_icache_range(vaddr, vaddr + len);
+ }
+}
+
+/* flush dirty kernel data and a corresponding user instructions (if needed).
+ used in copy_to_user_page() */
+static void r4k_mips_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long start,
+ unsigned long len)
+{
+ struct mips_flush_data_cache_range_args args;
+
+ args.vma = vma;
+ args.vaddr = vaddr;
+ args.start = start;
+ args.len = len;
+
+ r4k_on_each_cpu(local_r4k_mips_flush_data_cache_range, (void *)&args);
+}
+
struct flush_icache_range_args {
unsigned long start;
unsigned long end;
@@ -1656,6 +1698,7 @@ void r4k_cache_init(void)
flush_icache_all = r4k_flush_icache_all;
local_flush_data_cache_page = local_r4k_flush_data_cache_page;
flush_data_cache_page = r4k_flush_data_cache_page;
+ mips_flush_data_cache_range = r4k_mips_flush_data_cache_range;
flush_icache_range = r4k_flush_icache_range;
local_flush_icache_range = local_r4k_flush_icache_range;

diff --git a/arch/mips/mm/c-tx39.c b/arch/mips/mm/c-tx39.c
index 8d909db..9316e92 100644
--- a/arch/mips/mm/c-tx39.c
+++ b/arch/mips/mm/c-tx39.c
@@ -230,6 +230,13 @@ static void tx39_flush_data_cache_page(unsigned long addr)
tx39_blast_dcache_page(addr);
}

+static void local_flush_data_cache_range(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size)
+{
+ flush_cache_page(vma, addr, page_to_pfn(page));
+}
+
static void tx39_flush_icache_range(unsigned long start, unsigned long end)
{
if (end - start > dcache_size)
@@ -371,6 +378,7 @@ void tx39_cache_init(void)

flush_cache_sigtramp = (void *) tx39h_flush_icache_all;
local_flush_data_cache_page = (void *) tx39h_flush_icache_all;
+ mips_flush_data_cache_range = (void *) local_flush_data_cache_range;
flush_data_cache_page = (void *) tx39h_flush_icache_all;

_dma_cache_wback_inv = tx39h_dma_cache_wback_inv;
@@ -402,6 +410,7 @@ void tx39_cache_init(void)

flush_cache_sigtramp = tx39_flush_cache_sigtramp;
local_flush_data_cache_page = local_tx39_flush_data_cache_page;
+ mips_flush_data_cache_range = (void *) local_flush_data_cache_range;
flush_data_cache_page = tx39_flush_data_cache_page;

_dma_cache_wback_inv = tx39_dma_cache_wback_inv;
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 7e3ea77..b7bdd01 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -44,10 +44,14 @@ void (*__invalidate_kernel_vmap_range)(unsigned long vaddr, int size);
void (*flush_cache_sigtramp)(unsigned long addr);
void (*local_flush_data_cache_page)(void * addr);
void (*flush_data_cache_page)(unsigned long addr);
+void (*mips_flush_data_cache_range)(struct vm_area_struct *vma,
+ unsigned long vaddr, struct page *page, unsigned long addr,
+ unsigned long size);
void (*flush_icache_all)(void);

EXPORT_SYMBOL_GPL(local_flush_data_cache_page);
EXPORT_SYMBOL(flush_data_cache_page);
+EXPORT_SYMBOL(mips_flush_data_cache_range);
EXPORT_SYMBOL(flush_icache_all);

#if defined(CONFIG_DMA_NONCOHERENT) || defined(CONFIG_DMA_MAYBE_COHERENT)
Leonid Yegoshin
2014-10-04 03:17:30 UTC
Permalink
Historically, during FPU emulation MIPS runs the live BD-slot instruction on the stack.
This was needed because it was the only way to correctly handle branch
exceptions with unknown COP2 instructions in the BD-slot. Now there is
an eXecuteInhibit feature and it is desirable to protect the stack from execution
for security reasons.
This patch moves FPU emulation from the stack area to a VDSO-located page which is
write-protected against application access. The VDSO page itself is now per-thread,
and its address and offset are stored in thread_info.
A small stack of emulation blocks is supported because nested traps are possible
when MIPS32/64 R6 emulation is mixed with FPU emulation.

Explanation of the problem (current state before this patch):

If I set an eXecute-disabled stack during ELF binary initialisation, GLIBC ignores
it and may change the stack protection at library load. If that library has an
eXecute-disabled stack then everything is OK, but if this section (PT_GNU_STACK)
is simply missing, as is usual, GLIBC applies its own default == an eXecute-enabled
stack.
So, ssh-keygen is built explicitly with an eXecute-disabled stack. But GLIBC
ignores that and sets the stack executable. And because of that everything works -
FPU emulation and hacker tools alike.
However, if all *.so libraries carry an eXecute-disabled stack in their PT_GNU_STACK
section, then GLIBC keeps the stack non-executable, but things fail later at FPU
emulation.

These two issues are bound together, and to work around the incorrect behaviour
of GLIBC (ignoring ssh-keygen's non-executable-stack intention) the two issues
have to be split. So, I made the kernel emulation protected and out of the stack.
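
For context, the kernel side of this decision looks roughly as follows (a
simplified sketch of the generic ELF loader logic, not part of this patch):

	/* simplified: how the ELF loader picks the stack policy */
	int executable_stack = EXSTACK_DEFAULT;

	for (i = 0; i < elf_ex.e_phnum; i++)
		if (elf_phdata[i].p_type == PT_GNU_STACK) {
			executable_stack = (elf_phdata[i].p_flags & PF_X) ?
				EXSTACK_ENABLE_X : EXSTACK_DISABLE_X;
			break;
		}
	/*
	 * With no PT_GNU_STACK at all the stack stays at EXSTACK_DEFAULT,
	 * i.e. whatever VM_DATA_DEFAULT_FLAGS allows - which is what
	 * patch 3 of this series tightens.
	 */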

Signed-off-by: Leonid Yegoshin <***@imgtec.com>
---
arch/mips/include/asm/mmu.h | 2 +
arch/mips/include/asm/processor.h | 2 -
arch/mips/include/asm/switch_to.h | 12 ++++
arch/mips/include/asm/thread_info.h | 3 +
arch/mips/include/asm/tlbmisc.h | 1
arch/mips/include/asm/vdso.h | 2 +
arch/mips/kernel/process.c | 7 ++
arch/mips/kernel/vdso.c | 41 ++++++++++++-
arch/mips/math-emu/dsemul.c | 114 ++++++++++++++++++++++++++---------
arch/mips/mm/fault.c | 5 ++
arch/mips/mm/tlb-r4k.c | 39 ++++++++++++
11 files changed, 197 insertions(+), 31 deletions(-)

diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index c436138..96266b8 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -3,7 +3,9 @@

typedef struct {
unsigned long asid[NR_CPUS];
+ unsigned long vdso_asid[NR_CPUS];
void *vdso;
+ struct vm_area_struct *vdso_vma;
} mm_context_t;

#endif /* __ASM_MMU_H */
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 05f0843..c87b1da 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -40,7 +40,7 @@ extern unsigned int vced_count, vcei_count;
* A special page (the vdso) is mapped into all processes at the very
* top of the virtual memory space.
*/
-#define SPECIAL_PAGES_SIZE PAGE_SIZE
+#define SPECIAL_PAGES_SIZE (PAGE_SIZE * 2)

#ifdef CONFIG_32BIT
#ifdef CONFIG_KVM_GUEST
diff --git a/arch/mips/include/asm/switch_to.h b/arch/mips/include/asm/switch_to.h
index b928b6f..0968f51 100644
--- a/arch/mips/include/asm/switch_to.h
+++ b/arch/mips/include/asm/switch_to.h
@@ -17,6 +17,7 @@
#include <asm/dsp.h>
#include <asm/cop2.h>
#include <asm/msa.h>
+#include <asm/mmu_context.h>

struct task_struct;

@@ -104,6 +105,16 @@ do { \
disable_msa(); \
} while (0)

+static inline void flush_vdso_page(void)
+{
+ if (current->mm && cpu_context(raw_smp_processor_id(), current->mm) &&
+ (current->mm->context.vdso_asid[raw_smp_processor_id()] ==
+ cpu_asid(raw_smp_processor_id(), current->mm))) {
+ local_flush_tlb_page(current->mm->mmap, (unsigned long)current->mm->context.vdso);
+ current->mm->context.vdso_asid[raw_smp_processor_id()] = 0;
+ }
+}
+
#define finish_arch_switch(prev) \
do { \
u32 __c0_stat; \
@@ -118,6 +129,7 @@ do { \
__restore_dsp(current); \
if (cpu_has_userlocal) \
write_c0_userlocal(current_thread_info()->tp_value); \
+ flush_vdso_page(); \
__restore_watch(); \
} while (0)

diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 7de8658..8d16003 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -34,6 +34,8 @@ struct thread_info {
* 0x7fffffff for user-thead
* 0xffffffff for kernel-thread
*/
+ unsigned long vdso_offset;
+ struct page *vdso_page;
struct restart_block restart_block;
struct pt_regs *regs;
};
@@ -49,6 +51,7 @@ struct thread_info {
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
+ .vdso_page = NULL, \
.restart_block = { \
.fn = do_no_restart_syscall, \
}, \
diff --git a/arch/mips/include/asm/tlbmisc.h b/arch/mips/include/asm/tlbmisc.h
index 3a45228..abd7bf6 100644
--- a/arch/mips/include/asm/tlbmisc.h
+++ b/arch/mips/include/asm/tlbmisc.h
@@ -6,5 +6,6 @@
*/
extern void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask);
+int install_vdso_tlb(void);

#endif /* __ASM_TLBMISC_H */
diff --git a/arch/mips/include/asm/vdso.h b/arch/mips/include/asm/vdso.h
index cca56aa..baf3402 100644
--- a/arch/mips/include/asm/vdso.h
+++ b/arch/mips/include/asm/vdso.h
@@ -11,6 +11,8 @@

#include <linux/types.h>

+void mips_thread_vdso(struct thread_info *ti);
+void arch_release_thread_info(struct thread_info *info);

#ifdef CONFIG_32BIT
struct mips_vdso {
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 636b074..762738a 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -42,6 +42,7 @@
#include <asm/isadep.h>
#include <asm/inst.h>
#include <asm/stacktrace.h>
+#include <asm/vdso.h>

#ifdef CONFIG_HOTPLUG_CPU
void arch_cpu_idle_dead(void)
@@ -59,6 +60,8 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
{
unsigned long status;

+ mips_thread_vdso(current_thread_info());
+
/* New thread loses kernel privileges. */
status = regs->cp0_status & ~(ST0_CU0|ST0_CU1|ST0_FR|KU_MASK);
status |= KU_USER;
@@ -75,6 +78,7 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)

void exit_thread(void)
{
+ arch_release_thread_info(current_thread_info());
}

void flush_thread(void)
@@ -103,6 +107,9 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,

preempt_enable();

+ ti->vdso_page = NULL;
+ mips_thread_vdso(ti);
+
/* set up new TSS. */
childregs = (struct pt_regs *) childksp - 1;
/* Put the stack after the struct pt_regs. */
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 0f1af58..7b31bba 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -19,6 +19,8 @@

#include <asm/vdso.h>
#include <asm/uasm.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>

/*
* Including <asm/unistd.h> would give use the 64-bit syscall numbers ...
@@ -88,14 +90,18 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)

ret = install_special_mapping(mm, addr, PAGE_SIZE,
VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_MAYREAD|VM_MAYEXEC,
&vdso_page);

if (ret)
goto up_fail;

mm->context.vdso = (void *)addr;
+ /* if cache aliasing - use a different cache flush later */
+ if (cpu_has_rixi && cpu_has_dc_aliases)
+ mm->context.vdso_vma = find_vma(mm,addr);

+ mips_thread_vdso(current_thread_info());
up_fail:
up_write(&mm->mmap_sem);
return ret;
@@ -107,3 +113,36 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return "[vdso]";
return NULL;
}
+
+void mips_thread_vdso(struct thread_info *ti)
+{
+ struct page *vdso;
+ unsigned long addr;
+
+ if (cpu_has_rixi && ti->task->mm && !ti->vdso_page) {
+ vdso = alloc_page(GFP_USER);
+ if (!vdso)
+ return;
+ ti->vdso_page = vdso;
+ ti->vdso_offset = PAGE_SIZE;
+ addr = (unsigned long)page_address(vdso);
+ copy_page((void *)addr, (void *)page_address(vdso_page));
+ if (!cpu_has_ic_fills_f_dc)
+ flush_data_cache_page(addr);
+ /* any vma in mmap is used, just to get ASIDs back from mm */
+ local_flush_tlb_page(ti->task->mm->mmap,addr);
+ }
+}
+
+void arch_release_thread_info(struct thread_info *info)
+{
+ if (info->vdso_page) {
+ if (info->task->mm) {
+ /* any vma in mmap is used, just to get ASIDs */
+ local_flush_tlb_page(info->task->mm->mmap,(unsigned long)page_address(info->vdso_page));
+ info->task->mm->context.vdso_asid[smp_processor_id()] = 0;
+ }
+ __free_page(info->vdso_page);
+ info->vdso_page = NULL;
+ }
+}
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 4f514f3..274f9b7 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -3,6 +3,7 @@
#include <asm/fpu_emulator.h>
#include <asm/inst.h>
#include <asm/mipsregs.h>
+#include <asm/vdso.h>
#include <asm/uaccess.h>

#include "ieee754.h"
@@ -30,12 +31,15 @@ struct emuframe {
mips_instruction cookie;
unsigned long epc;
};
+/* round structure size to N*8 to force a fit two instructions in a single cache line */
+#define EMULFRAME_ROUNDED_SIZE ((sizeof(struct emuframe) + 0x7) & ~0x7)

int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
{
extern asmlinkage void handle_dsemulret(void);
struct emuframe __user *fr;
int err;
+ unsigned char *pg_addr;

if ((get_isa16_mode(regs->cp0_epc) && ((ir >> 16) == MM_NOP16)) ||
(ir == 0)) {
@@ -48,7 +52,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
pr_debug("dsemul %lx %lx\n", regs->cp0_epc, cpc);

/*
- * The strategy is to push the instruction onto the user stack
+ * The strategy is to push the instruction onto the user stack/VDSO page
* and put a trap after it which we can catch and jump to
* the required address any alternative apart from full
* instruction emulation!!.
@@ -65,37 +69,78 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
* handler (single entry point).
*/

- /* Ensure that the two instructions are in the same cache line */
- fr = (struct emuframe __user *)
- ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
-
- /* Verify that the stack pointer is not competely insane */
- if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
- return SIGBUS;
-
- if (get_isa16_mode(regs->cp0_epc)) {
- err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
- err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ if (current_thread_info()->vdso_page) {
+ /*
+ * Use VDSO page and fill structure via kernel VA,
+ * user write is disabled
+ */
+ pg_addr = (unsigned char *)page_address(current_thread_info()->vdso_page);
+ fr = (struct emuframe __user *)
+ (pg_addr + current_thread_info()->vdso_offset -
+ EMULFRAME_ROUNDED_SIZE);
+
+ /* verify that we don't overflow into trampoline areas */
+ if ((unsigned char *)fr < (unsigned char *)(((struct mips_vdso *)pg_addr) + 1)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ *(u16 *)&fr->emul = (u16)(ir >> 16);
+ *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);
+ } else {
+ fr->emul = ir;
+ fr->badinst = (mips_instruction)BREAK_MATH;
+ }
+ fr->cookie = (mips_instruction)BD_COOKIE;
+ fr->epc = cpc;
+
+ /* fill CP0_EPC with user VA */
+ regs->cp0_epc = ((unsigned long)(current->mm->context.vdso +
+ current_thread_info()->vdso_offset)) |
+ get_isa16_mode(regs->cp0_epc);
+ if (cpu_has_dc_aliases)
+ mips_flush_data_cache_range(current->mm->context.vdso_vma,
+ regs->cp0_epc, current_thread_info()->vdso_page,
+ (unsigned long)fr, sizeof(struct emuframe));
+ else
+ /* it is a less expensive on CPU with correct SYNCI */
+ flush_cache_sigtramp((unsigned long)fr);
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ /* Ensure that the two instructions are in the same cache line */
+ fr = (struct emuframe __user *)
+ ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
+
+ /* Verify that the stack pointer is not competely insane */
+ if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
+ return SIGBUS;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
+ err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
+ err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
+ err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ } else {
+ err = __put_user(ir, &fr->emul);
+ err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ }
+
+ err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
+ err |= __put_user(cpc, &fr->epc);
+
+ if (unlikely(err)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ regs->cp0_epc = ((unsigned long) &fr->emul) |
+ get_isa16_mode(regs->cp0_epc);
+
+ flush_cache_sigtramp((unsigned long)&fr->badinst);
}

- err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
- err |= __put_user(cpc, &fr->epc);
-
- if (unlikely(err)) {
- MIPS_FPU_EMU_INC_STATS(errors);
- return SIGBUS;
- }
-
- regs->cp0_epc = ((unsigned long) &fr->emul) |
- get_isa16_mode(regs->cp0_epc);
-
- flush_cache_sigtramp((unsigned long)&fr->badinst);
-
return SIGILL; /* force out of emulation loop */
}

@@ -156,6 +201,17 @@ int do_dsemulret(struct pt_regs *xcp)
return 0;
}

+ if (current_thread_info()->vdso_page) {
+ /* restore VDSO stack level */
+ current_thread_info()->vdso_offset += EMULFRAME_ROUNDED_SIZE;
+ if (current_thread_info()->vdso_offset > PAGE_SIZE) {
+ /* This is not a good situation to be in */
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+ force_sig(SIGBUS, current);
+
+ return 0;
+ }
+ }
/* Set EPC to return to post-branch instruction */
xcp->cp0_epc = epc;

diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index becc42b..516045a 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -26,6 +26,7 @@
#include <asm/uaccess.h>
#include <asm/ptrace.h>
#include <asm/highmem.h> /* For VMALLOC_END */
+#include <asm/tlbmisc.h>
#include <linux/kdebug.h>

/*
@@ -138,6 +139,9 @@ good_area:
#endif
goto bad_area;
}
+ if (((address & PAGE_MASK) == (unsigned long)(mm->context.vdso)) &&
+ install_vdso_tlb())
+ goto up_return;
} else {
if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
goto bad_area;
@@ -186,6 +190,7 @@ good_area:
}
}

+up_return:
up_read(&mm->mmap_sem);
return;

diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index fa6ebd4..a857e21 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -350,6 +350,45 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
local_irq_restore(flags);
}

+int install_vdso_tlb(void)
+{
+ int tlbidx;
+ unsigned long flags;
+
+ if (!current_thread_info()->vdso_page)
+ return(0);
+
+ local_irq_save(flags);
+ write_c0_entryhi(((unsigned long)current->mm->context.vdso & (PAGE_MASK << 1)) |
+ cpu_asid(smp_processor_id(), current->mm));
+
+ mtc0_tlbw_hazard();
+ tlb_probe();
+ tlb_probe_hazard();
+ tlbidx = read_c0_index();
+#if defined(CONFIG_64BIT_PHYS_ADDR) && defined(CONFIG_CPU_MIPS32)
+ write_c0_entrylo0(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))>>32);
+#else
+ write_c0_entrylo0(pte_to_entrylo(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))));
+#endif
+ write_c0_entrylo1(0);
+ mtc0_tlbw_hazard();
+ if (tlbidx < 0)
+ tlb_write_random();
+ else
+ tlb_write_indexed();
+ tlbw_use_hazard();
+
+ current->mm->context.vdso_asid[smp_processor_id()] = cpu_asid(smp_processor_id(), current->mm);
+ local_irq_restore(flags);
+
+ return(1);
+}
+
void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask)
{
Peter Zijlstra
2014-10-04 20:00:16 UTC
Permalink
Post by Leonid Yegoshin
--- a/arch/mips/include/asm/switch_to.h
+++ b/arch/mips/include/asm/switch_to.h
@@ -17,6 +17,7 @@
#include <asm/dsp.h>
#include <asm/cop2.h>
#include <asm/msa.h>
+#include <asm/mmu_context.h>
struct task_struct;
@@ -104,6 +105,16 @@ do { \
disable_msa(); \
} while (0)
+static inline void flush_vdso_page(void)
+{
+ if (current->mm && cpu_context(raw_smp_processor_id(), current->mm) &&
+ (current->mm->context.vdso_asid[raw_smp_processor_id()] ==
+ cpu_asid(raw_smp_processor_id(), current->mm))) {
+ local_flush_tlb_page(current->mm->mmap, (unsigned long)current->mm->context.vdso);
+ current->mm->context.vdso_asid[raw_smp_processor_id()] = 0;
+ }
+}
Why raw_smp_processor_id(), and why evaluate it 3 times? Sure, compilers
can be expected to do some CSE, but something like:

int cpu = smp_processor_id();

if ( ... [cpu] ...)

is far more readable as well.
Post by Leonid Yegoshin
+
#define finish_arch_switch(prev) \
do { \
u32 __c0_stat; \
@@ -118,6 +129,7 @@ do { \
__restore_dsp(current); \
if (cpu_has_userlocal) \
write_c0_userlocal(current_thread_info()->tp_value); \
+ flush_vdso_page(); \
__restore_watch(); \
} while (0)
So what I didn't see is any talk about the cost of this. Surely a TLB
flush isn't exactly free.
Leonid Yegoshin
2014-10-05 05:52:49 UTC
Permalink
Post by Peter Zijlstra
Post by Leonid Yegoshin
--- a/arch/mips/include/asm/switch_to.h
+++ b/arch/mips/include/asm/switch_to.h
Why raw_smp_processor_id(), and why evaluate it 3 times? Sure, compilers
can be expected to do some CSE, but something like:
int cpu = smp_processor_id();
if ( ... [cpu] ...)
is far more readable as well.
Sure. But maybe it makes sense to use raw_smp_processor_id() due to the elevated preemption counter.
Post by Peter Zijlstra
Post by Leonid Yegoshin
+ flush_vdso_page(); \
So what I didn't see is any talk about the cost of this. Surely a TLB
flush isn't exactly free.
Well, flush_vdso_page() uses the local version of TLB page flushing, which is cheap per se compared with a system-wide flush.
And I take precautions to flush only when the same memory map is involved, i.e. only when one pthread of some map is replaced by another pthread of the same map on the same CPU.
So, it flushes only after real use of that map by the previous pthread.

However, some performance loss can be expected from killing the TLB entry.
In low-end cores with a small TLB array we can expect that this entry would be evicted after a context switch anyway.
In high-end cores we should expect an FPU to be available, so floating-point emulation should be very rare (unnormalized operands and the like).

The only question is the middle range, which has enough TLB entries (32 or more) but may have no floating-point unit. However, the emulation itself is very slow, and it is natural to expect much greater performance degradation from the floating-point emulation there than any penalty from the early loss of one TLB entry.
Paul Burton
2014-10-06 12:29:17 UTC
Permalink
Post by Leonid Yegoshin
Historically, during FPU emulation MIPS runs the live BD-slot instruction on the stack.
This was needed because it was the only way to correctly handle branch
exceptions with unknown COP2 instructions in the BD-slot. Now there is
an eXecuteInhibit feature and it is desirable to protect the stack from execution
for security reasons.
This patch moves FPU emulation from the stack area to a VDSO-located page which is
write-protected against application access. The VDSO page itself is now per-thread,
and its address and offset are stored in thread_info.
A small stack of emulation blocks is supported because nested traps are possible
when MIPS32/64 R6 emulation is mixed with FPU emulation.
If I set an eXecute-disabled stack during ELF binary initialisation, GLIBC ignores
it and may change the stack protection at library load. If that library has an
eXecute-disabled stack then everything is OK, but if this section (PT_GNU_STACK)
is simply missing, as is usual, GLIBC applies its own default == an eXecute-enabled
stack.
So, ssh-keygen is built explicitly with an eXecute-disabled stack. But GLIBC
ignores that and sets the stack executable. And because of that everything works -
FPU emulation and hacker tools alike.
However, if all *.so libraries carry an eXecute-disabled stack in their PT_GNU_STACK
section, then GLIBC keeps the stack non-executable, but things fail later at FPU
emulation.
These two issues are bound together, and to work around the incorrect behaviour
of GLIBC (ignoring ssh-keygen's non-executable-stack intention) the two issues
have to be split. So, I made the kernel emulation protected and out of the stack.
Hi Leonid,

First some general questions: is there any reason the page used needs
to be at the same virtual address for each thread? I can't think of one,
and if that's the case then why not simply allocate a series of pages
per-thread via mmap_region or similar, on demand when a thread first
executes an FP branch instruction? That would of course consume some
more virtual address space, but would avoid the hassles of manually
prodding at the TLB, tracking ASIDs & flushing the caches. Perhaps the
shrinker interface could be used to allow freeing those pages if & when
it becomes necessary for long running threads.
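
For illustration, that could look something along these lines (a rough sketch
only; dsemul_alloc_page and the dsemul_page field are made up, and how the
kernel then writes each frame into the page is a separate question):

/*
 * Allocate an ordinary per-thread mapping the first time this thread
 * needs a delay-slot emulation frame, instead of shadowing the VDSO
 * page behind a hand-built TLB entry.
 */
static unsigned long dsemul_alloc_page(void)
{
	unsigned long addr;

	addr = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, 0);
	if (IS_ERR_VALUE(addr))
		return 0;

	current_thread_info()->dsemul_page = addr;	/* made-up field */
	return addr;
}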

Also VDSO is really a misnomer throughout, as I've pointed out to you
before. I'm aware it's an existing misnomer, but it would be nice if
we could clear that up rather than repeat it...
Post by Leonid Yegoshin
---
arch/mips/include/asm/mmu.h | 2 +
arch/mips/include/asm/processor.h | 2 -
arch/mips/include/asm/switch_to.h | 12 ++++
arch/mips/include/asm/thread_info.h | 3 +
arch/mips/include/asm/tlbmisc.h | 1
arch/mips/include/asm/vdso.h | 2 +
arch/mips/kernel/process.c | 7 ++
arch/mips/kernel/vdso.c | 41 ++++++++++++-
arch/mips/math-emu/dsemul.c | 114 ++++++++++++++++++++++++++---------
arch/mips/mm/fault.c | 5 ++
arch/mips/mm/tlb-r4k.c | 39 ++++++++++++
11 files changed, 197 insertions(+), 31 deletions(-)
diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index c436138..96266b8 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -3,7 +3,9 @@
typedef struct {
unsigned long asid[NR_CPUS];
+ unsigned long vdso_asid[NR_CPUS];
void *vdso;
+ struct vm_area_struct *vdso_vma;
} mm_context_t;
#endif /* __ASM_MMU_H */
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 05f0843..c87b1da 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -40,7 +40,7 @@ extern unsigned int vced_count, vcei_count;
* A special page (the vdso) is mapped into all processes at the very
* top of the virtual memory space.
*/
-#define SPECIAL_PAGES_SIZE PAGE_SIZE
+#define SPECIAL_PAGES_SIZE (PAGE_SIZE * 2)
#ifdef CONFIG_32BIT
#ifdef CONFIG_KVM_GUEST
diff --git a/arch/mips/include/asm/switch_to.h b/arch/mips/include/asm/switch_to.h
index b928b6f..0968f51 100644
--- a/arch/mips/include/asm/switch_to.h
+++ b/arch/mips/include/asm/switch_to.h
@@ -17,6 +17,7 @@
#include <asm/dsp.h>
#include <asm/cop2.h>
#include <asm/msa.h>
+#include <asm/mmu_context.h>
struct task_struct;
@@ -104,6 +105,16 @@ do { \
disable_msa(); \
} while (0)
+static inline void flush_vdso_page(void)
+{
+ if (current->mm && cpu_context(raw_smp_processor_id(), current->mm) &&
+ (current->mm->context.vdso_asid[raw_smp_processor_id()] ==
+ cpu_asid(raw_smp_processor_id(), current->mm))) {
+ local_flush_tlb_page(current->mm->mmap, (unsigned long)current->mm->context.vdso);
+ current->mm->context.vdso_asid[raw_smp_processor_id()] = 0;
+ }
+}
+
#define finish_arch_switch(prev) \
do { \
u32 __c0_stat; \
@@ -118,6 +129,7 @@ do { \
__restore_dsp(current); \
if (cpu_has_userlocal) \
write_c0_userlocal(current_thread_info()->tp_value); \
+ flush_vdso_page(); \
__restore_watch(); \
} while (0)
diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 7de8658..8d16003 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -34,6 +34,8 @@ struct thread_info {
* 0x7fffffff for user-thead
* 0xffffffff for kernel-thread
*/
+ unsigned long vdso_offset;
+ struct page *vdso_page;
struct restart_block restart_block;
struct pt_regs *regs;
};
@@ -49,6 +51,7 @@ struct thread_info {
.cpu = 0, \
.preempt_count = INIT_PREEMPT_COUNT, \
.addr_limit = KERNEL_DS, \
+ .vdso_page = NULL, \
.restart_block = { \
.fn = do_no_restart_syscall, \
}, \
diff --git a/arch/mips/include/asm/tlbmisc.h b/arch/mips/include/asm/tlbmisc.h
index 3a45228..abd7bf6 100644
--- a/arch/mips/include/asm/tlbmisc.h
+++ b/arch/mips/include/asm/tlbmisc.h
@@ -6,5 +6,6 @@
*/
extern void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask);
+int install_vdso_tlb(void);
#endif /* __ASM_TLBMISC_H */
diff --git a/arch/mips/include/asm/vdso.h b/arch/mips/include/asm/vdso.h
index cca56aa..baf3402 100644
--- a/arch/mips/include/asm/vdso.h
+++ b/arch/mips/include/asm/vdso.h
@@ -11,6 +11,8 @@
#include <linux/types.h>
+void mips_thread_vdso(struct thread_info *ti);
+void arch_release_thread_info(struct thread_info *info);
#ifdef CONFIG_32BIT
struct mips_vdso {
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 636b074..762738a 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -42,6 +42,7 @@
#include <asm/isadep.h>
#include <asm/inst.h>
#include <asm/stacktrace.h>
+#include <asm/vdso.h>
#ifdef CONFIG_HOTPLUG_CPU
void arch_cpu_idle_dead(void)
@@ -59,6 +60,8 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
{
unsigned long status;
+ mips_thread_vdso(current_thread_info());
+
/* New thread loses kernel privileges. */
status = regs->cp0_status & ~(ST0_CU0|ST0_CU1|ST0_FR|KU_MASK);
status |= KU_USER;
@@ -75,6 +78,7 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
void exit_thread(void)
{
+ arch_release_thread_info(current_thread_info());
}
void flush_thread(void)
@@ -103,6 +107,9 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
preempt_enable();
+ ti->vdso_page = NULL;
+ mips_thread_vdso(ti);
+
/* set up new TSS. */
childregs = (struct pt_regs *) childksp - 1;
/* Put the stack after the struct pt_regs. */
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 0f1af58..7b31bba 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -19,6 +19,8 @@
#include <asm/vdso.h>
#include <asm/uasm.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
/*
* Including <asm/unistd.h> would give use the 64-bit syscall numbers ...
@@ -88,14 +90,18 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
ret = install_special_mapping(mm, addr, PAGE_SIZE,
VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ VM_MAYREAD|VM_MAYEXEC,
&vdso_page);
if (ret)
goto up_fail;
mm->context.vdso = (void *)addr;
+ /* if cache aliasing - use a different cache flush later */
+ if (cpu_has_rixi && cpu_has_dc_aliases)
+ mm->context.vdso_vma = find_vma(mm,addr);
+ mips_thread_vdso(current_thread_info());
up_write(&mm->mmap_sem);
return ret;
@@ -107,3 +113,36 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return "[vdso]";
return NULL;
}
+
+void mips_thread_vdso(struct thread_info *ti)
+{
+ struct page *vdso;
+ unsigned long addr;
+
+ if (cpu_has_rixi && ti->task->mm && !ti->vdso_page) {
+ vdso = alloc_page(GFP_USER);
+ if (!vdso)
+ return;
+ ti->vdso_page = vdso;
+ ti->vdso_offset = PAGE_SIZE;
+ addr = (unsigned long)page_address(vdso);
+ copy_page((void *)addr, (void *)page_address(vdso_page));
+ if (!cpu_has_ic_fills_f_dc)
+ flush_data_cache_page(addr);
+ /* any vma in mmap is used, just to get ASIDs back from mm */
+ local_flush_tlb_page(ti->task->mm->mmap,addr);
+ }
+}
+
+void arch_release_thread_info(struct thread_info *info)
+{
+ if (info->vdso_page) {
+ if (info->task->mm) {
+ /* any vma in mmap is used, just to get ASIDs */
+ local_flush_tlb_page(info->task->mm->mmap,(unsigned long)page_address(info->vdso_page));
+ info->task->mm->context.vdso_asid[smp_processor_id()] = 0;
+ }
+ __free_page(info->vdso_page);
+ info->vdso_page = NULL;
+ }
+}
diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c
index 4f514f3..274f9b7 100644
--- a/arch/mips/math-emu/dsemul.c
+++ b/arch/mips/math-emu/dsemul.c
@@ -3,6 +3,7 @@
#include <asm/fpu_emulator.h>
#include <asm/inst.h>
#include <asm/mipsregs.h>
+#include <asm/vdso.h>
#include <asm/uaccess.h>
#include "ieee754.h"
@@ -30,12 +31,15 @@ struct emuframe {
mips_instruction cookie;
unsigned long epc;
};
+/* round structure size to N*8 to force a fit two instructions in a single cache line */
+#define EMULFRAME_ROUNDED_SIZE ((sizeof(struct emuframe) + 0x7) & ~0x7)
int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
{
extern asmlinkage void handle_dsemulret(void);
struct emuframe __user *fr;
int err;
+ unsigned char *pg_addr;
if ((get_isa16_mode(regs->cp0_epc) && ((ir >> 16) == MM_NOP16)) ||
(ir == 0)) {
@@ -48,7 +52,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
pr_debug("dsemul %lx %lx\n", regs->cp0_epc, cpc);
/*
- * The strategy is to push the instruction onto the user stack
+ * The strategy is to push the instruction onto the user stack/VDSO page
* and put a trap after it which we can catch and jump to
* the required address any alternative apart from full
* instruction emulation!!.
@@ -65,37 +69,78 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, unsigned long cpc)
* handler (single entry point).
*/
- /* Ensure that the two instructions are in the same cache line */
- fr = (struct emuframe __user *)
- ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
-
- /* Verify that the stack pointer is not competely insane */
- if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
- return SIGBUS;
-
- if (get_isa16_mode(regs->cp0_epc)) {
- err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
- err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
- err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
- err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ if (current_thread_info()->vdso_page) {
+ /*
+ * Use VDSO page and fill structure via kernel VA,
+ * user write is disabled
+ */
+ pg_addr = (unsigned char *)page_address(current_thread_info()->vdso_page);
+ fr = (struct emuframe __user *)
+ (pg_addr + current_thread_info()->vdso_offset -
+ EMULFRAME_ROUNDED_SIZE);
+
+ /* verify that we don't overflow into trampoline areas */
+ if ((unsigned char *)fr < (unsigned char *)(((struct mips_vdso *)pg_addr) + 1)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ *(u16 *)&fr->emul = (u16)(ir >> 16);
+ *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);
This microMIPS case doesn't set badinst, as I pointed out internally. I
think it would be much cleaner if you were to do something along the
lines of:

struct emuframe __user _fr, *fr, *user_fr;

if (current_thread_info()->vdso_page) {
	fr = user_fr = (struct emuframe __user *)
		(pg_addr + current_thread_info()->vdso_offset -
		 EMULFRAME_ROUNDED_SIZE);
} else {
	fr = &_fr;
	user_fr = (struct emuframe __user *)
		((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
}

/* initialize frame */
fr->inst = ...;
fr->badinst = ...;

if (fr != user_fr)
	copy_to_user(user_fr, fr, sizeof(*fr));

That way you don't duplicate the initialization of the struct emuframe,
and it's less likely you'll miss cases like the one above.

Or the on-stack case could be removed entirely.

Paul
Post by Leonid Yegoshin
+ } else {
+ fr->emul = ir;
+ fr->badinst = (mips_instruction)BREAK_MATH;
+ }
+ fr->cookie = (mips_instruction)BD_COOKIE;
+ fr->epc = cpc;
+
+ /* fill CP0_EPC with user VA */
+ regs->cp0_epc = ((unsigned long)(current->mm->context.vdso +
+ current_thread_info()->vdso_offset)) |
+ get_isa16_mode(regs->cp0_epc);
+ if (cpu_has_dc_aliases)
+ mips_flush_data_cache_range(current->mm->context.vdso_vma,
+ regs->cp0_epc, current_thread_info()->vdso_page,
+ (unsigned long)fr, sizeof(struct emuframe));
+ else
+ /* it is a less expensive on CPU with correct SYNCI */
+ flush_cache_sigtramp((unsigned long)fr);
} else {
- err = __put_user(ir, &fr->emul);
- err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ /* Ensure that the two instructions are in the same cache line */
+ fr = (struct emuframe __user *)
+ ((regs->regs[29] - sizeof(struct emuframe)) & ~0x7);
+
+ /* Verify that the stack pointer is not competely insane */
+ if (unlikely(!access_ok(VERIFY_WRITE, fr, sizeof(struct emuframe))))
+ return SIGBUS;
+
+ if (get_isa16_mode(regs->cp0_epc)) {
+ err = __put_user(ir >> 16, (u16 __user *)(&fr->emul));
+ err |= __put_user(ir & 0xffff, (u16 __user *)((long)(&fr->emul) + 2));
+ err |= __put_user(BREAK_MATH >> 16, (u16 __user *)(&fr->badinst));
+ err |= __put_user(BREAK_MATH & 0xffff, (u16 __user *)((long)(&fr->badinst) + 2));
+ } else {
+ err = __put_user(ir, &fr->emul);
+ err |= __put_user((mips_instruction)BREAK_MATH, &fr->badinst);
+ }
+
+ err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
+ err |= __put_user(cpc, &fr->epc);
+
+ if (unlikely(err)) {
+ MIPS_FPU_EMU_INC_STATS(errors);
+ return SIGBUS;
+ }
+
+ regs->cp0_epc = ((unsigned long) &fr->emul) |
+ get_isa16_mode(regs->cp0_epc);
+
+ flush_cache_sigtramp((unsigned long)&fr->badinst);
}
- err |= __put_user((mips_instruction)BD_COOKIE, &fr->cookie);
- err |= __put_user(cpc, &fr->epc);
-
- if (unlikely(err)) {
- MIPS_FPU_EMU_INC_STATS(errors);
- return SIGBUS;
- }
-
- regs->cp0_epc = ((unsigned long) &fr->emul) |
- get_isa16_mode(regs->cp0_epc);
-
- flush_cache_sigtramp((unsigned long)&fr->badinst);
-
return SIGILL; /* force out of emulation loop */
}
@@ -156,6 +201,17 @@ int do_dsemulret(struct pt_regs *xcp)
return 0;
}
+ if (current_thread_info()->vdso_page) {
+ /* restore VDSO stack level */
+ current_thread_info()->vdso_offset += EMULFRAME_ROUNDED_SIZE;
+ if (current_thread_info()->vdso_offset > PAGE_SIZE) {
+ /* This is not a good situation to be in */
+ current_thread_info()->vdso_offset -= EMULFRAME_ROUNDED_SIZE;
+ force_sig(SIGBUS, current);
+
+ return 0;
+ }
+ }
/* Set EPC to return to post-branch instruction */
xcp->cp0_epc = epc;
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index becc42b..516045a 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -26,6 +26,7 @@
#include <asm/uaccess.h>
#include <asm/ptrace.h>
#include <asm/highmem.h> /* For VMALLOC_END */
+#include <asm/tlbmisc.h>
#include <linux/kdebug.h>
/*
#endif
goto bad_area;
}
+ if (((address & PAGE_MASK) == (unsigned long)(mm->context.vdso)) &&
+ install_vdso_tlb())
+ goto up_return;
} else {
if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
goto bad_area;
}
}
up_read(&mm->mmap_sem);
return;
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index fa6ebd4..a857e21 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -350,6 +350,45 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
local_irq_restore(flags);
}
+int install_vdso_tlb(void)
+{
+ int tlbidx;
+ unsigned long flags;
+
+ if (!current_thread_info()->vdso_page)
+ return(0);
+
+ local_irq_save(flags);
+ write_c0_entryhi(((unsigned long)current->mm->context.vdso & (PAGE_MASK << 1)) |
+ cpu_asid(smp_processor_id(), current->mm));
+
+ mtc0_tlbw_hazard();
+ tlb_probe();
+ tlb_probe_hazard();
+ tlbidx = read_c0_index();
+#if defined(CONFIG_64BIT_PHYS_ADDR) && defined(CONFIG_CPU_MIPS32)
+ write_c0_entrylo0(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))>>32);
+#else
+ write_c0_entrylo0(pte_to_entrylo(pte_val(pfn_pte(
+ page_to_pfn(current_thread_info()->vdso_page),
+ __pgprot(_page_cachable_default|_PAGE_VALID)))));
+#endif
+ write_c0_entrylo1(0);
+ mtc0_tlbw_hazard();
+ if (tlbidx < 0)
+ tlb_write_random();
+ else
+ tlb_write_indexed();
+ tlbw_use_hazard();
+
+ current->mm->context.vdso_asid[smp_processor_id()] = cpu_asid(smp_processor_id(), current->mm);
+ local_irq_restore(flags);
+
+ return(1);
+}
+
void add_wired_entry(unsigned long entrylo0, unsigned long entrylo1,
unsigned long entryhi, unsigned long pagemask)
{
Leonid Yegoshin
2014-10-06 20:42:32 UTC
Permalink
Post by Paul Burton
Post by Paul Burton
First some general questions: is there any reason the page used needs
to be at the same virtual address for each thread? I can't think of one,
and if that's the case then why not simply allocate a series of pages
per-thread via mmap_region or similar, on demand when a thread first
executes an FP branch instruction? That would of course consume some
more virtual address space, but would avoid the hassles of manually
prodding at the TLB, tracking ASIDs & flushing the caches. Perhaps the
shrinker interface could be used to allow freeing those pages if & when
it becomes necessary for long running threads.
The only reason to have the same virtual address is to keep the mmap
accounting the same. The original 'VDSO' is present in mmap for all
threads of the same memory map.

As for the other approach, I think it would take too much code to handle
and too many implicit interlinks with common Linux code and GLIBC/bionic.
Post by Paul Burton
Post by Paul Burton
Also VDSO is really a misnomer throughout, as I've pointed out to you
before. I'm aware it's an existing misnomer, but it would be nice if
we could clear that up rather than repeat it...
Yes, I agree, but that is outside the scope of this patch. I think it makes
sense to rename the current stuff to something like "Emulation" right before
a patch which implements real VDSO capability on MIPS.
Post by Paul Burton
Post by Paul Burton
+ if (get_isa16_mode(regs->cp0_epc)) {
+ *(u16 *)&fr->emul = (u16)(ir >> 16);
+ *((u16 *)(&fr->emul) + 1) = (u16)(ir & 0xffff);
This microMIPS case doesn't set badinst, as I pointed out internally.
Thank you, I missed it, maybe due to the long citation. I will add it.
Post by Paul Burton
I
think it would be much cleaner if you were to do something along the
I tried to keep it as close to the original code as possible so it is easier
to follow. Even so, there are questions.

Your variant may be cleaner, but that could be a follow-up style-change patch.

- Leonid.

David Daney
2014-10-06 18:05:44 UTC
Permalink
Post by Leonid Yegoshin
Historically, during FPU emulation MIPS runs the live BD-slot instruction on the stack.
This was needed because it was the only way to correctly handle branch
exceptions with unknown COP2 instructions in the BD-slot. Now there is
an eXecuteInhibit feature and it is desirable to protect the stack from execution
for security reasons.
This patch moves FPU emulation from the stack area to a VDSO-located page which is
write-protected against application access. The VDSO page itself is now per-thread,
and its address and offset are stored in thread_info.
A small stack of emulation blocks is supported because nested traps are possible
when MIPS32/64 R6 emulation is mixed with FPU emulation.
Can you explain how this per-thread mapping works?

I am especially interested in what happens when a thread other than the one
using the special mapping issues flush_tlb_mm() and invalidates the TLBs on
all CPUs. How does the TLB entry for the special mapping survive this?

David Daney
Leonid Yegoshin
2014-10-06 20:03:43 UTC
Permalink
Post by David Daney
Post by Leonid Yegoshin
Historically, during FPU emulation MIPS runs the live BD-slot instruction on the stack.
This was needed because it was the only way to correctly handle branch
exceptions with unknown COP2 instructions in the BD-slot. Now there is
an eXecuteInhibit feature and it is desirable to protect the stack from execution
for security reasons.
This patch moves FPU emulation from the stack area to a VDSO-located page which is
write-protected against application access. The VDSO page itself is now per-thread,
and its address and offset are stored in thread_info.
A small stack of emulation blocks is supported because nested traps are possible
when MIPS32/64 R6 emulation is mixed with FPU emulation.
Can you explain how this per-thread mapping works?
I am especially interested in what happens when a thread other than the one
using the special mapping issues flush_tlb_mm() and invalidates the TLBs on
all CPUs. How does the TLB entry for the special mapping survive this?
This patch works as long as install_special_mapping() doesn't change the
PTE itself but only installs a page fault handler. That is the only hidden
dependency on common Linux code.

The MIPS code allocates a page (a copy of the standard 'VDSO' page), links it
to thread_info and handles all allocation/deallocation/thread creation via
arch hooks. It does this only for threads which have a memory map, not for
kernel threads. And it does all of this only if the CPU has the RI/XI
capability - the HW execute-inhibit feature; otherwise it works as it does
today.

It still attaches the standard 'VDSO' page to the memory map for accounting
purposes, so /proc/.../maps shows the [vdso] page. However, the new
(per-thread) page is actually a shadow.

When a TLB refill happens it loads an empty PTE, and the subsequent TLBL (TLB
load page fault) reaches MIPS C code, which recognizes the 'VDSO' address,
asks install_vdso_tlb() to fill the TLB directly, and records the ASID for
this CPU in the memory map.

At process (read: thread) reschedule there is a check, by comparing ASIDs,
whether some previous thread of the same memory map loaded a TLB entry on
this CPU. If that happened and the ASIDs are the same, local_flush_tlb_page()
is called to eliminate that TLB entry, because it has the same ASID but may
map a different per-thread page.

Because the PTE stays at 0x00..00 and never changes, this activity starts
again after the TLB entry is evicted for whatever reason - flush_tlb_mm(),
another flush, or eviction by HW or SW replacement in the TLB array - but
only if the page is demanded again.

Now, the emulation part: a small stack of emulation blocks is used from the
top of the page. Each time an FPU instruction from a BD-slot is emulated, the
code takes the kernel VA of the page and writes the block there, but sets the
thread's EPC to the user VA of that block. In the cache-aliasing case it
flushes the cache via different addresses (D-cache via the kernel VA and
I-cache via the user VA), and the new function is needed to avoid the huge
performance loss of flush_cache_page(). In the absence of cache aliasing it
uses the regular flush_cache_sigtramp(), because on some systems that can be
much faster (via SYNCI).

The stack of emulation blocks is needed because I am working on a MIPS32/64 R6
architecture kernel and some removed MIPS R2 instructions need to be emulated.
A re-entry into emulation may happen in some rare cases - the FPU emulation
and MIPS R2 emulation subsystems are different pieces.


Note: after Peter Zijlstra's note about performance I am thinking about adding
a check for the situation where the same single thread is rescheduled again on
the same CPU, and not flushing the TLB in that case. It just requires yet
another array of process ids or 'VDSO' pages - one element per CPU - and I am
weighing that against the scheduling interval. Today the array is at most 8
elements for MIPS, but that can change in the future. There is also the
possibility of writing a special TLB flush function which compares the TLB
element's address with the page address and skips the eviction if the
addresses match.
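
A minimal sketch of that check could look roughly like this (vdso_owner is
only a placeholder field here, set in install_vdso_tlb(); it is not part of
the posted series):

static inline void flush_vdso_page(void)
{
	int cpu = smp_processor_id();
	mm_context_t *ctx;

	if (!current->mm || !cpu_context(cpu, current->mm))
		return;

	ctx = &current->mm->context;
	if (ctx->vdso_asid[cpu] != cpu_asid(cpu, current->mm))
		return;		/* no live shadow TLB entry for this map here */

	/*
	 * Placeholder field: if the thread being switched in is the one
	 * which installed the entry, its shadow page is still the right
	 * one - keep the TLB entry.
	 */
	if (ctx->vdso_owner[cpu] == current)
		return;

	local_flush_tlb_page(current->mm->mmap, (unsigned long)ctx->vdso);
	ctx->vdso_asid[cpu] = 0;
}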

- Leonid.
Leonid Yegoshin
2014-10-04 03:17:39 UTC
Permalink
This is the last step of the 3 patches which shift FPU emulation out of the
stack into a protected area, so it disables the default executable stack.

Additionally, it makes the default data area non-executable.

Signed-off-by: Leonid Yegoshin <***@imgtec.com>
---
arch/mips/include/asm/page.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index 3be8180..d49ba81 100644
--- a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -230,7 +230,7 @@ extern int __virt_addr_valid(const volatile void *kaddr);
#define virt_addr_valid(kaddr) \
__virt_addr_valid((const volatile void *) (kaddr))

-#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_EXEC | \
+#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | \
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)

#define UNCAC_ADDR(addr) ((addr) - PAGE_OFFSET + UNCAC_BASE)
Peter Zijlstra
2014-10-04 08:23:07 UTC
Permalink
Post by Leonid Yegoshin
The following series implements executable stack protection on MIPS.
It sets up a per-thread 'VDSO' page and appropriate TLB support.
So traditionally we've always avoided per-thread pages like that. What
makes it worth it on MIPS?
Linus Torvalds
2014-10-04 16:03:53 UTC
Permalink
Post by Peter Zijlstra
Post by Leonid Yegoshin
The following series implements executable stack protection on MIPS.
It sets up a per-thread 'VDSO' page and appropriate TLB support.
So traditionally we've always avoided per-thread pages like that. What
makes it worth it on MIPS?
Nothing makes it worth it on MIPS.

It may be easy to implement when you have all software-fill of TLB's,
but it's a mistake even then - because it means that you'll never be
able to do hardware TLB walkers.

And MIPS *does* have hardware TLB walkers, even if they are not
necessarily available everywhere.

So this is a horrible idea. Don't do it. Page tables need to be
per-process, not per thread, so that page tables can be shared.

Linus
Leonid Yegoshin
2014-10-04 16:17:03 UTC
Permalink
Linus, it works on CPUs with a hardware page table walker - MIPS P5600, aka Apache.

I was involved in the architecture development of the HTW and took care of this case.
Post by Peter Zijlstra
Post by Leonid Yegoshin
The following series implements executable stack protection on MIPS.
It sets up a per-thread 'VDSO' page and appropriate TLB support.
So traditionally we've always avoided per-thread pages like that. What
makes it worth it on MIPS?
Nothing makes it worth it on MIPS.

It may be easy to implement when you have all software-fill of TLB's,
but it's a mistake even then - because it means that you'll never be
able to do hardware TLB walkers.

And MIPS *does* have hardware TLB walkers, even if they are not
necessarily available everywhere.

So this is a horrible idea. Don't do it. Page tables need to be
per-process, not per thread, so that page tables can be shared.

Linus
Linus Torvalds
2014-10-04 16:27:59 UTC
Permalink
On Sat, Oct 4, 2014 at 9:17 AM, Leonid Yegoshin
Post by Leonid Yegoshin
Linus, it works on CPUs with a hardware page table walker - MIPS P5600, aka Apache.
I was involved in the architecture development of the HTW and took care of this case.
Ok, as long as it works architecturally, per-thread TLB fills are fine by me.

Linus
Leonid Yegoshin
2014-10-04 15:33:36 UTC
Permalink
Post by Peter Zijlstra
Post by Leonid Yegoshin
It sets up a per-thread 'VDSO' page and appropriate TLB support.
So traditionally we've always avoided per-thread pages like that.
What makes it worth it on MIPS?
MIPS has branch delay slots - an instruction after a branch which is executed
before the branch is taken. If the branch faults because the FPU is unavailable,
then that instruction has to be emulated as well as the branch itself.
However, MIPS allows customisable coprocessor 2 instructions, and it is
impractical to emulate those, and the large set of other traditional MIPS
instructions, inside the kernel.

So, some per-thread space is needed to put the instruction into, enclose it with
a trap back into the kernel, and temporarily switch execution to it.

Currently, this space is at the SP register (the user stack), but that prevents
making the stack non-executable.

Handling another set of stacks (one stack per thread) in the common user map is
impractical because of management, scalability and performance concerns.