Bit Twiddling
• Java (and C, C++) allow for handling integer types as sequences of bits. No "conversion to bits" needed: they already are.
• Operations and their uses:

      mask          set           flip          flip all
      00101100      00101100      00101100
    & 10100111    | 10100111    ^ 10100111    ~ 10100111
      00100100      10101111      10001011      01011000

• Shifting:

    left            arithmetic right    logical right
    10101101 << 3   10101101 >> 3       10101100 >>> 3
    01101000        11110101            00010101

• What is:
  - (-1) >>> 29?  = 7.
  - x << n?  = x * 2^n.
  - x >> n?  = x / 2^n (i.e., rounded down).
  - (x >>> 3) & ((1<<5)-1)?  = 5-bit integer: bits 3-7 of x.
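The operations and shifts above can be checked directly in Java. This is a small sketch (the class name `Bits` and method names are mine, not from the lecture) using the same 8-bit patterns held in `int`s; the arithmetic right shift needs a `byte` cast so the sign bit is where the 8-bit picture puts it:

```java
public class Bits {
    // The two operand patterns from the slide.
    static final int A = 0b00101100, B = 0b10100111;

    static int mask(int x, int y)  { return x & y; }        // 1s where both have 1s
    static int set(int x, int y)   { return x | y; }        // 1s where either has a 1
    static int flip(int x, int y)  { return x ^ y; }        // 1s where they differ
    static int flipAll(int y)      { return ~y & 0xFF; }    // complement, truncated to 8 bits

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(mask(A, B)));     // 100100
        System.out.println(Integer.toBinaryString(set(A, B)));      // 10101111
        System.out.println(Integer.toBinaryString(flip(A, B)));     // 10001011
        System.out.println(Integer.toBinaryString(flipAll(B)));     // 1011000
        // Shifts: left, then arithmetic right (sign-extending via byte), masked to 8 bits.
        System.out.println(Integer.toBinaryString((0b10101101 << 3) & 0xFF));          // 1101000
        System.out.println(Integer.toBinaryString((((byte) 0b10101101) >> 3) & 0xFF)); // 11110101
        // -1 is 32 one-bits; a logical shift by 29 leaves three of them.
        System.out.println((-1) >>> 29);                            // 7
    }
}
```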
Last modified: Wed Oct 1 14:17:19 2014    CS61B: Lecture 14
CS61B Lecture 14, Paul Hilfinger
CS61B Lecture #15
Announcements:
• Project #1 to be released today (we hope).
• Programming contest is tomorrow!
Readings for today: Data Structures (Into Java), Chapter 1; readings for next topics: Data Structures, Chapter 2.
Last modified: Fri Oct 3 12:15:29 2014    CS61B: Lecture 15
What Are the Questions?
• Cost is a principal concern throughout engineering:
  "An engineer is someone who can do for a dime what any fool can do for a dollar."
• Cost can mean
  - Operational cost (for programs: time to run, space requirements).
  - Development costs: how much engineering time? When delivered?
  - Costs of failure: how robust? How safe?
• Is this program fast enough? Depends on:
  - for what purpose;
  - what input data.
• How much space (memory, disk space)?
  - Again depends on what input data.
• How will it scale, as input gets big?
Enlightening Example
Problem: Scan a text corpus (say 10^7 bytes or so), and find and print the 20 most frequently used words, together with counts of how often they occur.
• Solution 1 (Knuth): heavy-duty data structures
  - Hash trie implementation, randomized placement, pointers galore, several pages long.
• Solution 2 (Doug McIlroy): Unix shell script:

    tr -c -s '[:alpha:]' '[\n*]' < FILE | \
    sort | \
    uniq -c | \
    sort -n -r -k 1,1 | \
    sed 20q

• Which is better?
  - #1 is much faster,
  - but #2 took 5 minutes to write and processes 20MB in 1 minute.
  - Pick #2.
• In most cases, anything will do: keep it simple.
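A middle road between the two solutions, in this course's language, is a short hash-map program. This is a sketch only (class and method names are mine, not the lecture's); each stream stage plays the role of one stage of McIlroy's pipeline:

```java
import java.util.*;
import java.util.stream.*;

public class WordCount {
    /** The K most frequent words in TEXT, most frequent first.
     *  Ties are broken arbitrarily, as in the shell script. */
    static List<String> topWords(String text, int k) {
        Map<String, Long> counts =
            Arrays.stream(text.toLowerCase().split("[^a-z]+"))   // like tr -c -s: split on non-letters
                  .filter(w -> !w.isEmpty())
                  .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // like sort | uniq -c
        return counts.entrySet().stream()
                     .sorted(Map.Entry.<String, Long>comparingByValue().reversed()) // like sort -n -r
                     .limit(k)                                   // like sed 20q (here, K lines)
                     .map(Map.Entry::getKey)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("It was the best of times, it was the worst of times.", 3));
    }
}
```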
Cost Measures (Time)
• Wall-clock or execution time
  - You can do this at home:
      time java FindPrimes 1000
  - Advantages: easy to measure; meaning is obvious.
  - Appropriate where time is critical (real-time systems, e.g.).
  - Disadvantages: applies only to specific data set, compiler, machine, etc.
• Number of times certain statements are executed:
  - Advantages: more general (not sensitive to speed of machine).
  - Disadvantages: doesn't tell you actual time; still applies only to specific data sets.
• Symbolic execution times:
  - That is, formulas for execution times or statement counts in terms of input size.
  - Advantages: applies to all inputs; makes scaling clear.
  - Disadvantage: a practical formula must be approximate, and may tell very little about actual time.
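Wall-clock measurement can also be done inside the program itself. A sketch (the prime-counting workload here is mine, standing in for whatever FindPrimes does):

```java
public class TimeIt {
    /** Number of primes below N, by trial division: a deliberately simple workload. */
    static int countPrimes(int n) {
        int count = 0;
        for (int p = 2; p < n; p += 1) {
            boolean prime = true;
            for (int d = 2; d * d <= p; d += 1)
                if (p % d == 0) { prime = false; break; }
            if (prime) count += 1;
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = countPrimes(1000);
        long elapsed = System.nanoTime() - start;
        // The number printed applies only to this machine, JVM, and input:
        // exactly the disadvantage noted in the first bullet above.
        System.out.println(result + " primes; " + elapsed / 1e6 + " ms");
    }
}
```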
Asymptotic Cost
• Symbolic execution time lets us see the shape of the cost function.
• Since we are approximating anyway, it is pointless to be precise about certain things:
  - Behavior on small inputs:
    * Can always precalculate some results.
    * Times for small inputs are not usually important.
  - Constant factors (as in "off by a factor of 2"):
    * Just changing machines causes a constant-factor change.
• How to abstract away from (i.e., ignore) these things?
Handy Tool: Order Notation
• Idea: don't try to produce specific functions that specify size, but rather families of similar functions.
• Say something like "f is bounded by g" if it is in g's family.
• For any function g(x), the functions 2g(x), 1000g(x), or, for any k > 0, k·g(x), all have the same "shape". So put all of them into g's family.
• Any function h(x) such that h(x) = k·g(x) for x > M (for some constant M) has g's shape "except for small values." So put all of these in g's family.
• If we want upper limits, throw in all functions that are everywhere ≤ some other member of g's family. Call this family O(g) or O(g(n)).
• Or, if we want lower limits, throw in all functions that are everywhere ≥ some other member of g's family. Call this family Ω(g).
• Finally, define Θ(g) = O(g) ∩ Ω(g): the set of functions bracketed by members of g's family.
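Written out as formulas, the family language above is the standard definitions (this rendering is mine; the constants K, k, M play the roles of the "constant factor" and "except for small values" clauses):

```latex
f \in O(g)      \;\iff\; \exists\, K > 0,\ M:\quad |f(x)| \le K\,|g(x)| \ \text{ for all } x > M
f \in \Omega(g) \;\iff\; \exists\, k > 0,\ M:\quad |f(x)| \ge k\,|g(x)| \ \text{ for all } x > M
\Theta(g) \;=\; O(g) \cap \Omega(g)
```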
Big Oh
• Goal: specify bounding from above.
• Here, f(x) ≤ 2g(x) as long as x > 1,
• so f(x) is in g's upper-bound family, written f(x) ∈ O(g(x)),
• even though f(x) > g(x) everywhere.
[Figure: plot of f(x) against 2g(x); the bound holds to the right of x = 1.]
Big Omega
• Goal: specify bounding from below:
  f'(x) ≥ 0.5g(x) as long as x > 1.
[Figure: plot of f'(x) against 0.5g(x); the bound holds to the right of x = 1.]
• So f'(x) is in g's lower-bound family, written f'(x) ∈ Ω(g(x)),
• even though f'(x) < g(x) everywhere.
• In fact, we also have f'(x) ∈ O(g(x)) and f(x) ∈ Ω(g(x)), and so we can also write f(x), f'(x) ∈ Θ(g(x)).
Why It Matters
• Computer scientists often talk as if constant factors didn't matter at all, only the difference of Θ(n) vs. Θ(n²).
• In reality they do, but we still have a point: at some point, constants get swamped.

    n        16 lg n   √n      n          n lg n     n²          n³          2ⁿ
    2        16        1.4     2          2          4           8           4
    4        32        2       4          8          16          64          16
    8        48        2.8     8          24         64          512         256
    16       64        4       16         64         256         4,096       65,536
    32       80        5.7     32         160        1,024       32,768      4.3 × 10⁹
    64       96        8       64         384        4,096       262,144     1.8 × 10¹⁹
    128      112       11      128        896        16,384      2.1 × 10⁶   3.4 × 10³⁸
    1,024    160       32      1,024      10,240     1.0 × 10⁶   1.1 × 10⁹   1.8 × 10³⁰⁸
    2²⁰      320       1,024   1.0 × 10⁶  2.1 × 10⁷  1.1 × 10¹²  1.2 × 10¹⁸  6.7 × 10³¹⁵⁶⁵²
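Each table entry is just one of these functions evaluated at n. This sketch (class `Growth` and helper `lg` are mine) spot-checks the n = 8 row, reusing the shift operators from the bit-twiddling slide:

```java
public class Growth {
    /** Floor of log base 2 of N, for N > 0, computed by repeated right shifts. */
    static int lg(int n) {
        int r = 0;
        while (n > 1) { n >>= 1; r += 1; }
        return r;
    }

    public static void main(String[] args) {
        int n = 8;  // the table's third row
        System.out.println(16 * lg(n));   // 48
        System.out.println(n * lg(n));    // 24
        System.out.println(n * n);        // 64
        System.out.println(n * n * n);    // 512
        System.out.println(1 << n);       // 256, i.e., 2^n
    }
}
```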
Some Intuition on Meaning of Growth
• How big a problem can you solve in a given time?
• In the following table, the left column shows time in microseconds to solve a given problem as a function of problem size N.
• Entries show the size of problem that can be solved in a second, hour, month (31 days), and century, for various relationships between time required and problem size.
• N = problem size

    Time (µsec) for     Max N possible in
    problem size N      1 second     1 hour       1 month      1 century
    lg N                10^300000    10^(10⁹)     10^(10¹²)    10^(10¹⁵)
    N                   10⁶          3.6 × 10⁹    2.7 × 10¹²   3.2 × 10¹⁵
    N lg N              63,000       1.3 × 10⁸    7.4 × 10¹⁰   6.0 × 10¹³
    N²                  1,000        60,000       1.6 × 10⁶    5.7 × 10⁷
    N³                  100          1,500        14,000       150,000
    2ᴺ                  20           32           41           51
Using the Notation
• Can use this order notation for any kind of real-valued function.
• We will use them to describe cost functions. Example:

    /** Find position of X in list L.  Return -1 if not found. */
    int find(List L, Object x) {
        int c;
        for (c = 0; L != null; L = L.next, c += 1)
            if (x.equals(L.head)) return c;
        return -1;
    }

• Choose representative operation: number of .equals tests.
• If N is the length of L, then the loop does at most N tests: worst-case time is N tests.
• In fact, the total # of instructions executed is roughly proportional to N in the worst case, so we can also say worst-case time is O(N), regardless of the units used to measure.
• Use the x > M provision (in the defn. of O(·)) to handle the empty list.
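To see the worst case happen, the method can be run as-is; this self-contained version (the minimal singly linked `List` class here is my stand-in for the lecture's list type) exercises both the early-return and the all-N-tests cases:

```java
public class FindDemo {
    /** Minimal singly linked list node, standing in for the lecture's List type. */
    static class List {
        Object head; List next;
        List(Object head, List next) { this.head = head; this.next = next; }
    }

    /** Position of X in L, or -1 if not found.  Worst case: N .equals tests. */
    static int find(List L, Object x) {
        int c;
        for (c = 0; L != null; L = L.next, c += 1)
            if (x.equals(L.head)) return c;
        return -1;
    }

    public static void main(String[] args) {
        List l = new List("a", new List("b", new List("c", null)));
        System.out.println(find(l, "b"));   // 1: two tests, then early return
        System.out.println(find(l, "z"));   // -1: all N tests performed
    }
}
```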