1. HashMap
1.1 HashMap structure
HashMap implements a dictionary data structure on top of a hash table. Any hash algorithm inevitably has to deal with hash collisions, and the common resolution strategies are roughly linear probing, quadratic probing, and rehashing. HashMap resolves collisions with separate chaining. The idea is very simple: each slot of the hash table stores not a bare entry but the head of a linked list, and when a collision occurs, the colliding entries are kept together on that list.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    // ... (excerpt from the source)
}
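As a quick illustration (not from the original article), the strings "Aa" and "BB" are a well-known hashCode collision in Java: both hash to the same int, yet separate chaining keeps the two entries distinct in the same bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" collide: String.hashCode() gives both the value 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        // Both land in the same bucket, but the chain keeps them apart:
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```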
1.2 Implementation of the hash function
What is a hash algorithm? Simply put, it maps any object to an int (Object -> int); a further computation on that int (for example, taking it modulo the array length) locates an array index quickly, which is what makes put and get efficient. HashMap does not actually use the modulo operator, but a cleverer trick, described below.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Explanation: h >>> 16 shifts the 32-bit int obtained from h = key.hashCode() right by 16 bits, and the result is XORed (^) with the original hash code. The upper 16 bits stay unchanged, while the lower 16 bits become the XOR of the lower and upper halves. Because both halves contribute, the result is more evenly spread out.
The array index is then computed as (n - 1) & hash, where n = tab.length and hash is the value computed above. The obvious approach would be to compute the hash and take it modulo the table length, but the designers first XOR-mix the bits so that values spread out better, reducing hash clustering (clustering is what induces collisions). Moreover, the (n - 1) & hash scheme makes HashMap's resize remarkably simple and elegant, as described below.
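A small sketch (class and key names are mine) showing why (n - 1) & hash works: for a power-of-two table length n, masking with n - 1 keeps exactly the low bits of the hash, which equals the non-negative remainder modulo n, at the cost of a single AND instead of a division:

```java
public class IndexDemo {
    // Mirrors HashMap.hash(): fold the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap
        int h = hash("watermelon");
        int index = (n - 1) & h;
        // Same result as the non-negative remainder of h mod n:
        System.out.println(index == Math.floorMod(h, n)); // true
    }
}
```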
1.3 Implementation of resize
HashMap's default capacity is 16, i.e. 2 to the 4th power, and each resize doubles it, so the capacity is always a power of two. Consider the index formula (n - 1) & hash when the table grows from 16 to 32: n - 1 changes from 0b1111 to 0b11111, so exactly one additional bit of the hash starts to matter.
Consequently, resizing does not require recomputing any hashes (hashing has a real cost). It is enough to look at that one additional bit of the stored hash: if it is 0 the entry stays at its old index, and if it is 1 the entry moves to oldIndex + oldCap.
This design is quite clever: it saves the time of recomputing hash values while still keeping the entries reasonably evenly distributed.
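The split rule above can be checked directly (a minimal sketch; the sample hash values are mine): when the capacity doubles, the new index is the old index if the newly exposed bit (hash & oldCap) is 0, and oldIndex + oldCap if it is 1:

```java
public class ResizeDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Four hashes that all map to index 5 in a 16-slot table:
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            int oldIndex = (oldCap - 1) & h;
            int newIndex = (newCap - 1) & h;
            boolean extraBitSet = (h & oldCap) != 0;
            // Entry either stays put or moves exactly oldCap slots up:
            System.out.println(
                newIndex == (extraBitSet ? oldIndex + oldCap : oldIndex)); // true
        }
    }
}
```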
1.4 How put(K key, V value) works
(1) The key's class must override hashCode() so a hash value can be computed; the index is derived from it as described above.
(2) If there is no hash collision, the entry is stored at the corresponding position of the bucket array as the head of a (one-element) list.
(3) If a collision occurs:
a. If the collision makes the chain longer than TREEIFY_THRESHOLD, the linked list is converted into a red-black tree (TreeNode is the red-black tree node), improving lookup efficiency.
b. Otherwise the entry is simply placed on the list.
(4) Note that the key's equals() method is called to check whether the key is already present; if so, its value is replaced.
(5) If a new entry was added and the size exceeds loadFactor * capacity, the table is resized.
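To make the put/get flow concrete, here is a hypothetical, heavily simplified chained map (ToyMap is an illustrative name; the real HashMap additionally resizes and treeifies long chains, neither of which is sketched here):

```java
import java.util.Objects;

// A toy separate-chaining map illustrating the flow described above.
class ToyMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ToyMap(int capacity) { // capacity must be a power of two
        this.table = (Node<K, V>[]) new Node[capacity];
    }

    static int hash(Object key) { // same spreading trick as HashMap.hash()
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int h = hash(key), i = (table.length - 1) & h;
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value;      // existing key: replace value
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // new key: prepend to chain
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> n = table[(table.length - 1) & h]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) return n.value;
        }
        return null; // bucket empty, or key not on the chain
    }
}
```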
1.5 How get(Object key) works
The index is computed from the key's hash code:
(1) If the bucket is empty, return null.
(2) If the bucket holds a linked list, the entries are compared one by one with equals(): O(n) time.
(3) If the bucket holds a red-black tree, the lookup walks the tree, again comparing with equals(): O(log n) time.
1.6 Summary
The HashMap javadoc is actually very thorough, and with the material above it should be fairly easy to follow:
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
(For any hash-based data structure, the key's class must override both equals() and hashCode().)
The javadoc above covers the following points:
(1) A hash-table-based implementation of the Map interface; it provides all optional Map operations, permits null values and the null key (multiple null values, at most one null key), and does not guarantee any ordering of the entries;
(2) get() and put() run in constant time, assuming the hash function disperses the elements well;
(3) Iterating over a collection view takes time proportional to the capacity (the number of buckets) plus the size (the number of key-value mappings), a sum rather than a product;
(4) Do not set the initial capacity too high, or the load factor too low, if iteration performance matters;
(5) Two parameters of a HashMap instance matter for performance:
initial capacity: the number of buckets in the hash table at the time it is created;
load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When the number of entries exceeds loadFactor * capacity, the table is rehashed (that is, the internal data structures are rebuilt) with roughly twice the number of buckets;
(6) As a general rule, the default load factor of 0.75 is a good tradeoff between time and space. A higher load factor decreases the space overhead (the table grows more slowly) but increases the lookup cost, reflected in most HashMap operations including get and put. Why does a higher factor hurt get and put? With separate chaining, a higher load factor means fewer available slots relative to the number of entries, hence more hash clustering and a higher collision probability; once the per-bucket chains carry many mappings, lookups and removals inside them naturally become more expensive;
(7) If many mappings are to be stored, creating the map with a sufficiently large capacity reduces the number of resizes and improves efficiency. Conversely, if the key class's hashCode() is poorly designed, i.e. prone to clustering, HashMap performance will inevitably suffer;
(8) HashMap is not thread-safe;
(9) The iterators are fail-fast: if another thread, or the current thread through anything other than the iterator itself (e.g. remove() or put()), structurally modifies the map during iteration, a ConcurrentModificationException is thrown immediately and the iteration fails fast;
(10) Fail-fast behavior is not a thread-safety guarantee; if you need concurrency safety, do not use HashMap.
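The fail-fast behavior in point (9) is easy to reproduce (a minimal sketch; the class name FailFastDemo is mine): inserting a new key while iterating over a view triggers the exception on the next iteration step.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put("c", 3); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: ConcurrentModificationException");
        }
    }
}
```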
2. LinkedHashMap
public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V>
LinkedHashMap extends HashMap. It differs from HashMap in that it additionally maintains a doubly linked list running through all of its entries. This list defines the iteration order, which can be either the order in which elements were inserted or the order in which they were last accessed.
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    // ...
}
2.1 Constructors
public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}
accessOrder = false means elements iterate in insertion order; true means they iterate in access order.
Because the entries now form a doubly linked list, LinkedHashMap also overrides:
void reinitialize() {
    super.reinitialize();
    head = tail = null;
}
Here is a small demo:
// 1. Insertion order
Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, false);
// 2. Access order
// Map<String, String> map = new LinkedHashMap<String, String>(16, 0.75f, true);
map.put("apple", "蘋(píng)果");
map.put("watermelon", "西瓜");
map.put("banana", "香蕉");
map.put("peach", "桃子");
map.get("watermelon");
map.get("peach");
Iterator iter = map.entrySet().iterator();
while (iter.hasNext()) {
    Map.Entry entry = (Map.Entry) iter.next();
    System.out.println(entry.getKey() + "=" + entry.getValue());
}
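For a concrete check of what the access-order variant of the demo above produces (a self-contained sketch; OrderDemo is my name, and the values are placeholders): after get("watermelon") and get("peach"), those two keys move to the tail of the iteration order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderDemo {
    public static void main(String[] args) {
        // accessOrder = true: iteration goes from least to most recently accessed.
        Map<String, String> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("apple", "1");
        map.put("watermelon", "2");
        map.put("banana", "3");
        map.put("peach", "4");
        map.get("watermelon"); // moves watermelon to the tail
        map.get("peach");      // moves peach to the tail
        List<String> order = new ArrayList<>(map.keySet());
        System.out.println(order); // [apple, banana, watermelon, peach]
    }
}
```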
2.2 Implementation
A brief walk through the source:
The idea behind order maintenance in LinkedHashMap is quite simple: two fields, head and tail, independent of the table, track the list in either insertion order or access order, and the former node structure is extended into a doubly linked list. Because this list must be maintained, the node construction and replacement methods are overridden:
Node<K,V> newNode(int hash, K key, V value, Node<K,V> e) {
    LinkedHashMap.Entry<K,V> p =
        new LinkedHashMap.Entry<K,V>(hash, key, value, e);
    linkNodeLast(p);
    return p;
}

Node<K,V> replacementNode(Node<K,V> p, Node<K,V> next) {
    LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
    LinkedHashMap.Entry<K,V> t =
        new LinkedHashMap.Entry<K,V>(q.hash, q.key, q.value, next);
    transferLinks(q, t);
    return t;
}

TreeNode<K,V> newTreeNode(int hash, K key, V value, Node<K,V> next) {
    TreeNode<K,V> p = new TreeNode<K,V>(hash, key, value, next);
    linkNodeLast(p);
    return p;
}

TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    LinkedHashMap.Entry<K,V> q = (LinkedHashMap.Entry<K,V>)p;
    TreeNode<K,V> t = new TreeNode<K,V>(q.hash, q.key, q.value, next);
    transferLinks(q, t);
    return t;
}
In JDK 1.8, HashMap provides hook methods specifically for LinkedHashMap:
// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }
LinkedHashMap implements them as follows:
// After a node is removed, unlink it from the doubly linked list
void afterNodeRemoval(Node<K,V> e) { // unlink
    LinkedHashMap.Entry<K,V> p =
        (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
    p.before = p.after = null;
    if (b == null)
        head = a;
    else
        b.after = a;
    if (a == null)
        tail = b;
    else
        a.before = b;
}

// After a node is inserted, the eldest entry may need to be evicted
void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

// After a node is accessed, move it to the tail of the list (access order only)
void afterNodeAccess(Node<K,V> e) { // move node to last
    LinkedHashMap.Entry<K,V> last;
    if (accessOrder && (last = tail) != e) {
        LinkedHashMap.Entry<K,V> p =
            (LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
        p.after = null;
        if (b == null)
            head = a;
        else
            b.after = a;
        if (a != null)
            a.before = b;
        else
            last = b;
        if (last == null)
            head = p;
        else {
            p.before = last;
            last.after = p;
        }
        tail = p;
        ++modCount;
    }
}
On reads, the list may be reordered according to access order:
public V get(Object key) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) == null)
        return null;
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}

/**
 * {@inheritDoc}
 */
public V getOrDefault(Object key, V defaultValue) {
    Node<K,V> e;
    if ((e = getNode(hash(key), key)) == null)
        return defaultValue;
    if (accessOrder)
        afterNodeAccess(e);
    return e.value;
}
2.3 LinkedHashMap and the LRU algorithm
An LRU cache builds on exactly this idea. LRU stands for Least Recently Used: the cache evicts the least recently used data to make room for the most recently read data. Since recently read data also tends to be the most frequently read, an LRU cache can improve system performance. The code below is mainly meant to illustrate how an LRU cache can be implemented; in practice an existing implementation such as Guava's cache is usually preferred.
https://github.com/tracylihui/LRUcache-Java contains a simple LRUCache implementation. The idea is straightforward: override the LinkedHashMap hook method removeEldestEntry:
package ysz.demo;

import java.util.LinkedHashMap;
import java.util.Collection;
import java.util.Map;
import java.util.ArrayList;

/**
 * An LRU cache, based on <code>LinkedHashMap</code>.
 *
 * <p>
 * This cache has a fixed maximum number of elements (<code>cacheSize</code>).
 * If the cache is full and another entry is added, the LRU (least recently
 * used) entry is dropped.
 *
 * <p>
 * This class is thread-safe. All methods of this class are synchronized.
 *
 * <p>
 * Author: Christian d'Heureuse, Inventec Informatik AG, Zurich, Switzerland<br>
 * Multi-licensed: EPL / LGPL / GPL / AL / BSD.
 */
public class LRUCache<K, V> {

    private static final float hashTableLoadFactor = 0.75f;

    private LinkedHashMap<K, V> map;
    private int cacheSize;

    /**
     * Creates a new LRU cache. Note that in
     * new LinkedHashMap<K,V>(hashTableCapacity, hashTableLoadFactor, true),
     * the final true selects access order.
     *
     * @param cacheSize
     *            the maximum number of entries that will be kept in this cache.
     */
    public LRUCache(int cacheSize) {
        this.cacheSize = cacheSize;
        int hashTableCapacity = (int) Math
                .ceil(cacheSize / hashTableLoadFactor) + 1;
        map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor,
                true) {
            // (an anonymous inner class)
            private static final long serialVersionUID = 1;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.cacheSize;
            }
        };
    }

    /**
     * Retrieves an entry from the cache.<br>
     * The retrieved entry becomes the MRU (most recently used) entry.
     *
     * @param key
     *            the key whose associated value is to be returned.
     * @return the value associated to this key, or null if no value with this
     *         key exists in the cache.
     */
    public synchronized V get(K key) {
        return map.get(key);
    }

    /**
     * Adds an entry to this cache. The new entry becomes the MRU (most recently
     * used) entry. If an entry with the specified key already exists in the
     * cache, it is replaced by the new entry. If the cache is full, the LRU
     * (least recently used) entry is removed from the cache.
     *
     * @param key
     *            the key with which the specified value is to be associated.
     * @param value
     *            a value to be associated with the specified key.
     */
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /**
     * Clears the cache.
     */
    public synchronized void clear() {
        map.clear();
    }

    /**
     * Returns the number of used entries in the cache.
     *
     * @return the number of entries currently in the cache.
     */
    public synchronized int usedEntries() {
        return map.size();
    }

    /**
     * Returns a <code>Collection</code> that contains a copy of all cache
     * entries.
     *
     * @return a <code>Collection</code> with a copy of the cache content.
     */
    public synchronized Collection<Map.Entry<K, V>> getAll() {
        return new ArrayList<Map.Entry<K, V>>(map.entrySet());
    }

    // Test routine for the LRUCache class.
    public static void main(String[] args) {
        LRUCache<String, String> c = new LRUCache<String, String>(3);
        c.put("1", "one");          // 1
        c.put("2", "two");          // 2 1
        c.put("3", "three");        // 3 2 1
        c.put("4", "four");         // 4 3 2
        if (c.get("2") == null)
            throw new Error();      // 2 4 3
        c.put("5", "five");         // 5 2 4
        c.put("4", "second four");  // 4 5 2
        // Verify cache content.
        if (c.usedEntries() != 3)
            throw new Error();
        if (!c.get("4").equals("second four"))
            throw new Error();
        if (!c.get("5").equals("five"))
            throw new Error();
        if (!c.get("2").equals("two"))
            throw new Error();
        // List cache content.
        for (Map.Entry<String, String> e : c.getAll())
            System.out.println(e.getKey() + " : " + e.getValue());
    }
}